Kylin 2.4.1 order example: detailed build process

1. Building the Hive order data warehouse:

Hive tables can be created directly from the command line or in Hue; this article uses Hue.

The sample text files used below can be downloaded from: https://files-cdn.cnblogs.com/files/qqflying/KylinData.zip

1. Create the fact table and load data

Step 1: DROP TABLE IF EXISTS default.fact_order;

Step 2:

create table default.fact_order (
time_key string,
product_key string,
salesperson_key string,
custom_key string,
quantity_ordered bigint,
order_dollars bigint,
cost_dollars bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Step 3: load data local inpath '/data/fact_order.txt' overwrite into table default.fact_order;

 

fact_order.txt

2016-05-01,pd001,sp001,ct001,100,2000,1000
2016-05-01,pd001,sp002,ct002,100,2000,1000
2016-05-01,pd001,sp003,ct002,100,2000,1000
2016-05-01,pd002,sp002,ct002,100,2000,1000
2016-05-01,pd003,sp003,ct001,100,2000,1000
2016-05-01,pd001,sp003,ct001,100,2000,1000
2016-05-01,pd001,sp002,ct001,100,2000,1000
2016-05-01,pd001,sp003,ct002,100,2000,1000
2016-05-01,pd002,sp001,ct001,100,2000,1000
2016-05-01,pd003,sp001,ct001,100,2000,1000
2016-05-01,pd004,sp001,ct001,50,1000,600
2016-05-02,pd001,sp001,ct001,50,1000,600
2016-05-02,pd001,sp002,ct002,100,2000,1000
2016-05-02,pd001,sp003,ct002,100,2000,1000
2016-05-02,pd002,sp001,ct001,50,1000,600
2016-05-02,pd003,sp001,ct001,50,1000,600
2016-05-02,pd004,sp001,ct001,50,1000,600
2016-05-03,pd001,sp001,ct001,50,1000,600
2016-05-03,pd001,sp002,ct002,100,2000,1000
2016-05-03,pd001,sp003,ct002,100,2000,1000
2016-05-04,pd002,sp001,ct001,700,14000,10000
2016-05-04,pd003,sp001,ct001,700,14000,10000
2016-05-04,pd004,sp001,ct001,100,2000,1000
2016-05-05,pd001,sp001,ct001,100,2000,1000
2016-05-05,pd001,sp002,ct002,700,14000,10000
2016-05-05,pd001,sp003,ct002,700,14000,10000
2016-05-05,pd002,sp001,ct001,100,2000,1000
2016-05-05,pd003,sp001,ct001,100,2000,1000
2016-05-05,pd004,sp001,ct001,100,2000,1000
2016-05-06,pd001,sp001,ct001,100,2000,1000
2016-05-06,pd001,sp002,ct002,100,2000,1000
2016-05-06,pd001,sp003,ct002,100,2000,1000
2016-05-07,pd002,sp001,ct001,100,2000,1000
2016-05-07,pd003,sp001,ct001,100,2000,1000
2016-05-07,pd004,sp001,ct001,50,1000,600
2016-05-07,pd002,sp001,ct001,100,2000,1000
2016-05-07,pd003,sp001,ct001,100,2000,1000
2016-05-07,pd004,sp001,ct001,50,1000,600
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-08,pd001,sp001,ct001,50,1000,600
2016-05-08,pd001,sp002,ct002,100,2000,1000
2016-05-08,pd001,sp003,ct002,100,2000,1000
2016-05-09,pd002,sp001,ct001,50,1000,600
2016-05-09,pd003,sp001,ct001,50,1000,600
2016-05-09,pd004,sp001,ct001,50,1000,600
2016-05-09,pd001,sp001,ct001,50,1000,600
2016-05-09,pd002,sp001,ct001,50,1000,600
2016-05-09,pd003,sp001,ct001,50,1000,600
2016-05-09,pd004,sp001,ct001,50,1000,600
2016-05-09,pd001,sp001,ct001,50,1000,600
2016-05-09,pd001,sp002,ct002,100,2000,1000
2016-05-09,pd004,sp003,ct002,100,2000,1000
2016-05-09,pd002,sp001,ct001,700,14000,10000
2016-05-09,pd003,sp003,ct001,700,14000,10000
2016-05-09,pd004,sp003,ct001,100,2000,1000
2016-05-10,pd001,sp001,ct001,100,2000,1000
2016-05-10,pd001,sp002,ct002,700,14000,10000
2016-05-10,pd001,sp003,ct002,700,14000,10000
2016-05-10,pd002,sp001,ct001,100,2000,1000
2016-05-11,pd003,sp003,ct001,100,2000,1000
2016-05-11,pd004,sp001,ct001,100,2000,1000
2016-05-12,pd001,sp001,ct001,100,2000,1000
2016-05-12,pd004,sp002,ct002,100,2000,1000
2016-05-12,pd001,sp003,ct002,100,2000,1000
2016-05-12,pd001,sp001,ct001,100,2000,1000
2016-05-12,pd004,sp002,ct002,100,2000,1000
2016-05-12,pd001,sp003,ct002,100,2000,1000
2016-05-13,pd002,sp001,ct001,100,2000,1000
2016-05-13,pd003,sp001,ct001,100,2000,1000
2016-05-13,pd004,sp001,ct001,50,1000,600
2016-05-14,pd001,sp001,ct001,50,1000,600
2016-05-14,pd001,sp002,ct002,100,2000,1000
2016-05-14,pd001,sp003,ct002,100,2000,1000
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-15,pd002,sp001,ct001,50,1000,600
2016-05-15,pd003,sp001,ct001,50,1000,600
2016-05-15,pd004,sp001,ct001,50,1000,600
2016-05-16,pd001,sp001,ct001,50,1000,600
2016-05-16,pd001,sp002,ct002,100,2000,1000
2016-05-16,pd001,sp003,ct002,100,2000,1000
2016-05-16,pd001,sp001,ct001,50,1000,600
2016-05-16,pd001,sp002,ct002,100,2000,1000
2016-05-16,pd001,sp003,ct002,100,2000,1000
2016-05-17,pd002,sp001,ct001,700,14000,10000
2016-05-17,pd003,sp001,ct001,700,14000,10000
2016-05-17,pd004,sp001,ct001,100,2000,1000
2016-05-17,pd002,sp001,ct001,700,14000,10000
2016-05-17,pd003,sp001,ct001,700,14000,10000
2016-05-17,pd004,sp001,ct001,100,2000,1000
2016-05-18,pd001,sp001,ct001,100,2000,1000
2016-05-18,pd003,sp002,ct001,700,14000,10000
2016-05-18,pd001,sp003,ct002,700,14000,10000
2016-05-19,pd002,sp001,ct001,100,2000,1000
2016-05-19,pd003,sp001,ct002,100,2000,1000
2016-05-20,pd001,sp001,ct001,100,2000,1000
2016-05-20,pd002,sp002,ct002,100,2000,1000
2016-05-20,pd003,sp003,ct001,100,2000,1000
2016-05-20,pd004,sp001,ct001,100,2000,1000
2016-05-20,pd001,sp002,ct002,100,2000,1000
2016-05-20,pd002,sp001,ct002,100,2000,1000
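
Before loading the file, it can help to check that every line matches the seven-column layout of default.fact_order, since Hive will load a malformed numeric field as NULL rather than fail. The following is a minimal sketch, not part of the original walkthrough; the names FactOrder and parse_fact_order_line are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class FactOrder(NamedTuple):
    """One row of fact_order.txt, mirroring the columns of default.fact_order."""
    time_key: str
    product_key: str
    salesperson_key: str
    custom_key: str
    quantity_ordered: int
    order_dollars: int
    cost_dollars: int


def parse_fact_order_line(line: str) -> FactOrder:
    """Split a comma-delimited line and convert the three measure columns to int."""
    fields = line.strip().split(",")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    # int() raises ValueError if a measure column contains stray characters.
    return FactOrder(*fields[:4], *(int(value) for value in fields[4:]))


sample = parse_fact_order_line("2016-05-01,pd004,sp001,ct001,50,1000,600")
print(sample)
```

Running the parser over every line of the file before the load step surfaces truncated lines or stray text in the numeric columns early.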

2. Create the day dimension table dim_day (also run in three steps)

DROP TABLE IF EXISTS default.dim_day ;

create table default.dim_day (
day_key string,
full_day string,
month_name string,
quarter string,
year string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE; 
load data local inpath '/data/dim_day.txt' overwrite into table default.dim_day;

 dim_day.txt
  
2016-05-01,2016-05-01,201605,2016q2,2016
2016-05-02,2016-05-02,201605,2016q2,2016
2016-05-03,2016-05-03,201605,2016q2,2016
2016-05-04,2016-05-04,201605,2016q2,2016
2016-05-05,2016-05-05,201605,2016q2,2016
2016-05-06,2016-05-06,201605,2016q2,2016
2016-05-07,2016-05-07,201605,2016q2,2016
2016-05-08,2016-05-08,201605,2016q2,2016
2016-05-09,2016-05-09,201605,2016q2,2016
2016-05-10,2016-05-10,201605,2016q2,2016
2016-05-11,2016-05-11,201605,2016q2,2016
2016-05-12,2016-05-12,201605,2016q2,2016
2016-05-13,2016-05-13,201605,2016q2,2016
2016-05-14,2016-05-14,201605,2016q2,2016
2016-05-15,2016-05-15,201605,2016q2,2016
2016-05-16,2016-05-16,201605,2016q2,2016
2016-05-17,2016-05-17,201605,2016q2,2016
2016-05-18,2016-05-18,201605,2016q2,2016
2016-05-19,2016-05-19,201605,2016q2,2016
2016-05-20,2016-05-20,201605,2016q2,2016

3. Create the salesperson dimension table dim_salesperson
 
DROP TABLE IF EXISTS default.dim_salesperson ;
 
create table default.dim_salesperson (
salesperson_key string,
salesperson string,
salesperson_id string,
region string,
region_code string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
  
load data local inpath '/data/dim_salesperson.txt' overwrite into table default.dim_salesperson;
  
 dim_salesperson.txt
  
sp001,hongbin,sp001,beijing,10086
sp002,hongming,sp002,beijing,10086
sp003,hongmei,sp003,beijing,10086

 

4. Create the customer dimension table dim_custom

 
 DROP TABLE IF EXISTS default.dim_custom ;
  
create table default.dim_custom (
custom_key string,
custom_name string,
custom_id string,
headquarter_states string,
billing_address string,
billing_city string,
billing_state string,
industry_name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
 
load data local inpath '/data/dim_custom.txt' overwrite into table default.dim_custom;

 dim_custom.txt
  
ct001,custom_john,ct001,beijing,zgx-beijing,beijing,beijing,internet
ct002,custom_herry,ct002,henan,shlinjie,shangdang,henan,internet
 
 
 
 
5. Create the product dimension table and load data
 
DROP TABLE IF EXISTS default.dim_product;

create table default.dim_product (
product_key string,
product_name string,
product_id string,
product_desc string,
sku string,
brand string,
brand_code string,
brand_manager string,
category string,
category_code string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

load data local inpath '/data/dim_product.txt' overwrite into table default.dim_product;

 dim_product.txt
  
pd001,Box-Large,pd001,Box-Large-des,large1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd002,Box-Medium,pd001,Box-Medium-des,medium1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd003,Box-small,pd001,Box-small-des,small1.0,brand001,brandcode001,brandmanager001,Packing,cate001
pd004,Evelope,pd001,Evelope_des,large3.0,brand001,brandcode001,brandmanager001,Pens,cate002

 
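With this, the star schema tables have been created in Hive. In effect, an offline data warehouse is now complete; it covers a single subject: product orders.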
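
In a star schema, every key in the fact table should have a matching row in the corresponding dimension table; with inner joins, fact rows without a match drop out of the joined result. The sketch below illustrates that check in memory, using a few rows copied from the sample files above. The helper find_orphan_keys is a hypothetical name introduced for illustration and is not part of Hive or Kylin.

```python
def find_orphan_keys(fact_rows, key_index, dimension_keys):
    """Return the fact-table keys at `key_index` that have no matching dimension row."""
    return sorted({row[key_index] for row in fact_rows} - set(dimension_keys))


# A few rows copied from fact_order.txt (columns in table order).
fact_rows = [
    ("2016-05-01", "pd001", "sp001", "ct001", 100, 2000, 1000),
    ("2016-05-01", "pd004", "sp001", "ct001", 50, 1000, 600),
    ("2016-05-20", "pd002", "sp002", "ct002", 100, 2000, 1000),
]

# Keys copied from the dimension sample files.
product_keys = ["pd001", "pd002", "pd003", "pd004"]
salesperson_keys = ["sp001", "sp002", "sp003"]
custom_keys = ["ct001", "ct002"]

for name, index, keys in [
    ("product_key", 1, product_keys),
    ("salesperson_key", 2, salesperson_keys),
    ("custom_key", 3, custom_keys),
]:
    print(name, "orphans:", find_orphan_keys(fact_rows, index, keys))
```

An empty list for each key column means the sampled fact rows join cleanly to their dimensions.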

 

III. Creating the Kylin project and syncing data
1. Click "Manage Project"
2. Click "New Project"
3. Enter the "Project Name": WareHouse_01
4. Submit


1. Select WareHouse_01, then open the "Data Source" tab
2. Click "Load Hive Table"
3. Enter the tables to sync:
  "DEFAULT.FACT_ORDER,DEFAULT.DIM_DAY,DEFAULT.DIM_PRODUCT,DEFAULT.DIM_SALESPERSON,DEFAULT.DIM_CUSTOM"
4. Sync

IV. Creating the Kylin model
1. Open the "Models" tab and click "New Model"
2. Enter the "Model Name": WareHouse_01_Model
3. Select DEFAULT.FACT_ORDER as the "Fact Table", then add the lookup tables

 


4. Select the columns of each table to use as dimensions:
 ID  Table Name               Columns
 1   DEFAULT.FACT_ORDER       TIME_KEY PRODUCT_KEY SALESPERSON_KEY CUSTOM_KEY
 2   DEFAULT.DIM_DAY          FULL_DAY
 3   DEFAULT.DIM_PRODUCT      PRODUCT_NAME
 4   DEFAULT.DIM_SALESPERSON  SALESPERSON
 5   DEFAULT.DIM_CUSTOM       CUSTOM_NAME

 

5. Select the columns of DEFAULT.FACT_ORDER to use as measures:
        QUANTITY_ORDERED ORDER_DOLLARS COST_DOLLARS

 

6. a. Set the "Partition Date Column" to DEFAULT.FACT_ORDER.TIME_KEY, with format yyyy-MM-dd
   b. Leave "Filter" empty, since there is nothing to filter out

 

7.Save

 

V. Creating the Kylin cube

 

1. Open the "Models" tab and click "New Cube"

2. Cube Info:
   "Model Name": select WareHouse_01_Model
   "Cube Name": enter cube01

3. Dimensions:
   Click "Auto Generator" and select the dimension columns as needed; here, select all of them

4. Measures:
   a. Click "+Measure" and add the measures to aggregate: sum(QUANTITY_ORDERED), sum(ORDER_DOLLARS)
   b. Expression: one of SUM/MIN/MAX/COUNT/COUNT_DISTINCT/TOP_N/RAW

5. Refresh Setting:
   a. Auto Merge Thresholds: the thresholds for automatic segment merging, 7~28 days
   b. Retention Threshold: the number of days to retain, 60
   c. Partition Start Date: very important, because it is the start date used later when building the cube

6. Advanced Setting:
   Aggregation Groups:
     a. Includes: TIME_KEY, PRODUCT_KEY, SALESPERSON_KEY, CUSTOM_KEY
     b. Mandatory Dimensions: TIME_KEY
     c. Hierarchy Dimensions: PRODUCT_KEY, SALESPERSON_KEY, CUSTOM_KEY
     d. Joint Dimensions: none
   Rowkeys:
     TIME_KEY, PRODUCT_KEY, SALESPERSON_KEY, and CUSTOM_KEY all use dict (dictionary) encoding

7. Configuration Overwrites: none

8. Overview:
   Save the cube
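
To see why the aggregation group settings in step 6 matter, it helps to count the dimension combinations (cuboids) they allow. The sketch below uses a simplified model of the rules: a mandatory dimension appears in every cuboid, and a hierarchy only contributes its prefixes. It ignores joint dimensions and any Kylin-specific handling of the base cuboid, so treat the count as an approximation rather than what Kylin reports. The function enumerate_cuboids is a hypothetical helper written for this illustration.

```python
from itertools import combinations


def enumerate_cuboids(mandatory, hierarchy, normal=()):
    """List dimension combinations under a simplified aggregation-group model.

    - mandatory dimensions are present in every combination
    - hierarchy dimensions appear only as prefixes (A, then A+B, then A+B+C)
    - normal dimensions may appear in any subset
    """
    hierarchy_prefixes = [tuple(hierarchy[:i]) for i in range(len(hierarchy) + 1)]
    normal_subsets = [
        subset
        for size in range(len(normal) + 1)
        for subset in combinations(normal, size)
    ]
    return [
        tuple(mandatory) + prefix + subset
        for prefix in hierarchy_prefixes
        for subset in normal_subsets
    ]


cuboids = enumerate_cuboids(
    mandatory=["TIME_KEY"],
    hierarchy=["PRODUCT_KEY", "SALESPERSON_KEY", "CUSTOM_KEY"],
)
for cuboid in cuboids:
    print(cuboid)
print(len(cuboids), "combinations instead of", 2 ** 4)
```

Under this model the four dimensions yield 4 combinations rather than the 16 of a full cube. The trade-off is that a query grouping by, say, SALESPERSON_KEY without PRODUCT_KEY has no exact cuboid and would have to be answered from a larger one.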

 

V. Cube Build

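1. Select cube01, click "Action", and choose Build

2. Fill in the End Date and submit

3. Click "Monitor" and watch the job

4. Wait for the progress to reach 100% (and check for any errors)



5. Return to cube01 and confirm that fields such as Cube Size and Source Records have been updated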
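
The same build can also be triggered without the web UI. As far as I know, Kylin 2.x exposes a REST endpoint of the form PUT /kylin/api/cubes/{cubeName}/build that takes startTime and endTime as epoch milliseconds plus a buildType; please verify this against the REST API documentation for your version. The sketch below only constructs the URL and JSON body and does not send anything. The host localhost:7070, the end date 2016-05-21 (chosen on the assumption that the end of a segment is exclusive), and the helper names are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def to_epoch_millis(day: str) -> int:
    """Convert a yyyy-MM-dd date to epoch milliseconds at 00:00 UTC."""
    moment = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def build_request(cube_name, start_day, end_day,
                  base_url="http://localhost:7070/kylin/api"):  # assumed host and port
    """Return the (url, json_body) pair for a cube build request."""
    url = f"{base_url}/cubes/{cube_name}/build"
    body = {
        "startTime": to_epoch_millis(start_day),
        "endTime": to_epoch_millis(end_day),
        "buildType": "BUILD",
    }
    return url, json.dumps(body)


url, body = build_request("cube01", "2016-05-01", "2016-05-21")
print("PUT", url)
print(body)
```

Sending the request would additionally need an HTTP client and Basic authentication headers, which are left out here.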

 
VI. Hive vs. Kylin query comparison


1. Daily order quantity and order amount between 2016-05-01 and 2016-05-15

Hive: 65.816 s
select fact.time_key, sum(fact.quantity_ordered), sum(fact.order_dollars) from fact_order as fact
where fact.time_key >= "2016-05-01" and fact.time_key <= "2016-05-15"
group by fact.time_key order by fact.time_key;

Kylin: 0.32 s --> 0.27 s
select fact.time_key, sum(fact.quantity_ordered), sum(fact.order_dollars) from fact_order as fact
where fact.time_key between '2016-05-01' and '2016-05-15'
group by fact.time_key order by fact.time_key
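
To confirm that both engines return correct numbers and not just fast ones, the query can be cross-checked on a handful of rows with an in-memory SQLite database. This is only a sanity check on the SQL logic, not a substitute for Hive or Kylin. The rows below are all the fact_order.txt rows for 2016-05-03 and 2016-05-04, plus one 2016-05-16 row that the date filter should exclude.

```python
import sqlite3

rows = [
    ("2016-05-03", "pd001", "sp001", "ct001", 50, 1000, 600),
    ("2016-05-03", "pd001", "sp002", "ct002", 100, 2000, 1000),
    ("2016-05-03", "pd001", "sp003", "ct002", 100, 2000, 1000),
    ("2016-05-04", "pd002", "sp001", "ct001", 700, 14000, 10000),
    ("2016-05-04", "pd003", "sp001", "ct001", 700, 14000, 10000),
    ("2016-05-04", "pd004", "sp001", "ct001", 100, 2000, 1000),
    ("2016-05-16", "pd001", "sp001", "ct001", 50, 1000, 600),  # outside the range
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fact_order ("
    "time_key text, product_key text, salesperson_key text, custom_key text, "
    "quantity_ordered integer, order_dollars integer, cost_dollars integer)"
)
conn.executemany("insert into fact_order values (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
select fact.time_key, sum(fact.quantity_ordered), sum(fact.order_dollars)
from fact_order as fact
where fact.time_key between '2016-05-01' and '2016-05-15'
group by fact.time_key order by fact.time_key
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# ('2016-05-03', 250, 5000)
# ('2016-05-04', 1500, 30000)
```

The totals for these two days should match the corresponding rows in the Hive and Kylin results.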

 

2. Daily order quantity per product between 2016-05-01 and 2016-05-15

Hive: 100.336 s
select dday.full_day,dsp.product_name, sum(fact.quantity_ordered) from fact_order as fact
inner join dim_day as dday on fact.time_key = dday.day_key
inner join dim_product as dsp on fact.product_key = dsp.product_key
where dday.full_day >= "2016-05-01" and dday.full_day <= "2016-05-15"
group by dday.full_day,dsp.product_name
order by dday.full_day,dsp.product_name;

Kylin: 0.93 s --> 0.39 s
select dday.full_day,dsp.product_name, sum(fact.quantity_ordered) from fact_order as fact
inner join dim_day as dday on fact.time_key = dday.day_key
inner join dim_product as dsp on fact.product_key = dsp.product_key
where dday.full_day >= '2016-05-01' and dday.full_day <= '2016-05-15'
group by dday.full_day,dsp.product_name
order by dday.full_day,dsp.product_name
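
The Kylin timings above can also be collected from a script instead of the web UI. To my knowledge, Kylin accepts SQL through POST /kylin/api/query with a JSON body containing sql, project, offset, limit, and acceptPartial, authenticated with HTTP Basic; check the REST API documentation for your version before relying on this. The sketch below only builds the request object and does not send it. The host localhost:7070, the ADMIN/KYLIN credentials (Kylin's defaults, assumed unchanged here), and the helper name build_query_request are assumptions for illustration.

```python
import base64
import json
import urllib.request


def build_query_request(sql, project, username, password,
                        base_url="http://localhost:7070/kylin/api",  # assumed host
                        limit=50000):
    """Build (but do not send) an HTTP request that submits `sql` to Kylin."""
    body = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


sql = (
    "select dday.full_day, dsp.product_name, sum(fact.quantity_ordered) "
    "from fact_order as fact "
    "inner join dim_day as dday on fact.time_key = dday.day_key "
    "inner join dim_product as dsp on fact.product_key = dsp.product_key "
    "where dday.full_day >= '2016-05-01' and dday.full_day <= '2016-05-15' "
    "group by dday.full_day, dsp.product_name "
    "order by dday.full_day, dsp.product_name"
)
request = build_query_request(sql, "WareHouse_01", "ADMIN", "KYLIN")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen and timing the call with time.perf_counter would give a latency figure comparable to the ones reported above.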

References:

http://blog.itpub.net/30089851/viewspace-2122586/

http://www.mamicode.com/info-detail-2332910.html
