Tags (space-separated): big-data-platform-building node
- I. Introduction to Kylin
- II. Installing and configuring Kylin
- III. Running Kylin examples
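Apache Kylin™ is an open-source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop/Spark for extremely large datasets. It was originally developed by eBay Inc. and contributed to the open-source community. It can query huge Hive tables with sub-second latency.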
Downloading Kylin: community releases are available at https://archive.apache.org/dist/kylin/ . This test uses the CDH 5.7 build of 2.3.1, apache-kylin-2.3.1-cdh57-bin.tar.gz.
Log in to node-01.flyfish and unpack the package:

    tar -zxvf apache-kylin-2.3.1-cdh57-bin.tar.gz -C /usr/local/
    cd /usr/local/
    mv apache-kylin-2.3.1-bin/ kylin

Add the environment variables to /etc/profile and load them:

    vim /etc/profile
    ----
    ### kylin ####
    export KYLIN_HOME=/usr/local/kylin
    export PATH=$PATH:$HOME/bin:$KYLIN_HOME/bin
    ----
    source /etc/profile
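Editing /etc/profile in vim works, but a setup script that appends the same lines on every run leaves duplicates behind. A minimal sketch of an idempotent version, assuming the /usr/local/kylin path used above; add_kylin_profile is a hypothetical helper, not part of Kylin.

```shell
# Hypothetical helper (not part of Kylin): append the Kylin variables to a
# profile file only if a KYLIN_HOME export is not already present.
# $1 = profile file (default /etc/profile), $2 = Kylin home (default /usr/local/kylin)
add_kylin_profile() {
  profile="${1:-/etc/profile}"
  kylin_home="${2:-/usr/local/kylin}"
  # grep -q succeeds when the export line already exists, so nothing is appended
  if grep -q '^export KYLIN_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo '### kylin ####'
    echo "export KYLIN_HOME=$kylin_home"
    # single quotes keep $PATH etc. literal so they expand at login time
    echo 'export PATH=$PATH:$HOME/bin:$KYLIN_HOME/bin'
  } >> "$profile"
}
```

Run it as root when targeting /etc/profile, then `. /etc/profile` to load the variables into the current shell.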
Check the environment (check-env.sh lives in the bin directory):

    cd /usr/local/kylin/bin
    ./check-env.sh
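If the check reports an HDFS permission error, switch to the hdfs superuser and open up permissions (this makes all of HDFS world-writable, which is only acceptable on a test cluster like this one):

    su - hdfs
    hdfs dfs -chmod -R 777 /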
Run the check again:

    cd /usr/local/kylin/bin
    ./check-env.sh
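check-env.sh expects the Hadoop, Hive and HBase clients to be usable by the current user. A hypothetical pre-check that lists which of those commands are missing from PATH can narrow down a failure before rerunning it; missing_cmds is an illustrative name, not a Kylin script.

```shell
# Hypothetical pre-check (not part of Kylin): print every command that is not
# on PATH, one per line. With no arguments it checks the hadoop, hive and
# hbase clients that Kylin 2.x depends on.
missing_cmds() {
  [ "$#" -eq 0 ] && set -- hadoop hive hbase
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}
```

Empty output from `missing_cmds` means all three clients were found.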
Start Kylin:

    ./kylin.sh start
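Once started, the web UI is served on port 7070 under the /kylin path by default, so on this host it would be http://node-01.flyfish:7070/kylin. A sketch that builds the address and polls it until the server answers; kylin_url and wait_for_kylin are hypothetical helpers, and curl is assumed to be installed.

```shell
# Hypothetical helper: web UI address, default port 7070 and path /kylin.
kylin_url() {
  echo "http://${1:-localhost}:${2:-7070}/kylin"
}

# Hypothetical helper: poll every 5 s, up to $2 attempts (default 24, about
# 2 minutes). curl reports status 000 while nothing is listening yet; any
# other status means the server is answering HTTP requests.
wait_for_kylin() {
  url=$(kylin_url "$1")
  tries=${2:-24}
  while [ "$tries" -gt 0 ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
    [ "$code" != "000" ] && return 0
    tries=$((tries - 1))
    sleep 5
  done
  return 1
}
```

For example, `wait_for_kylin node-01.flyfish && echo "Kylin is up"`.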
Log in to the web UI; the default username is ADMIN and the password is KYLIN.
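The same credentials work for Kylin's REST API, which uses HTTP Basic authentication. A minimal sketch of building the header value; kylin_basic_auth is a hypothetical name, and the curl line in the comment follows the authentication endpoint described in the Kylin REST API docs (verify it against your version).

```shell
# Hypothetical helper: base64-encode "user:password" for an
# "Authorization: Basic ..." header. Defaults are Kylin's ADMIN/KYLIN.
kylin_basic_auth() {
  printf '%s:%s' "${1:-ADMIN}" "${2:-KYLIN}" | base64 | tr -d '\n'
}

# Example login call:
#   curl -X POST -H "Authorization: Basic $(kylin_basic_auth)" \
#        http://node-01.flyfish:7070/kylin/api/user/authentication
```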
Load the sample data:

    cd /usr/local/kylin/bin
    ./sample.sh
Restart Kylin:

    cd /usr/local/kylin/bin
    ./kylin.sh stop
    ./kylin.sh start
Reload Kylin's metadata.
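To make the new tables visible in Impala, invalidate its metadata cache:

    impala-shell -q "INVALIDATE METADATA"

To refresh a single table, use REFRESH <table_name>. The Hive default database now contains several Kylin sample tables.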
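For tables that were just created in Hive, Impala needs INVALIDATE METADATA rather than REFRESH, because REFRESH only reloads tables Impala already knows about. Invalidating only the affected tables is cheaper than flushing the whole cache. A sketch that emits one statement per table, assuming the tables live in the default database; impala_invalidate_sql and the IMPALA_DB variable are hypothetical.

```shell
# Hypothetical helper: print one INVALIDATE METADATA statement per table.
# The database defaults to "default" and can be overridden with IMPALA_DB.
impala_invalidate_sql() {
  db=${IMPALA_DB:-default}
  for t in "$@"; do
    printf 'INVALIDATE METADATA %s.%s;\n' "$db" "$t"
  done
}

# Example (use the table names sample.sh actually created):
#   impala_invalidate_sql kylin_sales kylin_cal_dt > invalidate.sql
#   impala-shell -f invalidate.sql
```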
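Build the cube
If the machines are underpowered, choose as short a date range as possible.
This step takes a while because it performs the pre-computation, which runs as MapReduce jobs by default.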
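The date range chosen in the build dialog can also be submitted through the REST API, where Kylin expects start and end times as epoch milliseconds in UTC. A minimal sketch that converts two dates into that request body; kylin_build_body is hypothetical, GNU date is assumed for `-d`, and the endpoint and sample cube name kylin_sales_cube in the comment should be checked against your installation.

```shell
# Hypothetical helper: build the JSON body for a cube build request.
# $1 = start date, $2 = end date (e.g. 2012-01-01), interpreted as UTC.
# Requires GNU date for the -d option.
kylin_build_body() {
  start_ms=$(( $(date -u -d "$1" +%s) * 1000 ))
  end_ms=$(( $(date -u -d "$2" +%s) * 1000 ))
  printf '{"startTime": %s, "endTime": %s, "buildType": "BUILD"}\n' "$start_ms" "$end_ms"
}

# Example (QURNSU46S1lMSU4= is base64 of ADMIN:KYLIN):
#   curl -X PUT -H "Authorization: Basic QURNSU46S1lMSU4=" \
#        -H 'Content-Type: application/json' \
#        -d "$(kylin_build_body 2012-01-01 2012-02-01)" \
#        http://node-01.flyfish:7070/kylin/api/cubes/kylin_sales_cube/build
```

A narrower range means fewer source rows per segment, which is why it helps on small machines.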
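The cost comes from how many aggregates are pre-computed: without aggregation-group pruning, a cube over n dimensions materializes one cuboid for every subset of its dimensions, 2^n in total. A one-line illustration; cuboid_count is a hypothetical name.

```shell
# Number of cuboids in a full cube with $1 dimensions: 2^n (every subset).
cuboid_count() {
  echo $((1 << $1))
}
```

For example, 10 dimensions already mean 1024 cuboids to build, which is why dimension choices matter as much as the date range.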
Querying data in Kylin
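Query the built cube. Start with a simple count: the first run takes 4.12 s, while running it again returns in roughly 0.5 s (hundreds of milliseconds), because Kylin caches query results.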
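To compare the first and cached timings from a script instead of the web UI, a query can be posted to Kylin's query endpoint (POST /kylin/api/query). A sketch of building its JSON body, assuming single-line SQL and the sample project learn_kylin; kylin_query_body is a hypothetical helper.

```shell
# Hypothetical helper: JSON body for POST /kylin/api/query.
# Backslashes and double quotes in the SQL are escaped; the SQL is assumed to
# be a single line, since raw newlines would make the JSON invalid.
# $2 = project name, defaulting to the sample project learn_kylin.
kylin_query_body() {
  sql=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"sql": "%s", "project": "%s"}\n' "$sql" "${2:-learn_kylin}"
}

# Example: run twice and compare the "real" times of the two runs.
#   time curl -s -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" \
#        -H 'Content-Type: application/json' \
#        -d "$(kylin_query_body 'select count(*) from kylin_sales')" \
#        http://node-01.flyfish:7070/kylin/api/query
```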
A more complex query:

    select sum(KYLIN_SALES.PRICE) as price_sum,
           KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
           KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
    from KYLIN_SALES
    inner join KYLIN_CATEGORY_GROUPINGS
      on KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
     and KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
    group by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME,
             KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME
    order by KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME asc,
             KYLIN_CATEGORY_GROUPINGS.CATEG_LVL2_NAME desc
Prepare three files for the test: create_table.sql, department.csv and employee.csv.
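create_table.sql declares both tables with FIELDS TERMINATED BY ',' and no header skipping, so each CSV must be headerless, with exactly 5 fields per line for employee (id,name,deptId,age,salary) and 2 for department (id,name). Hive does not reject malformed lines; missing columns simply load as NULL. A hypothetical check that prints offending lines before upload; bad_csv_lines is an illustrative name.

```shell
# Hypothetical helper: print "line_number: content" for every line of CSV
# file $1 whose comma-separated field count differs from $2.
bad_csv_lines() {
  awk -F',' -v n="$2" 'NF != n { print NR ": " $0 }' "$1"
}

# Made-up rows, only to illustrate the expected layout (no header line):
#   department.csv   1,dev
#   employee.csv     1,tom,1,30,8000
# Usage: bad_csv_lines employee.csv 5; bad_csv_lines department.csv 2
```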
Create an upload directory on HDFS and upload the CSV files:

    hdfs dfs -mkdir /kylin-test
    hdfs dfs -put department.csv employee.csv /kylin-test
create_table.sql contains:

    DROP TABLE IF EXISTS employee;
    CREATE TABLE employee(
      id int,
      name string,
      deptId int,
      age int,
      salary float
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    DROP TABLE IF EXISTS department;
    CREATE TABLE department(
      id int,
      name string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    LOAD DATA INPATH '/kylin-test/employee.csv' OVERWRITE INTO TABLE employee;
    LOAD DATA INPATH '/kylin-test/department.csv' OVERWRITE INTO TABLE department;

LOAD DATA INPATH moves (rather than copies) the files from /kylin-test into the Hive warehouse, so upload them again before re-running the script.
Run create_table.sql in Hive:

    hive -f create_table.sql

Check the loaded data:

    hive -e "use default; select * from employee"
    hive -e "use default; select * from department"
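Load the Hive tables into Kylin
Create the model, entering the project name and description:
Select the fact table, then click Add Lookup Table to add the lookup table
Select the dimension columns
Create the cube
The cube has been created
Build the cube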
Test queries:

    select count(*) from department;
    select max(salary) from EMPLOYEE;
Total salary per department:

    select d.ID, sum(e.SALARY) as salary
    from EMPLOYEE as e
    left join DEPARTMENT as d on e.DEPTID = d.ID
    group by d.ID
    order by salary desc
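To sanity-check the result Kylin returns, the same aggregation can be computed from local copies of the CSV files (the HDFS copies are moved away by LOAD DATA). A minimal awk sketch, assuming the column layout from create_table.sql; dept_salary_totals is a hypothetical name. Employees whose deptId has no matching department are grouped under NULL, mirroring the left join.

```shell
# Hypothetical cross-check: total salary per department, sorted descending.
# $1 = employee.csv (id,name,deptId,age,salary), $2 = department.csv (id,name)
# Note: the NR == FNR idiom assumes department.csv is not empty.
dept_salary_totals() {
  awk -F',' '
    NR == FNR { dept[$1] = 1; next }            # first file: known department ids
    { key = ($3 in dept) ? $3 : "NULL"          # unmatched deptId -> NULL group
      sum[key] += $5 }
    END { for (k in sum) printf "%s,%.2f\n", k, sum[k] }
  ' "$2" "$1" | sort -t',' -k2,2 -rn
}
```

Small differences in the last decimals are possible, since salary is a float column in Hive.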