1. Environment preparation
zookeeper 3.4.12
mysql 5.7
hive 2.3.4
hadoop 2.7.3
JDK 1.8
hbase 1.3.3
2. Cluster planning
IP address | hostname | roles |
192.168.1.101 | palo101 | hadoop namenode, hadoop datanode, yarn nodemanager, zookeeper, hive, hbase master, hbase region server |
192.168.1.102 | palo102 | hadoop secondary namenode, hadoop datanode, yarn nodemanager, yarn resource manager, zookeeper, hive, hbase master, hbase region server |
192.168.1.103 | palo103 | hadoop datanode, yarn nodemanager, zookeeper, hive, hbase region server, mysql |
3. Download Kylin 2.6
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz   # download the Kylin 2.6.0 binary
tar -xzvf apache-kylin-2.6.0-bin-hbase1x.tar.gz   # unpack the Kylin 2.6.0 tarball
mv apache-kylin-2.6.0-bin apache-kylin-2.6.0      # rename the unpacked directory (drop the trailing -bin)
mkdir /usr/local/kylin/                           # create the target directory
mv apache-kylin-2.6.0 /usr/local/kylin/           # move the Kylin 2.6.0 directory under /usr/local/kylin
4. Add system environment variables
vim /etc/profile
Append at the end of the file:
#kylin
export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.6.0
export KYLIN_CONF_HOME=$KYLIN_HOME/conf
export PATH=$PATH:$KYLIN_HOME/bin:$CATALINA_HOME/bin
export tomcat_root=$KYLIN_HOME/tomcat   # note: variable name is deliberately lowercase
export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/hive-hcatalog-core-2.3.4.jar   # note: variable name is deliberately lowercase
Save and quit with :wq, then run source /etc/profile to make the variables take effect.
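A quick way to confirm the variables resolved after sourcing the profile is a small check script. This is a sketch, not part of Kylin: check_var is a hypothetical helper name, and the exported values below are the ones used in this guide.

```shell
# Hypothetical sanity check: print "ok NAME" if the named variable is set and
# non-empty, "missing NAME" otherwise.
check_var() {
  eval "val=\$$1"   # indirect lookup of the variable named by $1
  if [ -n "$val" ]; then echo "ok $1"; else echo "missing $1"; fi
}

# Values from this guide (on a real node these come from /etc/profile).
export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.6.0
export KYLIN_CONF_HOME=$KYLIN_HOME/conf

check_var KYLIN_HOME        # -> ok KYLIN_HOME
check_var KYLIN_CONF_HOME   # -> ok KYLIN_CONF_HOME
```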
5. Configure Kylin
5.1 Configure $KYLIN_HOME/bin/kylin.sh
vim $KYLIN_HOME/bin/kylin.sh
Add at the beginning of the file:
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX
The point of this is to add the $hive_dependency environment, which resolves two later problems, both caused by missing Hive dependencies:
a) loading Hive tables from the Kylin web UI fails;
b) step 2 of a cube build fails with an org/apache/hadoop/hive/conf/HiveConf error.
5.2 Hadoop compression configuration
Regarding snappy compression support: to get it, you must first recompile the Hadoop source so that the native libraries support snappy. Snappy achieves a reasonable compression ratio, so both the intermediate and final results of the computation take up less storage.
The Hadoop build in this example does not support snappy compression, which would cause later cube builds to fail, so compression is disabled.
vim $KYLIN_HOME/conf/kylin_job_conf.xml
Edit the configuration file and set mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress to false:
<property>
    <name>mapreduce.map.output.compress</name>
    <value>false</value>
    <description>Compress map outputs</description>
</property>
<property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>false</value>
    <description>Compress the output of a MapReduce job</description>
</property>
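As a quick sanity check, the two flags can be verified with grep. A minimal sketch, run here against a local fragment; on a real node you would point the grep at $KYLIN_HOME/conf/kylin_job_conf.xml.

```shell
# Build a stand-in fragment with the two properties this guide sets to false.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<property><name>mapreduce.map.output.compress</name><value>false</value></property>
<property><name>mapreduce.output.fileoutputformat.compress</name><value>false</value></property>
EOF

# Count the lines carrying <value>false</value>; 2 means both flags are off.
grep -c '<value>false</value>' "$xml"   # prints 2
```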
One more compression-related setting needs to change:
vim $KYLIN_HOME/conf/kylin.properties
Set kylin.storage.hbase.compression-codec to none, or comment the entry out:
#kylin.storage.hbase.compression-codec=none
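The same change can be scripted. A hedged sketch assuming GNU sed, demonstrated on a throwaway copy rather than the real $KYLIN_HOME/conf/kylin.properties:

```shell
# Create a temp copy with a non-none codec, then rewrite it to none in place.
f=$(mktemp)
echo 'kylin.storage.hbase.compression-codec=snappy' > "$f"
sed -i 's/^kylin.storage.hbase.compression-codec=.*/kylin.storage.hbase.compression-codec=none/' "$f"
cat "$f"   # -> kylin.storage.hbase.compression-codec=none
```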
5.3 Main configuration: $KYLIN_HOME/conf/kylin.properties
vim $KYLIN_HOME/conf/kylin.properties
Change it to:
## The metadata store in hbase
## where Kylin metadata is stored in HBase
kylin.metadata.url=kylin_metadata@hbase

## metadata cache sync retry times
kylin.metadata.sync-retries=3

## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
## Kylin working directory on HDFS
kylin.env.hdfs-working-dir=/kylin

## kylin zk base path
kylin.env.zookeeper-base-path=/kylin

## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
#kylin.env=DEV

## Kylin server mode, valid value [all, query, job]
## mode of the Kylin master node; the slave nodes use query, which is the only difference
kylin.server.mode=all

## List of web servers in use, this enables one web server instance to sync up with other servers.
## cluster information sync
kylin.server.cluster-servers=192.168.1.101:7070,192.168.1.102:7070,192.168.1.103:7070

## Display timezone on UI, format like [GMT+N or GMT-N]
## set to China time
kylin.web.timezone=GMT+8

## Timeout value for the queries submitted through the Web UI, in milliseconds
kylin.web.query-timeout=300000

## Max count of concurrent jobs running
kylin.job.max-concurrent-jobs=10

#### ENGINE ###
## Time interval to check hadoop job status
## interval (seconds) for checking hadoop job status
kylin.engine.mr.yarn-check-interval-seconds=10

## Hive database name for putting the intermediate flat tables
## Hive database holding the intermediate tables produced by cube builds
kylin.source.hive.database-for-flat-table=kylin_flat_db

## The percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100

## Max job retry on error, default 0: no retry
kylin.job.retry=0

## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
## no compression
kylin.storage.hbase.compression-codec=none

## The cut size for hbase region, in GB.
kylin.storage.hbase.region-cut-gb=5

## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
## Set 0 to disable this optimization.
kylin.storage.hbase.hfile-size-gb=2

## The storage for final cube file in hbase
kylin.storage.url=hbase

## The prefix of hbase table
kylin.storage.hbase.table-name-prefix=KYLIN_

## The namespace for hbase storage
kylin.storage.hbase.namespace=default

### job.jar used by Kylin's MR jobs and the HBase coprocessor jar, for better performance (added entries)
kylin.job.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-job-2.6.0.jar
kylin.coprocessor.local.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-coprocessor-2.6.0.jar
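To spot-check the result, the handful of keys this guide changes can be grepped for. A minimal sketch using a local stand-in; on a node, point the greps at $KYLIN_HOME/conf/kylin.properties.

```shell
# A three-line stand-in for the edited kylin.properties.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
kylin.metadata.url=kylin_metadata@hbase
kylin.server.mode=all
kylin.storage.hbase.compression-codec=none
EOF

# Report each expected key that is present.
for key in kylin.metadata.url kylin.server.mode kylin.storage.hbase.compression-codec; do
  grep -q "^$key=" "$cfg" && echo "found $key"
done
```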
5.4 Copy the configured Kylin to the other two machines
scp -r /usr/local/kylin/ 192.168.1.102:/usr/local
scp -r /usr/local/kylin/ 192.168.1.103:/usr/local
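The same copy can be written as a loop over the target hosts. Shown here as a dry run: the echo prints each command instead of executing it; remove the echo to actually copy.

```shell
# Dry-run the copy to both query nodes from this guide.
for host in 192.168.1.102 192.168.1.103; do
  echo "scp -r /usr/local/kylin/ $host:/usr/local"
done
```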
5.5 On 192.168.1.102 and 192.168.1.103, change kylin.server.mode to query
vim $KYLIN_HOME/conf/kylin.properties
Change this entry:
kylin.server.mode=query   ### the master node runs in all mode; the slave nodes use query, which is the only difference
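This edit can also be done with a one-liner, assuming GNU sed. Demonstrated on a temp copy; on 192.168.1.102/103 the target file is $KYLIN_HOME/conf/kylin.properties.

```shell
# Flip the server mode from all to query in a stand-in properties file.
p=$(mktemp)
echo 'kylin.server.mode=all' > "$p"
sed -i 's/^kylin.server.mode=all$/kylin.server.mode=query/' "$p"
cat "$p"   # -> kylin.server.mode=query
```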
6. Start Kylin
6.1 Prerequisite: start the dependent services first
a) Start ZooKeeper; run on all nodes
$ZOO_KEEPER_HOME/bin/zkServer.sh start
b) Start Hadoop; run on the master node
$HADOOP_HOME/sbin/start-all.sh
c) Start the JobHistoryServer service; run on the master node.
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
d) Start the Hive metastore service
nohup $HIVE_HOME/bin/hive --service metastore >/dev/null 2>&1 &
e) Start the HBase cluster; run on the master node
$HBASE_HOME/bin/start-hbase.sh
The processes after startup:
192.168.1.101
[root@palo101 apache-kylin-2.6.0]# jps
62403 NameNode          #hdfs NameNode
31013 NodeManager       #yarn NodeManager
22325 Kafka
54217 QuorumPeerMain    #zookeeper
7274 Jps
62589 DataNode          #hadoop datanode
28895 HRegionServer     #hbase region server
8440 HMaster            #hbase master
192.168.1.102
[root@palo102 ~]# jps
47474 QuorumPeerMain    #zookeeper
15203 NodeManager       #yarn NodeManager
15061 ResourceManager   #yarn ResourceManager
49877 Jps
6694 HRegionServer      #hbase region server
7673 Kafka
37517 SecondaryNameNode #hdfs SecondaryNameNode
37359 DataNode          #hadoop datanode
192.168.1.103
[root@palo103 ~]# jps
1185 RunJar             #hive metastore
62404 NodeManager       #yarn NodeManager
47365 HRegionServer     #hbase region server
62342 QuorumPeerMain    #zookeeper
20952 ManagerBootStrap
52440 Kafka
31801 RunJar            #hive thrift server
47901 DataNode          #hadoop datanode
36494 Jps
6.2 Check that the configuration is correct
$KYLIN_HOME/bin/check-env.sh
[root@palo101 bin]# $KYLIN_HOME/bin/check-env.sh
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin/apache-kylin-2.6.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Hive dependency check: find-hive-dependency.sh
HBase dependency check: find-hbase-dependency.sh
All dependency checks can be run at once with check-env.sh.
6.3 Run the following on all nodes to start Kylin
$KYLIN_HOME/bin/kylin.sh start
If the following error appears at startup:
Failed to find metadata store by url: kylin_metadata@hbase
The fix:
1) In $HBASE_HOME/conf/hbase-site.xml, change the hbase.rootdir property to match the fs.defaultFS property in $HADOOP_HOME/etc/hadoop/core-site.xml.
2) Using zkCli from the ZooKeeper bin directory, delete /hbase, then restart HBase.
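Step 2 can be sketched as a ZooKeeper CLI session. A hedged sketch: rmr is the ZooKeeper 3.4 command used by this guide's version (later versions call it deleteall), and deleting /hbase wipes HBase's znode state, so only do this on a cluster you can rebuild.

```shell
$ZOO_KEEPER_HOME/bin/zkCli.sh -server 192.168.1.101:2181
rmr /hbase      # remove HBase's znode tree so it is rebuilt on restart
quit
# then restart HBase on the master node:
$HBASE_HOME/bin/stop-hbase.sh && $HBASE_HOME/bin/start-hbase.sh
```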
6.4 Log in to Kylin
http://192.168.1.101:7070/kylin; the other machines work too, just switch to the corresponding IP.
The default username/password is admin/KYLIN.
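Kylin can take a while after kylin.sh start before the web UI answers. A small hedged helper for waiting on it; wait_for is a made-up name, not part of Kylin or Hadoop.

```shell
# Poll a command until it succeeds or the attempt budget runs out.
# usage: wait_for "<command>" [attempts] [delay_seconds]
# e.g.:  wait_for "curl -sf http://192.168.1.101:7070/kylin" 60 2
wait_for() {
  cmd=$1; attempts=${2:-30}; delay=${3:-2}; i=0
  until sh -c "$cmd" >/dev/null 2>&1; do
    i=$((i+1))
    [ "$i" -ge "$attempts" ] && return 1   # give up after N attempts
    sleep "$delay"
  done
}
```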
The home page after login:
7. FAQ
7.1 If you hit an error like the following
WARNING: Failed to process JAR
[jar:file:/home/hadoop-2.7.3/contrib/capacity-scheduler/*.jar!/] for
This is only a minor issue; a small change to ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh fixes it: comment out the loop below.
vim ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
#for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
#  if [ "$HADOOP_CLASSPATH" ]; then
#    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
#  else
#    export HADOOP_CLASSPATH=$f
#  fi
#done
7.2 If you encounter Caused by: java.lang.ClassCastException: com.fasterxml.jackson.datatype.joda.JodaModule cannot be cast to com.fasterxml.jackson.databind.Module
The cause is a jar version mismatch: Hive ships jackson-datatype-joda-2.4.6.jar while Kylin uses jackson-databind-2.9.5.jar.
The fix:
mv $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jar $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jarback
That is, stop using Hive's copy of this jar; see https://issues.apache.org/jira/browse/KYLIN-3129 for details.
7.3 If you encounter Failed to load keystore type JKS with path conf/.keystore due to (No such file or directory)
The fix:
Open apache-kylin-2.6.0/tomcat/conf/server.xml and delete (or comment out) the https connector:
<!--
<Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="conf/.keystore" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS" />
-->
8. Getting started
8.1 Run the official sample data
$KYLIN_HOME/bin/sample.sh
If you see Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect, the sample cube was created successfully, as shown:
8.2 Restart Kylin or reload the metadata to make the data take effect
This example reloads the metadata, as shown:
8.3 Enter Hive and look at the Kylin cube tables
$HIVE_HOME/bin/hive          # enter the hive shell client
hive> show databases;        # list the databases in hive
hive> use kylin_flat_db;     # switch to Kylin's hive database
hive> show tables;           # list all tables in Kylin's hive database
The output is as follows:
[druid@palo101 kafka_2.12-2.1.0]$ $HIVE_HOME/bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/workspace/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/workspace/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/home/workspace/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
dw_sales
kylin_flat_db
ods_sales
Time taken: 1.609 seconds, Fetched: 4 row(s)
hive> use kylin_flat_db;
OK
Time taken: 0.036 seconds
hive> show tables;
OK
kylin_account
kylin_cal_dt
kylin_category_groupings
kylin_country
kylin_sales
Time taken: 0.321 seconds, Fetched: 5 row(s)
hive>
Now check HBase:
[druid@palo101 kafka_2.12-2.1.0]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.3, rfd0d55b1e5ef54eb9bf60cce1f0a8e4c1da073ef, Sat Nov 17 21:43:34 CST 2018

hbase(main):001:0> list
TABLE
dev
kylin_metadata
test
3 row(s) in 0.3180 seconds

=> ["dev", "kylin_metadata", "test"]
HBase has gained a table named kylin_metadata, which means the cube from the official sample data was created successfully!
8.4 Build the cube
Refresh http://192.168.1.101:7070/kylin and a new project, learn_kylin, appears.
Select kylin_sales_model and build it.
You can watch the build progress under Monitor.
After the build succeeds, the model shows storage information that was absent before; you can find the corresponding table in HBase, and the cube status changes to Ready, meaning it can be queried.
8.5 Query in Kylin
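The sample cube can be queried from the Insight tab. For example, a typical aggregation over the kylin_sales sample table (column names are from the official sample schema shown in section 8.3):

```sql
-- total sales and distinct sellers per day, served from the sample cube
select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers
from kylin_sales
group by part_dt
order by part_dt;
```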
That completes the Kylin cluster deployment.