[kylin] Deploying the Kylin service

 

Official site:

http://kylin.apache.org/

Community:

https://github.com/KylinOLAP/Kylin/issues

http://apache-kylin.74782.x6.nabble.com/

Source code:

https://github.com/apache/kylin

Blog posts:

The fast cubing algorithm in Apache Kylin

Apache Kylin v1.5.0 released: a brand-new redesign for the next generation

The Apache Foundation announces Apache Kylin as a top-level project

By-level vs. by-split cubing algorithms

Kylin officially released: the ultimate OLAP engine for big data

Apache Kylin in practice at Baidu Maps

JD's Wang Xiaoyu: Apache Kylin in practice on JD Cloud

 

 

1. Prerequisites

ZooKeeper 3.4.6 (coordination service for Hadoop and HBase)
Hadoop 2.7.1
HBase 1.1.4
Kylin 1.5.0 (HBase 1.1.3 build)
JDK 1.7.0_80
Hive 2.0.0

2. Virtual hosts

192.168.200.165 master1
192.168.200.166 master2
192.168.200.167 slave1
192.168.200.168 slave2
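These mappings also need to be present in /etc/hosts on every node so the hostnames resolve. A minimal sketch of appending them (HOSTS_FILE is a stand-in path used here for illustration; on a real node it would be /etc/hosts, edited as root):

```shell
# Append the cluster hostname mappings to a hosts file.
# HOSTS_FILE is a stand-in; on a real node this would be /etc/hosts.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.demo}"
cat >> "$HOSTS_FILE" <<'EOF'
192.168.200.165 master1
192.168.200.166 master2
192.168.200.167 slave1
192.168.200.168 slave2
EOF
grep 'master1' "$HOSTS_FILE"
```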

3. Install MySQL

Check whether MySQL is installed on master1:

[root@master1 ~]# ps -aux | grep mysql
mysql 3632 0.0 0.0 115348 1648 ? Ss Apr01 0:00 /bin/sh /wdcloud/app/mysql/bin/mysqld_safe
mysql 4519 0.5 19.8 13895940 1591664 ? Sl Apr01 29:55
/wdcloud/app/mysql/bin/mysqld
--basedir=/wdcloud/app/mysql
--datadir=/wdcloud/data/mysql/data
--plugin-dir=/wdcloud/app/mysql/lib/mysql/plugin
--log-error=/wdcloud/data/mysql/data/mysql-error.log
--open-files-limit=20000
--pid-file=/wdcloud/data/mysql/data/localhost.localdomain.pid 
--socket=/tmp/mysql.sock
--port=3306

Check the MySQL version:

[root@master1 ~]# mysql --version
mysql  Ver 14.14 Distrib 5.6.29-76.2, for Linux (x86_64) using  6.2

Log in to MySQL:

[root@master1 ~]# mysql -uroot -p
Enter password:

4. Install the JDK

Check the installed version:

[root@master1 ~]# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

Check the install location:

[root@master1 ~]# which java
/jdk1.7.0_80/bin/java

5. Install ZooKeeper

1. Extract ZooKeeper to the root directory, enter it, create the folders data, datalog and logs, and configure the environment variables:

export ZOOKEEPER_HOME=/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin

2. In the conf folder, copy zoo_sample.cfg to zoo.cfg.

3. Edit zoo.cfg, add the following lines, then save and exit:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/zookeeper-3.4.6/data
dataLogDir=/zookeeper-3.4.6/datalog
clientPort=2181
server.0=master1:2888:3888
server.1=master2:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
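Since the server.N lines follow a fixed pattern, the whole zoo.cfg can be generated from the host list. A sketch that writes a demo copy under /tmp (a real deployment would write $ZOOKEEPER_HOME/conf/zoo.cfg instead):

```shell
# Generate zoo.cfg from the host list; server indices start at 0,
# matching the myid values assigned in a later step.
CFG=/tmp/zoo.cfg.demo
{
  echo "tickTime=2000"
  echo "initLimit=10"
  echo "syncLimit=5"
  echo "dataDir=/zookeeper-3.4.6/data"
  echo "dataLogDir=/zookeeper-3.4.6/datalog"
  echo "clientPort=2181"
  i=0
  for h in master1 master2 slave1 slave2; do
    echo "server.$i=$h:2888:3888"
    i=$((i+1))
  done
} > "$CFG"
cat "$CFG"
```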

Edit conf/log4j.properties to configure where the log files are written:

zookeeper.log.dir=/zookeeper-3.4.6/logs
zookeeper.log.file=zookeeper.log
zookeeper.tracelog.dir=/zookeeper-3.4.6/logs
zookeeper.tracelog.file=zookeeper_trace.log

4. In the data folder, create a file named "myid" containing 0, then save and exit.

5. Distribute the zookeeper folder to the root directory of every virtual host:

scp -r /zookeeper-3.4.6 hadoop@master2:/
scp -r /zookeeper-3.4.6 hadoop@slave1:/
scp -r /zookeeper-3.4.6 hadoop@slave2:/

6. Edit myid on each host in order: master1 gets 0, master2 gets 1, and so on.
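The rule above (each host's myid equals the N in its server.N line) can be scripted. A local simulation under /tmp for illustration; on the cluster each file lives at /zookeeper-3.4.6/data/myid on its own host:

```shell
# Write each host's myid so it matches its server.N index in zoo.cfg.
i=0
for h in master1 master2 slave1 slave2; do
  mkdir -p "/tmp/zkdemo/$h/data"
  echo "$i" > "/tmp/zkdemo/$h/data/myid"
  i=$((i+1))
done
cat /tmp/zkdemo/master1/data/myid  # 0
cat /tmp/zkdemo/slave2/data/myid   # 3
```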

7. Start ZooKeeper

Start the zk service from the zookeeper directory on each virtual host:

bin/zkServer.sh start

8. Check each node's ZooKeeper status:

bin/zkServer.sh status
[hadoop@master1 ~]$ zkServer.sh status
JMX enabled by default
Using config: /zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

Mode "leader" means this node is the elected ZooKeeper leader;

Mode "follower" means it is a follower in the quorum.

9. Stop ZooKeeper

bin/zkServer.sh stop

 

6. Hadoop high-availability deployment

1. Extract hadoop-2.7.1 into the hadoop user's home directory, enter it, and create the folders tmp, hdfs/name and hdfs/data.

2. Enter ~/hadoop/etc/hadoop, which holds most of the Hadoop configuration files.

3. Edit the Hadoop configuration files as follows:

core-site.xml

<configuration>
         <property>
                <name>fs.default.name</name>
                <value>hdfs://master1:9000</value>
                <final>true</final>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/hadoop/tmp</value>
                <description>A base for other temporary directories</description>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131702</value>
         </property>
        <property>
                 <name>fs.checkpoint.period</name>
                 <value>3600</value>
                 <description>How often to checkpoint the HDFS image; default one hour</description>
        </property>
        <property>
                 <name>fs.checkpoint.size</name>
                 <value>67108864</value>
                 <description>Edit-log size threshold that triggers a checkpoint; default 64 MB</description>
        </property>
</configuration>

hadoop-env.sh

# Set JAVA_HOME
export JAVA_HOME=/jdk1.7.0_80

# Set HADOOP_CONF_DIR
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

hbase-site.xml

<configuration>
         <property>
                   <name>hbase.rootdir</name>
                   <value>hdfs://master1:9000/hbase</value>
         </property>
         <property>
                   <name>hbase.cluster.distributed</name>
                   <value>true</value>
         </property>
         <property>
                   <name>hbase.zookeeper.quorum</name>
                   <value>master1,master2,slave1,slave2</value>
         </property>
         <property>
                   <name>hbase.zookeeper.property.dataDir</name>
                   <value>/zookeeper-3.4.6/data</value>
         </property>
         <property>
                   <name>hbase.zookeeper.property.clientPort</name>
                   <value>2181</value>
         </property>
         <property>
                 <name>hbase.coprocessor.user.region.classes</name>
                 <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
         </property>
         <property>
                   <name>hbase.regionserver.wal.codec</name>
                   <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
         </property>
         <property>
                   <name>hbase.master.loadbalancer.class</name>
                 <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
         </property>
         <property>
                   <name>hbase.coprocessor.master.classes</name>
                   <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
         </property>
         <property>
                   <name>phoenix.query.maxServerCacheBytes</name>
                   <value>1073741824</value>
         </property>
         <property>
                   <name>hbase.client.scanner.caching</name>
                   <value>5000</value>
                   <description>HBase client scan cache; greatly helps query performance</description>
         </property>
         <property>
                   <name>hbase.rpc.timeout</name>
                   <value>360000000</value>
         </property>
</configuration>

hdfs-site.xml 

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/hadoop/hdfs/name</value>
        </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>4</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master2:9001</value>
    </property>
    <property>
         <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
         <name>dfs.client.read.shortcircuit</name>
         <value>false</value>
    </property>
</configuration>

mapred-site.xml 

<configuration>
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
    </property>
  <property>
     <name>mapreduce.jobtracker.http.address</name>
      <value>NameNode:50030</value>
  </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master1:19888</value>
    </property>
         <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
         </property>
</configuration>

Add a masters file for the high-availability deployment, making master2 the Secondary NameNode:

masters

master2

slaves 

master1
master2
slave1
slave2

yarn-env.sh 

export YARN_CONF_DIR="${YARN_CONF_DIR:-$HADOOP_YARN_HOME/conf}"
export JAVA_HOME=/jdk1.7.0_80
JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx4096m

yarn-site.xml 

<configuration>
         <property>
                   <name>yarn.resourcemanager.zk-address</name>
                  <value>master1:2181,master2:2181,slave1:2181,slave2:2181</value>
         </property>
         <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
       <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master1:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
         </property>
</configuration>

4. Configure the Hadoop environment variables:

export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


5. Check that Hadoop is installed and configured correctly:
 

[hadoop@master1 ~]$ hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git
-r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using
/home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar

6. Distribute the hadoop folder to the hadoop home directory on every virtual host:

scp -r  ~/hadoop/ hadoop@master2:~
scp -r  ~/hadoop/ hadoop@slave1:~
scp -r  ~/hadoop/ hadoop@slave2:~

7. Re-format the HDFS filesystem

If the cluster is freshly configured and has never been started, run the format command directly.

If the cluster has been formatted and started before, delete the old data first, then run the format.

 

1) Delete the old data

hdfs-site.xml configures:

dfs.namenode.name.dir = /home/hadoop/hadoop/hdfs/name (where the NameNode stores HDFS namespace metadata)

dfs.datanode.data.dir = /home/hadoop/hadoop/hdfs/data (where DataNodes physically store blocks)

 

core-site.xml configures:

hadoop.tmp.dir = /home/hadoop/hadoop/tmp (the NameNode's local Hadoop temp folder)

 

Delete all files and directories under these three folders on every cluster node.
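The cleanup across all four nodes can be scripted. The sketch below only echoes the per-node ssh commands into a file as a dry run, since rm -rf on these directories destroys all HDFS data; on a real cluster you would execute the generated lines after double-checking them:

```shell
# Dry run: build the per-node cleanup commands instead of executing them.
DIRS="/home/hadoop/hadoop/hdfs/name/* /home/hadoop/hadoop/hdfs/data/* /home/hadoop/hadoop/tmp/*"
CMDS=/tmp/cleanup_cmds.txt
: > "$CMDS"
for host in master1 master2 slave1 slave2; do
  echo "ssh $host 'rm -rf $DIRS'" >> "$CMDS"
done
cat "$CMDS"
```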

 

2) Run the format command

hadoop namenode -format

 

3) The format log

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
16/04/05 02:02:28 INFO namenode.NameNode: STARTUP_MSG:

/***********************************************************
STARTUP_MSG:   Starting NameNode
STARTUP_MSG:   host = master1/192.168.200.165
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
STARTUP_MSG:   classpath = /home/hadoop/hadoop/etc/hadoop:... (long jar listing omitted)

STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git

-r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z

STARTUP_MSG:   java = 1.7.0_80

************************************************************/

16/04/05 02:02:28 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/04/05 02:02:28 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-070fc765-1b22-4453-83ba-7635ea906e1d
16/04/05 02:02:29 INFO namenode.FSNamesystem: No KeyProvider found.
16/04/05 02:02:29 INFO namenode.FSNamesystem: fsLock is fair:true
16/04/05 02:02:29 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
16/04/05 02:02:29 INFO blockmanagement.DatanodeManager:
dfs.namenode.datanode.registration.ip-hostname-check=true
16/04/05 02:02:29 INFO blockmanagement.BlockManager:
dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
16/04/05 02:02:29 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Apr 05 02:02:29
16/04/05 02:02:29 INFO util.GSet: Computing capacity for map BlocksMap
16/04/05 02:02:29 INFO util.GSet: VM type= 64-bit
16/04/05 02:02:29 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
16/04/05 02:02:29 INFO util.GSet: capacity = 2^21 = 2097152 entries
16/04/05 02:02:30 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
16/04/05 02:02:30 INFO blockmanagement.BlockManager: defaultReplication= 3
16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxReplication= 512
16/04/05 02:02:30 INFO blockmanagement.BlockManager: minReplication= 1
16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxReplicationStreams= 2
16/04/05 02:02:30 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks= false
16/04/05 02:02:30 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
16/04/05 02:02:30 INFO blockmanagement.BlockManager: encryptDataTransfer= false
16/04/05 02:02:30 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
16/04/05 02:02:30 INFO namenode.FSNamesystem: fsOwner= hadoop (auth:SIMPLE)
16/04/05 02:02:30 INFO namenode.FSNamesystem: supergroup= supergroup
16/04/05 02:02:30 INFO namenode.FSNamesystem: isPermissionEnabled = true
16/04/05 02:02:30 INFO namenode.FSNamesystem: HA Enabled: false
16/04/05 02:02:30 INFO namenode.FSNamesystem: Append Enabled: true
16/04/05 02:02:30 INFO util.GSet: Computing capacity for map INodeMap
16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
16/04/05 02:02:30 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
16/04/05 02:02:30 INFO util.GSet: capacity= 2^20 = 1048576 entries
16/04/05 02:02:30 INFO namenode.FSDirectory: ACLs enabled? false
16/04/05 02:02:30 INFO namenode.FSDirectory: XAttrs enabled? true
16/04/05 02:02:30 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
16/04/05 02:02:30 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/04/05 02:02:30 INFO util.GSet: Computing capacity for map cachedBlocks
16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
16/04/05 02:02:30 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
16/04/05 02:02:30 INFO util.GSet: capacity= 2^18 = 262144 entries
16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
16/04/05 02:02:30 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf:
dfs.namenode.top.window.num.buckets = 10
16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
16/04/05 02:02:30 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
16/04/05 02:02:30 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
16/04/05 02:02:30 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
16/04/05 02:02:30 INFO util.GSet: Computing capacity for map NameNodeRetryCache
16/04/05 02:02:30 INFO util.GSet: VM type= 64-bit
16/04/05 02:02:30 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
16/04/05 02:02:30 INFO util.GSet: capacity= 2^15 = 32768 entries
16/04/05 02:02:30 INFO namenode.FSImage: Allocated new BlockPoolId: BP-464058956-192.168.200.165-1459836150356
16/04/05 02:02:30 INFO common.Storage: Storage directory
 /home/hadoop/hadoop/hdfs/name has been successfully formatted.
16/04/05 02:02:30 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/04/05 02:02:30 INFO util.ExitUtil: Exiting with status 0
16/04/05 02:02:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master1/192.168.200.165

************************************************************/


8. Start and stop Hadoop

# Start the Hadoop cluster

Notes:

1) Every cluster node must be able to SSH to every other node without a password.

2) Make sure each node's myid matches its entry in zoo.cfg, and start the ZooKeeper cluster before Hadoop.
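Check 2) can be automated: read this node's myid and confirm a matching server.N entry exists in zoo.cfg. A sketch against demo files under /tmp (the real paths would be /zookeeper-3.4.6/data/myid and /zookeeper-3.4.6/conf/zoo.cfg):

```shell
# Demo stand-ins for the real myid and zoo.cfg files.
mkdir -p /tmp/zkcheck
printf 'server.0=master1:2888:3888\nserver.1=master2:2888:3888\n' > /tmp/zkcheck/zoo.cfg
echo 1 > /tmp/zkcheck/myid

MYID=$(cat /tmp/zkcheck/myid)
# Extract the hostname bound to this myid; an empty result means a mismatch.
HOST=$(grep "^server.$MYID=" /tmp/zkcheck/zoo.cfg | cut -d= -f2 | cut -d: -f1)
echo "$HOST" > /tmp/zkcheck/resolved_host
echo "myid $MYID maps to $HOST"
```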

 

Start command: sbin/start-all.sh, or run start-dfs.sh followed by start-yarn.sh

[hadoop@master1 conf]$ start-all.sh
Starting namenodes on [master1]
master1: starting namenode, logging to
/home/hadoop/hadoop/logs/hadoop-hadoop-namenode-master1.out
slave2: starting datanode, logging to
/home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave2.out
master2: starting datanode,
logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-master2.out
master1: starting datanode, logging to
/home/hadoop/hadoop/logs/hadoop-hadoop-datanode-master1.out
slave1: starting datanode, logging to
/home/hadoop/hadoop/logs/hadoop-hadoop-datanode-slave1.out
Starting secondary namenodes [master2]
master2: starting secondarynamenode, logging to
/home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-master2.out
starting yarn daemons
starting resourcemanager, logging to
/home/hadoop/hadoop/logs/yarn-hadoop-resourcemanager-master1.out
slave2: starting nodemanager, logging to
/home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
master2: starting nodemanager, logging to
/home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-master2.out
master1: starting nodemanager, logging to
/home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-master1.out
slave1: starting nodemanager, logging to
/home/hadoop/hadoop/logs/yarn-hadoop-nodemanager-slave1.out

Check the related daemon processes:

NAMENODE(master1)

[hadoop@master1 logs]$ jps
12892 NameNode
13003 DataNode
13295 ResourceManager
13408 NodeManager
13826 QuorumPeerMain

SECONDARY NAMENODE(master2) 

[hadoop@master2 ~]$ jps
10162 SecondaryNameNode
10052 DataNode
10245 NodeManager
4045 QuorumPeerMain

DATANODE(slave1/slave2) 

[hadoop@slave1 ~]$ jps
13902 NodeManager
13789 DataNode
9331 QuorumPeerMain

 
[hadoop@slave2 ~]$ jps
13697 QuorumPeerMain
18324 DataNode
18440 NodeManager
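The jps listings above can be checked automatically against the daemons each role should run. A sketch that parses a saved jps capture (the heredoc mirrors the master1 output shown earlier; on a real node you would pipe `jps` directly):

```shell
# Verify expected daemons appear in a node's jps output.
cat > /tmp/jps.out <<'EOF'
12892 NameNode
13003 DataNode
13295 ResourceManager
13408 NodeManager
13826 QuorumPeerMain
EOF
: > /tmp/jps.check
for d in NameNode DataNode ResourceManager NodeManager QuorumPeerMain; do
  if grep -q " $d\$" /tmp/jps.out; then
    echo "$d OK" >> /tmp/jps.check
  else
    echo "$d MISSING" >> /tmp/jps.check
  fi
done
cat /tmp/jps.check
```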

# Stop the Hadoop cluster

sbin/stop-all.sh

 

After a successful start, open the web consoles to inspect the cluster:

NAMENODE: http://192.168.200.165:50070/

SECONDARY NAMENODE: http://192.168.200.166:9001

Nodes of the Cluster (YARN job management UI): http://192.168.200.165:8088
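A quick reachability checklist for the web UIs can be generated as curl one-liners (printed to a file rather than executed here, since the cluster addresses are site-specific; per yarn-site.xml the ResourceManager UI runs on master1, 192.168.200.165):

```shell
# Emit curl health checks for the web consoles; run the printed
# commands from a machine that can reach the cluster network.
: > /tmp/ui_checks.txt
for u in \
  "http://192.168.200.165:50070/" \
  "http://192.168.200.166:9001" \
  "http://192.168.200.165:8088" ; do
  echo "curl -s -o /dev/null -w '%{http_code}' $u" >> /tmp/ui_checks.txt
done
cat /tmp/ui_checks.txt
```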

 

9. Start the jobhistoryserver

Kylin needs MapReduce job scheduling, so the jobhistoryserver must be started:

[hadoop@master1 logs]$ mr-jobhistory-daemon.sh start historyserver

Check the jobhistoryserver daemon process:

[hadoop@master1 conf]$ jps
24419 JobHistoryServer

 


7. HBase deployment
 

1. Extract hbase-1.1.4 into the hadoop home directory and enter the hbase directory.

2. Enter the conf folder.

3. Edit the HBase configuration files as follows:

 

hbase-env.sh

export JAVA_HOME=/jdk1.7.0_80

regionservers 

master1
master2
slave1
slave2

hbase-site.xml (tuned)

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://master1:9000/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>master1,master2,slave1,slave2</value>
        </property>
        <property>
                <name>hbase.zookeeper.property.dataDir</name>
                <value>/zookeeper-3.4.6/data</value>
        </property>
        <property>
                <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
        </property>
        <property>
                <name>hbase.master.info.bindAddress</name>
                <value>master1</value>
        </property>
        <property>
                <name>hbase.master.info.port</name>
                <value>60010</value>
        </property>
        <property>
                <name>hbase.master.maxclockskew</name>
                <value>200000</value>
                <description>Time difference of regionserver from master</description>
        </property>
        <property>
                <name>hbase.coprocessor.user.region.classes</name>
          <value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
        </property>
        <property>
                <name>hbase.regionserver.wal.codec</name>
         <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
        </property>
        <property>
                <name>hbase.master.loadbalancer.class</name>
             <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
        </property>
        <property>
                <name>hbase.coprocessor.master.classes</name>
             <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
        </property>
        <property>
                <name>phoenix.query.maxServerCacheBytes</name>
                <value>1073741824</value>
        </property>
        <property>
                <name>phoenix.query.maxGlobalMemoryPercentage</name>
                <value>70</value>
                </property>
        <property>
                <name>hbase.client.scanner.caching</name>
                <value>5000</value>
                <description>HBase client scanner caching; greatly improves scan/query performance</description>
        </property>
        <property>
                <name>hbase.rpc.timeout</name>
                <value>360000000</value>
        </property>
        <property>
                <name>zookeeper.session.timeout</name>
                <value>60000</value>
                <description>ZooKeeper session timeout</description>
        </property>
        <property>
                <name>hbase.regionserver.handler.count</name>
                <value>50</value>
                <description>Number of handler threads serving requests to user tables</description>
        </property>
        <property>
                <name>hbase.hregion.max.filesize</name>
                <value>107374182400</value>
                <description>Maximum region size per ColumnFamily; with ConstantSizeRegionSplitPolicy, a region splits automatically once it exceeds this value (100 GB)</description>
        </property>
        <property>
                <name>hfile.block.cache.size</name>
                <value>0.2</value>
                <description>Fraction of heap for the block cache (tunes the read/write balance)</description>
        </property>
        <property>
                <name>hbase.regionserver.global.memstore.size</name>
                <value>0.3</value>
                <description>Blocking flush trigger for the RegionServer: fires when the total memstore size of all regions on this node reaches upperLimit * heapsize</description>
        </property>
        <property>
                <name>hbase.regionserver.global.memstore.lowerLimit</name>
                <value>0.3</value>
                <description>Flush trigger for the RegionServer: fires when the total memstore size of all regions on this node reaches lowerLimit * heapsize</description>
        </property>
        <property>
                <name>hbase.zookeeper.property.tickTime</name>
                <value>6000</value>
                <description>Heartbeat interval between clients and ZooKeeper (6 s)</description>
        </property>
        <property>
                <name>hbase.hstore.blockingStoreFiles</name>
                <value>10</value>
                <description>Tune the read/write balance</description>
        </property>
        <property>
                <name>hbase.hstore.blockingWaitTime</name>
                <value>90000</value>
                <description>Blocking wait time (90 s)</description>
        </property>
        <property>
                <name>hbase.hregion.memstore.flush.size</name>
                <value>104857600</value>
                <description>Memstore size at which a flush to disk is triggered (100 MB)</description>
        </property>
        <property>
                <name>hbase.hregion.memstore.mslab.enabled</name>
                <value>true</value>
                <description>Enable the MSLAB scheme to reduce full GCs caused by memory fragmentation and improve overall performance</description>
        </property>
        <property>
                <name>hbase.regionserver.region.split.policy</name>
                <value>org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy</value>
                <description>Default region split policy</description>
        </property>
        <property>
                <name>hbase.client.write.buffer</name>
                <value>8388608</value>
                <description>Client write buffer; with autoFlush set to false, the client flushes only once the buffer is full (8 MB)</description>
        </property>
        <property>
                <name>hbase.hregion.memstore.block.multiplier</name>
                <value>4</value>
                <description>Block client writes when a region's memstores exceed multiplier * flush size</description>
        </property>
        <property>
                <name>hbase.regionserver.regionSplitLimit</name>
                <value>150</value>
                <description>Upper limit on the number of regions per RegionServer</description>
        </property>
        <property>
                <name>hbase.regionserver.maxlogs</name>
                <value>16</value>
                <description>Maximum number of WAL log files per RegionServer</description>
        </property>
</configuration>
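Several of the memstore settings above interact: a region's memstore flushes at hbase.hregion.memstore.flush.size, and client writes to a region block once its memstores reach hbase.hregion.memstore.block.multiplier times that size. A quick arithmetic sketch of the thresholds this particular configuration produces (plain shell, no cluster needed):

```shell
# Derived thresholds from the hbase-site.xml values above (pure arithmetic)
flush_size=104857600        # hbase.hregion.memstore.flush.size (100 MB)
multiplier=4                # hbase.hregion.memstore.block.multiplier
block_at=$((flush_size * multiplier))
echo "per-region write-blocking threshold: $block_at bytes"   # 419430400 = 400 MB

max_filesize=107374182400   # hbase.hregion.max.filesize
echo "region split threshold: $((max_filesize / 1024 / 1024 / 1024)) GB"
```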

4. Configure environment variables and tune memory

export HBASE_HOME=/home/hadoop/hbase

export PATH=$PATH:$HBASE_HOME/bin

export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xms1536m -Xmx2048m -Xmn1024m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:PermSize=512m -XX:MaxPermSize=512m"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms1536m -Xmx2048m -Xmn1024m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:PermSize=512m -XX:MaxPermSize=512m"

export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xms1024m -Xmx2048m"

5. Copy the hbase directory to the hadoop home directory on every host

scp -r  ~/hbase/ hadoop@master2:~
scp -r  ~/hbase/ hadoop@slave1:~
scp -r  ~/hbase/ hadoop@slave2:~

6. Start and stop HBase

bin/start-hbase.sh
bin/stop-hbase.sh

Successful startup output

[hadoop@master1 hadoop]$ start-hbase.sh
master1: starting zookeeper, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-master1.out
slave2: starting zookeeper, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-slave2.out
master2: starting zookeeper, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-master2.out
slave1: starting zookeeper, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-zookeeper-slave1.out
starting master, logging to /home/hadoop/hbase/logs/hbase-hadoop-master-master1.out
slave2: starting regionserver, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave2.out
master1: starting regionserver, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-master1.out
master2: starting regionserver, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-master2.out
slave1: starting regionserver, logging to
/home/hadoop/hbase/bin/../logs/hbase-hadoop-regionserver-slave1.out

Check the related daemons

HMASTER(master1)

[hadoop@master1 hadoop]$ jps
14230 HMaster
14379 HRegionServer

HREGIONSERVER (master2, slave1, slave2)

[hadoop@master2 ~]$ jps
10574 HRegionServer

[hadoop@slave1 ~]$ jps
14230 HRegionServer

[hadoop@slave2 ~]$ jps
18753 HRegionServer

 

Problem encountered: the cluster nodes' clocks were out of sync

org.apache.hadoop.hbase.ClockOutOfSyncException

 

Set up time synchronization

# yum install ntpdate
# ntpdate 0.asia.pool.ntp.org
# rm -rf /etc/localtime
# ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

 

Check the time

date +%Y-%m-%d-%H:%M:%S
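The hbase.master.maxclockskew value configured earlier (200000 ms) is the limit HBase enforces; skew beyond it raises the ClockOutOfSyncException shown above. A sketch of the comparison, with hypothetical epoch-millisecond values standing in for readings from two nodes:

```shell
# Compare two node clocks against hbase.master.maxclockskew (values are illustrative)
max_skew_ms=200000            # from hbase-site.xml above
master_ms=1460000000000       # hypothetical reading taken on master1
region_ms=1460000000150       # hypothetical reading taken on a regionserver
skew=$(( master_ms - region_ms ))
if [ "$skew" -lt 0 ]; then skew=$(( -skew )); fi
if [ "$skew" -gt "$max_skew_ms" ]; then
  echo "skew ${skew}ms exceeds limit: expect ClockOutOfSyncException"
else
  echo "skew ${skew}ms within limit"
fi
```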

 

 

8. Hive deployment

Deploy on master1 only

 

  1. Extract hive-2.0.0 into the hadoop home directory and enter the hive directory

 

  2. Enter the conf directory

 

  3. Create the hive metadata database in MySQL

Create the hive database on master1 (choose latin1 as the character set; other encodings cause problems).

 

# Grant privileges on the hive database

grant all on hive.* to 'root'@'%' IDENTIFIED BY 'weidong' with grant option;

flush privileges;

# Allow MySQL connections from any IP

update mysql.user set host='%' where host='localhost';

 

4. Modify the Hive configuration files as follows:

hive-site.xml

<configuration>
         <property>
                <name>javax.jdo.option.ConnectionURL</name>
               <value>jdbc:mysql://master1:3306/hive?createDatabaseIfNotExist=true</value>
         </property>
         <property>
                  <name>javax.jdo.option.ConnectionDriverName</name>
                   <value>com.mysql.jdbc.Driver</value>
                   <description>JDBC driver class</description>
         </property>
         <property>
                  <name>javax.jdo.option.ConnectionUserName</name>
                   <value>root</value>
                   <description>username</description>
         </property>
         <property>
                   <name>javax.jdo.option.ConnectionPassword</name>
                   <value>weidong</value>
                   <description>password</description>
         </property>
         <property>
                 <name>datanucleus.schema.autoCreateTables</name>
                   <value>true</value>
         </property>
         <property>
                   <name>hive.metastore.warehouse.dir</name>
                   <value>hdfs://master1:9000/home/hadoop/hive/warehouse</value>
                   <description>warehouse path (on HDFS)</description>
         </property>
         <property>
                   <name>hive.exec.scratchdir</name>
                   <value>hdfs://master1:9000/home/hadoop/hive/warehouse</value>
         </property>
         <property>
                   <name>hive.querylog.location</name>
                   <value>/home/hadoop/hive/logs</value>
         </property>
         <property>
                   <name>hive.aux.jars.path</name>
                   <value>file:///home/hadoop/hbase/lib</value>
         </property>
         <property>
                   <name>hive.metastore.uris</name>
                   <value>thrift://master1:9083</value>
                   <description>host and port of the Hive metastore service</description>
         </property>
</configuration>

 

Enable the logging configuration

cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template  hive-exec-log4j2.properties
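Both copies follow the same pattern, so every .template file in conf can be activated in one loop; a small sketch using shell suffix stripping:

```shell
# In the Hive conf directory: copy every *.template to its active name,
# e.g. hive-log4j2.properties.template -> hive-log4j2.properties
for f in *.template; do
  if [ -e "$f" ]; then       # skip the literal glob when no templates exist
    cp "$f" "${f%.template}"
  fi
done
```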

 

5. Start Hive

1) First start the metastore service

hive --service metastore &

 

Successful startup output

2016-04-06T11:46:10,157 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:main(5876)) - Starting hive metastore on port 9083
2016-04-06T11:46:10,210 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(499)) - 0: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2016-04-06T11:46:10,299 INFO  [main]:
metastore.ObjectStore (ObjectStore.java:initialize(318)) - ObjectStore, initialize called
2016-04-06T11:46:12,284 INFO  [main]:
metastore.ObjectStore (ObjectStore.java:getPMF(402)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2016-04-06T11:46:15,345 INFO  [main]:
metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(142)) - Using direct SQL, underlying DB is MYSQL
2016-04-06T11:46:15,352 INFO  [main]:
metastore.ObjectStore (ObjectStore.java:setConf(301)) - Initialized ObjectStore
2016-04-06T11:46:16,051 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(586)) - Added admin role in metastore
2016-04-06T11:46:16,058 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles_core(595)) - Added public role in metastore
2016-04-06T11:46:16,239 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers_core(635)) - No user is added in admin role, since config is empty
2016-04-06T11:46:16,715 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6020)) - Starting DB backed MetaStore Server with SetUGI enabled
2016-04-06T11:46:16,729 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6077)) - Started the new metaserver on port [9083]...
2016-04-06T11:46:16,729 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6079)) - Options.minWorkerThreads = 200
2016-04-06T11:46:16,729 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6081)) - Options.maxWorkerThreads = 1000
2016-04-06T11:46:16,730 INFO  [main]:
metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(6083)) - TCP keepalive = true

 

2) Start the Hive client

Just run the hive command:

[hadoop@master conf]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/hadoop/hive/lib/hive-jdbc-2.1.0-SNAPSHOT-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/hadoop/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Logging initialized using configuration in file:/home/hadoop/hive/conf/hive-log4j2.properties
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

Verify that you can list and create tables from the Hive client.

6. Stop Hive

Use ps -aux | grep hive to find the Hive process PIDs, then kill them.

 

7. Common commands

hive> show databases;
OK
default
Time taken: 1.881 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.081 seconds
hive> create table kylin_test(test_count int);
OK
Time taken: 2.9 seconds
hive> show tables;
OK
kylin_test
Time taken: 0.151 seconds, Fetched: 1 row(s)
hive> select * from kylin_test;
OK
Time taken: 0.318 seconds

Querying from the hive database

 

9. Kylin deployment

Deploy on master1 only

 

1. The two Kylin binary packages

 

Pre-packaged binary: apache-kylin-1.5.0-bin.tar.gz

Special binary: apache-kylin-1.5.0-HBase1.1.3-bin.tar.gz


Note: the special binary is a Kylin snapshot built against HBase 1.1+. It requires HBase 1.1.3 or later, because earlier versions have a known fuzzy-key filter defect that makes Kylin queries return incomplete results: HBASE-14269. Also note that this is not an official release (it is rebased every few weeks onto the latest changes in the KYLIN 1.3.x branch) and has not been fully tested.

 

2. Extract apache-kylin-1.5.0-HBase1.1.3-bin.tar.gz into the hadoop home directory and enter the kylin directory

 

3. In /etc/profile, configure the KYLIN environment variables and a variable named hive_dependency

export KYLIN_HOME=/home/hadoop/kylin
export PATH=$PATH:$KYLIN_HOME/bin
 
export hive_dependency=/home/hadoop/hive/conf:/home/hadoop/hive/lib/*:/home/hadoop/hive/hcatalog/share/hcatalog/hive-hcatalog-core-2.0.0.jar

 

This must also be configured on master2, slave1, and slave2, because once Kylin submits a job to MR, the Hadoop cluster dispatches tasks to those nodes, and they need the Hive dependencies; without it, the MR task fails with an error like "hcatalogXXX not found".
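Before relying on it, it is worth checking that every entry in hive_dependency actually resolves on each node, since a typo only surfaces later as a failed MR task. A sketch (the helper name is illustrative; the paths are the ones used in this guide):

```shell
# Report any classpath entry that does not exist on this node
check_classpath() {
  local cp=$1 entry dir missing=0 entries
  IFS=':' read -ra entries <<< "$cp"
  for entry in "${entries[@]}"; do
    dir=${entry%/\*}           # a trailing /* means "all jars here": test the directory
    if [ ! -e "$dir" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}

check_classpath "/home/hadoop/hive/conf:/home/hadoop/hive/lib/*" \
  || echo "fix hive_dependency before starting Kylin"
```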

 

4. Modify the Kylin startup script kylin.sh

 

1) Declare KYLIN_HOME explicitly
export KYLIN_HOME=/home/hadoop/kylin
 
2) Explicitly add the $hive_dependency entries to HBASE_CLASSPATH_PREFIX
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX

5. Check that the environment is set up correctly

[hadoop@master1 conf]$ check-env.sh
KYLIN_HOME is set to /home/hadoop/kylin

 

6. Enter the conf directory and modify the Kylin configuration files as follows:

kylin.properties

kylin.owner=wdcloud@kylin.apache.org
kylin.rest.servers=master1:7070

kylin.hdfs.working.dir=/home/hadoop/kylin/kylin_hdfs_working_dir
kylin.job.remote.cli.working.dir=/home/hadoop/kylin/kylin_job_working_dir

 # Define the job.jar that Kylin uses for MR jobs and the HBase coprocessor jar, which improves performance.

kylin.job.jar=/home/hadoop/kylin/lib/kylin-job-1.5.0-SNAPSHOT.jar
kylin.coprocessor.local.jar=/home/hadoop/kylin/lib/kylin-coprocessor-1.5.0-SNAPSHOT.jar

 

Set the replication factor to 4 in both kylin_hive_conf.xml and kylin_job_conf.xml:

<property>
  <name>dfs.replication</name>
  <value>4</value>
  <description>Block replication</description>
</property>

7. Start and stop Kylin

 

# Confirm the required services are running:

1) The Hadoop HDFS/YARN/JobHistory services

    start-dfs.sh

    start-yarn.sh

    mr-jobhistory-daemon.sh start historyserver

2) The Hive metastore: hive --service metastore &

3) ZooKeeper

4) HBase: start-hbase.sh

 

# Check the Hive and HBase dependencies

[hadoop@master1 kylin]$ find-hive-dependency.sh
[hadoop@master1 kylin]$ find-hbase-dependency.sh

 

# Start and stop Kylin with:

[hadoop@master1 kylin]$ kylin.sh start
[hadoop@master1 kylin]$ kylin.sh stop

 


Web UI address:
 

http://192.168.200.165:7070/kylin/login

The default login username/password is ADMIN/KYLIN.

 

 

10. Kylin testing

1. Test the sample that ships with Kylin

Kylin provides an automated script that creates a test cube; the script also creates the corresponding Hive tables.

 

Steps to run the sample:

 

① Run the ${KYLIN_HOME}/bin/sample.sh script

[hadoop@master1 ~]$ sample.sh

 

Key output:

KYLIN_HOME is set to /home/hadoop/kylin
Going to create sample tables in hive...
Sample hive tables are created successfully; Going to create sample cube...
Sample cube is created successfully in project 'learn_kylin'; Restart Kylin server or reload the metadata from web UI to see the change.

 

# In MySQL, check which tables the sample created

# select DB_ID,OWNER,SD_ID,TBL_NAME from TBLS;

 # In the Hive client, check the created tables and the row count (10,000 rows)

 

hive> show tables;
OK
kylin_cal_dt
kylin_category_groupings
kylin_sales
Time taken: 1.835 seconds, Fetched: 3 row(s)

hive> select count(*) from kylin_sales;
OK
10000
Time taken: 65.351 seconds, Fetched: 1 row(s)

 

② Restart the Kylin server to refresh the cache

 

[hadoop@master1 ~]$ kylin.sh stop
[hadoop@master1 ~]$ kylin.sh start

③ Visit 192.168.200.165:7070/kylin and log in with the default username/password ADMIN/KYLIN

In the console, select the project learn_kylin.

 ④ Select the test cube "kylin_sales_cube", click "Action" - "Build", and pick a date after 2014-01-01, so that all 10,000 test records are included.

 Choose a build date

 Clicking Submit shows a message that the build job was submitted successfully.

 ⑤ Watch the job's progress on the Monitor page until it reaches 100%.

 Job finished

 On the Model page the cube's status is now READY, meaning it can serve SQL queries.

 During the build, a temporary table is created in Hive; it is dropped automatically once the job reaches 100%:

kylin_intermediate_kylin_sales_cube_desc_20120201000000_20120201000000

 During the build, a permanent result table is created in HBase, for example:

KYLIN_PTQIXMC64A

 If two or more segments have been built, you can also run a merge:

 Merge job finished

 The multiple HBase tables backing the different segments are then merged into a single table, saving disk space.

 Problem encountered during the build:

 When the job reached step 5 (Create HTable), it failed, reporting that the created table was not available.

This caused the whole job to fail with ERROR:

2016-04-07 12:40:57,823 ERROR [pool-7-thread-5] steps.CubeHTableUtil:135 : Failed to create HTable
java.lang.IllegalArgumentException: table KYLIN_9USQAHQQXC created, but is not available due to some reasons
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
        at org.apache.kylin.storage.hbase.steps.CubeHTableUtil.createHTable(CubeHTableUtil.java:132)
        at org.apache.kylin.storage.hbase.steps.CreateHTableJob.run(CreateHTableJob.java:104)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

 

 

Question asked in the community:

http://apache-kylin.74782.x6.nabble.com/an-error-occurred-when-build-a-sample-cube-at-step-5-create-HTable-td4102.html 

 

Root cause:

Kylin uses the snappy compression algorithm by default, and snappy failed HBase's compression test on this cluster.

RegionServer error log:

2016-04-12 12:05:05,726 ERROR [RS_OPEN_REGION-slave2:16020-0] handler.OpenRegionHandler: Failed open of region=KYLIN_VKRC32OKFP,,1460433926913.73fb906719a75b2733f046e87fbe8105., starting to roll back the global memstore size.
org.apache.hadoop.hbase.DoNotRetryIOException: Compression algorithm 'snappy' previously failed test.
        at org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:91)
        at org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:6300)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6251)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6218)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6189)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6145)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6096)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2016-04-12 12:05:05,727 INFO  [RS_OPEN_REGION-slave2:16020-0] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 73fb906719a75b2733f046e87fbe8105, NAME => 'KYLIN_VKRC32OKFP,,1460433926913.73fb906719a75b2733f046e87fbe8105.', STARTKEY => '', ENDKEY => '\x00\x01'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 1
2016-04-12 12:05:05,775 INFO  [PriorityRpcServer.handler=18,queue=0,port=16020] regionserver.RSRpcServices: Open KYLIN_VKRC32OKFP,\x00\x01,1460433926913.06978b9fb1e423563a5aae7e1df044d8.

 

Fix: disable compression, or use LZO as the compression algorithm.

 

The official site describes disabling compression as follows:

To disable compressing MR jobs you need to modify $KYLIN_HOME/conf/kylin_job_conf.xml by removing all configuration entries related to compression (just grep the keyword "compress"). To disable compressing HBase tables you need to open $KYLIN_HOME/conf/kylin.properties and remove the line starting with kylin.hbase.default.compression.codec.
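A sketch of the kylin.properties edit done non-interactively. The exact line content (codec value "snappy") is an assumption based on the default described above, so the demo operates on a scratch file rather than the real config:

```shell
# Demo of removing the default HBase codec line (assumed line format)
props=$(mktemp)
cat > "$props" <<'EOF'
kylin.owner=wdcloud@kylin.apache.org
kylin.hbase.default.compression.codec=snappy
EOF

# delete the line that sets the default HBase compression codec
sed -i '/^kylin\.hbase\.default\.compression\.codec/d' "$props"

if ! grep -q "compression" "$props"; then
  echo "codec line removed"
fi
```

On the real installation the same sed line would target $KYLIN_HOME/conf/kylin.properties; keep a backup of the file first.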

 

 

⑥ Switch to the Insight page and run SQL statements, for example:

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;

 

In Kylin this query took only 0.46 s (average over ten runs).

 

 

Running the same SQL in Hive took as long as 136 seconds:

hive> select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;

Time taken: 136.489 seconds, Fetched: 731 row(s)

 

Clearly, Kylin runs this query far more efficiently.

 

Other test queries:

 

① select * from kylin_sales; (under 1 s)

 

② Sales amount and buyer count per time period (0.39 s)

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
from kylin_sales
group by part_dt
order by part_dt;

 

③ Sales amount and buyer count for a specific date (0.40 s)

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
from kylin_sales
where part_dt = '2014-01-01'
group by part_dt;

This raised an error:

Error while compiling generated Java code:

public static class Record3_0 implements java.io.Serializable {
    public java.math.BigDecimal f0;
    public boolean f1;
    public org.apache.kylin.common.hll.HyperLogLogPlusCounter f2;
    public Record3_0(java.math.BigDecimal f0, boolean f1, ...

 

This is because part_dt is a date column and the string-to-date parsing fails; rewrite the SQL as:

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
from kylin_sales
where part_dt between '2014-01-01' and '2014-01-01'
group by part_dt;

or

select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers
from kylin_sales
where part_dt = date '2014-01-01'
group by part_dt;

 

④ The queries above use only the fact table, not the lookup table. To get the sales per time period for every level-2 category, the fact table must be inner joined with the lookup table (1.36 s):

select fact.part_dt, lookup.CATEG_LVL2_NAME, count(distinct seller_id) as sellers
from kylin_sales fact
inner join KYLIN_CATEGORY_GROUPINGS lookup
on fact.LEAF_CATEG_ID = lookup.LEAF_CATEG_ID and fact.LSTG_SITE_ID = lookup.SITE_ID
group by fact.part_dt, lookup.CATEG_LVL2_NAME
order by fact.part_dt desc;
