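Apache HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. It is a NoSQL database and an open-source implementation of the ideas behind Google Bigtable, and it can run large-scale structured storage clusters on inexpensive PC servers. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive data held in HBase, and Zookeeper to coordinate the server cluster. The Apache HBase website has detailed documentation.
Installing and deploying a fully distributed Apache HBase cluster is not complicated. The detailed deployment process follows:
1. Plan the HBase cluster nodes
This experiment uses 4 nodes and configures HBase Master, Master-backup, and RegionServer. The host operating system is CentOS 6.9, and the processes are planned per node as follows: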
Host | IP | Node processes |
---|---|---|
hd1 | 172.17.0.1 | Master, Zookeeper |
hd2 | 172.17.0.2 | Master-backup, RegionServer, Zookeeper |
hd3 | 172.17.0.3 | RegionServer, Zookeeper |
hd4 | 172.17.0.4 | RegionServer |
2. Install JDK, Zookeeper, and Hadoop
On every server node, turn off the firewall and set SELinux to disabled.
Install the JDK, Zookeeper, and an Apache Hadoop distributed cluster (for the detailed process, see my other blog post: "Apache Hadoop 2.8 distributed cluster setup, a very detailed walkthrough").
After installation, set the following environment variables; they are needed when installing and configuring HBase.

    export JAVA_HOME=/usr/java/jdk1.8.0_131
    export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export PATH=$PATH:$JAVA_HOME/bin
    export HADOOP_HOME=/home/ahadoop/hadoop-2.8.0
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    export ZOOKEEPER_HOME=/home/ahadoop/zookeeper-3.4.10
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
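Since the later steps depend on these variables, it can help to confirm they are actually set in the current shell before continuing. The following is a minimal sketch; `missing_vars` is a hypothetical helper name, not something shipped with Hadoop or HBase.

```shell
# Sketch: print the names of the given environment variables that are
# unset or empty. missing_vars is a hypothetical helper name.
missing_vars() {
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `missing_vars JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR ZOOKEEPER_HOME` prints nothing when all four are set.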
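3. Install NTP to keep time consistent across the server nodes
If the clocks of the server nodes disagree, HBase may behave abnormally; the HBase website stresses this point. Here the first node, hd1, is set up as the NTP server: hd1 synchronizes its time from the National Time Service Center, and the other nodes (hd2, hd3, hd4) synchronize from hd1 as clients.
(1) Install NTP

    # Install the NTP service
    yum -y install ntp
    # Enable it at boot
    chkconfig --add ntpd
    chkconfig ntpd on

Start the NTP service

    service ntpd start

(2) Configure the NTP server
On node hd1, edit /etc/ntp.conf to configure the NTP service. The settings changed for this cluster are the restrict line for 172.17.0.0, the upstream server lines with their restrict lines, and the local clock fallback, each explained by a comment.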

    vi /etc/ntp.conf

    # For more information about this file, see the man pages
    # ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

    driftfile /var/lib/ntp/drift

    # Permit time synchronization with our time source, but do not
    # permit the source to query or modify the service on this system.
    restrict default nomodify notrap nopeer noquery

    # Permit all access over the loopback interface. This could
    # be tightened as well, but to do so would effect some of
    # the administrative functions.
    restrict 127.0.0.1
    restrict ::1

    # Hosts on local network are less restricted.
    #restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
    # Added: the network range allowed to send requests
    restrict 172.17.0.0 mask 255.255.255.0 nomodify notrap

    # Use public servers from the pool.ntp.org project.
    # Please consider joining the pool (http://www.pool.ntp.org/join.html).
    #server 0.centos.pool.ntp.org iburst
    # Upstream servers to synchronize the clock from
    server 210.72.145.44 prefer   # National Time Service Center of China
    server 202.112.10.36          # 1.cn.pool.ntp.org
    server 59.124.196.83          # 0.asia.pool.ntp.org

    #broadcast 192.168.1.255 autokey        # broadcast server
    #broadcastclient                        # broadcast client
    #broadcast 224.0.1.1 autokey            # multicast server
    #multicastclient 224.0.1.1              # multicast client
    #manycastserver 239.255.254.254         # manycast server
    #manycastclient 239.255.254.254 autokey # manycast client

    # Permit time synchronization with the upstream time servers
    restrict 210.72.145.44 nomodify notrap noquery
    restrict 202.112.10.36 nomodify notrap noquery
    restrict 59.124.196.83 nomodify notrap noquery

    # When the external time servers are unreachable, serve time from the local clock
    server 127.127.1.0   # local clock driver
    fudge 127.127.1.0 stratum 10

    # Enable public key cryptography.
    #crypto

    includefile /etc/ntp/crypto/pw

    # Key file containing the keys and key identifiers used when operating
    # with symmetric key cryptography.
    keys /etc/ntp/keys

    # Specify the key identifiers which are trusted.
    #trustedkey 4 8 42

    # Specify the key identifier to use with the ntpdc utility.
    #requestkey 8

    # Specify the key identifier to use with the ntpq utility.
    #controlkey 8

    # Enable writing of statistics records.
    #statistics clockstats cryptostats loopstats peerstats

    # Disable the monitoring facility to prevent amplification attacks using ntpdc
    # monlist command when default restrict does not include the noquery flag. See
    # CVE-2013-5211 for more details.
    # Note: Monitoring will not be disabled with the limited restriction flag.
    disable monitor

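Restart the NTP service

    service ntpd restart

Then check the ntp status

    [root@31d48048cb1e ahadoop]# service ntpd status
    ntpd dead but pid file exists

The status shows an error. It turns out that ntpd has a limit: it only corrects the clock when the offset from the NTP server is within 1000 s, and this node's clock was more than 1000 s away from the real time. The operating system time therefore has to be set manually to within 1000 s of the NTP server before the service can synchronize.

    # If the operating system time zone is wrong, fix it first (Asia/Shanghai)
    cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
    # Set the date and the time
    date -s 20170703
    date -s 15:32:00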
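The 1000 s rule described above can be expressed as a small check, which may be useful when deciding whether a node needs a manual clock correction first. This is only a sketch: `offset_too_large` is a hypothetical helper, and both arguments are assumed to be Unix epoch seconds (for example from `date +%s` on the node and on a trusted reference).

```shell
# Sketch: succeed when two clocks differ by more than 1000 seconds, the
# limit described above. offset_too_large is a hypothetical helper; both
# arguments are Unix epoch seconds.
offset_too_large() {
  local_epoch=$1
  reference_epoch=$2
  diff=$((local_epoch - reference_epoch))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi
  [ "$diff" -gt 1000 ]
}
```

A call such as `offset_too_large "$(date +%s)" "$reference" && echo "set the clock manually first"` would flag the situation, where `$reference` is a value you supply.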
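There is another small trick: right after installing NTP, first fetch the accurate time from a time server, so that no manual change is needed. The command is:

    ntpdate -u pool.ntp.org

[Note] Running the time synchronization inside docker reports an error

    9 Jan 05:13:57 ntpdate[7299]: step-systime: Operation not permitted

This error means the system does not permit setting the time. A docker container shares the host's kernel, and changing the system time is a kernel-level function, so by default the time cannot be changed from inside a docker container.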
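On Linux, permission to set the clock is governed by the CAP_SYS_TIME capability (capability number 25), which Docker does not grant to containers by default. As a rough sketch, the helper below tests a capability mask for that bit; `has_sys_time` is a hypothetical name, and the mask is assumed to be the hexadecimal value from the `CapEff` line of /proc/self/status.

```shell
# Sketch: succeed if a hexadecimal capability mask (as shown on the CapEff
# line of /proc/self/status) includes CAP_SYS_TIME, which is bit 25.
# has_sys_time is a hypothetical helper name.
has_sys_time() {
  mask=$1
  [ $(( (0x$mask >> 25) & 1 )) -eq 1 ]
}
```

Inside a container this could be used as `has_sys_time "$(awk '/^CapEff/ {print $2}' /proc/self/status)" || echo "cannot set the clock here"`. Starting a container with `docker run --cap-add SYS_TIME` grants the capability, but because the kernel is shared, changing the time there also changes the host's clock.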
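(3) Configure the NTP clients
On nodes hd2, hd3, and hd4, edit /etc/ntp.conf to configure the NTP client. The settings changed for this cluster are the server and restrict lines for 172.17.0.1 and the local clock fallback, each explained by a comment.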

    vi /etc/ntp.conf

    # For more information about this file, see the man pages
    # ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

    driftfile /var/lib/ntp/drift

    # Permit time synchronization with our time source, but do not
    # permit the source to query or modify the service on this system.
    restrict default nomodify notrap nopeer noquery

    # Permit all access over the loopback interface. This could
    # be tightened as well, but to do so would effect some of
    # the administrative functions.
    restrict 127.0.0.1
    restrict ::1

    # Hosts on local network are less restricted.
    #restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

    # Use public servers from the pool.ntp.org project.
    # Please consider joining the pool (http://www.pool.ntp.org/join.html).
    #server 0.centos.pool.ntp.org iburst

    # Synchronize from the NTP server node (hd1)
    server 172.17.0.1
    restrict 172.17.0.1 nomodify notrap noquery

    # If synchronization fails, fall back to the local clock
    server 127.127.1.0   # local clock driver
    fudge 127.127.1.0 stratum 10

    #broadcast 192.168.1.255 autokey        # broadcast server
    #broadcastclient                        # broadcast client
    #broadcast 224.0.1.1 autokey            # multicast server
    #multicastclient 224.0.1.1              # multicast client
    #manycastserver 239.255.254.254         # manycast server
    #manycastclient 239.255.254.254 autokey # manycast client

    # Enable public key cryptography.
    #crypto

    includefile /etc/ntp/crypto/pw

    # Key file containing the keys and key identifiers used when operating
    # with symmetric key cryptography.
    keys /etc/ntp/keys

    # Specify the key identifiers which are trusted.
    #trustedkey 4 8 42

    # Specify the key identifier to use with the ntpdc utility.
    #requestkey 8

    # Specify the key identifier to use with the ntpq utility.
    #controlkey 8

    # Enable writing of statistics records.
    #statistics clockstats cryptostats loopstats peerstats

    # Disable the monitoring facility to prevent amplification attacks using ntpdc
    # monlist command when default restrict does not include the noquery flag. See
    # CVE-2013-5211 for more details.
    # Note: Monitoring will not be disabled with the limited restriction flag.
    disable monitor

Restart the NTP service

    service ntpd restart

After it starts, check the synchronization status

    $ ntpq -p
    $ ntpstat
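In the `ntpq -p` listing, the peer that ntpd has currently selected for synchronization is marked with a leading `*`. The sketch below pulls that peer out of the listing so a client can confirm it is following hd1; `selected_peer` is a hypothetical helper name that reads the listing on standard input.

```shell
# Sketch: read `ntpq -p` output on stdin and print the peer marked with a
# leading "*" (the one selected for synchronization). Exits non-zero when no
# peer is selected. selected_peer is a hypothetical helper name.
selected_peer() {
  awk 'substr($1, 1, 1) == "*" { print substr($1, 2); found = 1 }
       END { exit !found }'
}
```

On a client node, `ntpq -p | selected_peer` would be expected to print 172.17.0.1 once synchronization has settled, which can take a few minutes after a restart.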
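4. Modify ulimit
The Apache HBase documentation recommends raising ulimit to increase the number of files that can be open at the same time: nofile should be at least 10,000, and more likely 10,240 ("It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, because the value is usually expressed in multiples of 1024.").
Edit /etc/security/limits.conf and append the nofile (number of open files) and nproc (number of processes) settings at the end, as follows:

    vi /etc/security/limits.conf

    * soft nofile 65536
    * hard nofile 65536
    * soft nproc 65536
    * hard nproc 65536

After the change, reboot the server for it to take effect

    reboot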
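After the reboot it is worth confirming that the new limit is really in effect for the user that runs HBase. A minimal sketch, assuming the value comes from `ulimit -n`; `limit_ok` is a hypothetical helper name.

```shell
# Sketch: succeed if an open-file limit (as printed by `ulimit -n`) is at
# least the required minimum. limit_ok is a hypothetical helper name.
limit_ok() {
  current=$1
  minimum=$2
  # "unlimited" always satisfies the requirement.
  if [ "$current" = "unlimited" ]; then
    return 0
  fi
  [ "$current" -ge "$minimum" ]
}
```

For example, `limit_ok "$(ulimit -n)" 10240 || echo "nofile is still too low"`.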
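5. Install and configure Apache HBase
The Apache HBase website provides a description of the default configuration and reference configuration examples; reading them before configuring is recommended.
This experiment uses a separately deployed Zookeeper that is shared with Hadoop; for the Zookeeper configuration, see my other blog post. HBase can also use its built-in Zookeeper service, but in a production environment a separate deployment is recommended because it is easier to manage day to day.
(1) Download Apache HBase
Download the latest binary release (at the time of writing) from the website: hbase-1.2.6-bin.tar.gz
Then extract it

    tar -zxvf hbase-1.2.6-bin.tar.gz

Configure the environment variables

    vi ~/.bash_profile

    export HBASE_HOME=/home/ahadoop/hbase-1.2.6
    export PATH=$PATH:$HBASE_HOME/bin

    # Make the environment variables take effect
    source ~/.bash_profile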
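(2) Copy the hdfs-site.xml configuration file
Copy $HADOOP_HOME/etc/hadoop/hdfs-site.xml into $HBASE_HOME/conf so that HDFS and HBase use the same settings; this is also the approach the website recommends. The website gives an example: if HDFS is configured with a replication factor of 5 while the default is 3, and the latest hdfs-site.xml has not been copied into $HBASE_HOME/conf, HBase will write 3 replicas. The two sides then disagree, which can lead to errors.

    cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/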
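Because the copy has to be repeated whenever the Hadoop side changes, a quick comparison can catch a stale file. The sketch below uses `cmp -s` for a silent byte-by-byte comparison; `conf_in_sync` is a hypothetical helper name.

```shell
# Sketch: succeed only if both files exist and are byte-identical.
# conf_in_sync is a hypothetical helper name.
conf_in_sync() {
  hadoop_copy=$1
  hbase_copy=$2
  [ -f "$hadoop_copy" ] && [ -f "$hbase_copy" ] && cmp -s "$hadoop_copy" "$hbase_copy"
}
```

For example, `conf_in_sync "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" "$HBASE_HOME/conf/hdfs-site.xml" || echo "hdfs-site.xml differs, copy it again"`.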
(3) Configure hbase-site.xml
Edit $HBASE_HOME/conf/hbase-site.xml

    <configuration>
      <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hd1,hd2,hd3</value>
        <description>Comma-separated list of the hosts in the ZooKeeper ensemble.
        </description>
      </property>
      <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/ahadoop/zookeeper-data</value>
        <description>
        Note that this Zookeeper data directory is shared with Hadoop HA, so it must match the one configured in zoo.cfg.
        Property from ZooKeeper config zoo.cfg.
        The directory where the snapshot is stored.
        </description>
      </property>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hd1:9000/hbase</value>
        <description>The directory shared by RegionServers.
        The website stresses repeatedly that this directory must not be created in advance. HBase creates it itself; otherwise it attempts a migration and errors result.
        As for the port, some setups use 8020 and some use 9000; check $HADOOP_HOME/etc/hadoop/hdfs-site.xml. In this experiment it is set through
        dfs.namenode.rpc-address.hdcluster.nn1 and dfs.namenode.rpc-address.hdcluster.nn2
        </description>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
        <description>Distributed cluster setting: set this to true here; for a single node, set it to false.
        The mode the cluster will be in. Possible values are
        false: standalone and pseudo-distributed setups with managed ZooKeeper
        true: fully-distributed with unmanaged ZooKeeper Quorum (see hbase-env.sh)
        </description>
      </property>
    </configuration>

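To double-check values such as hbase.rootdir against the Hadoop side, a property can be read back out of the file. The following is a rough sketch rather than a real XML parser: it assumes each `<name>` and `<value>` element sits on a single line, which is how hbase-site.xml is usually laid out, and `get_property` is a hypothetical helper name.

```shell
# Sketch: print the <value> of one property from a Hadoop-style XML file.
# Assumes every <name> and <value> element is on a single line.
# get_property is a hypothetical helper, not an HBase tool.
get_property() {
  file=$1
  wanted=$2
  awk -v wanted="$wanted" '
    /<name>/ {
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ {
      value = $0
      sub(/.*<value>[ \t]*/, "", value)
      sub(/[ \t]*<\/value>.*/, "", value)
      if (name == wanted) { print value; exit }
    }
  ' "$file"
}
```

For example, `get_property "$HBASE_HOME/conf/hbase-site.xml" hbase.rootdir` prints the configured root directory, and prints nothing if the property is absent.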
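(4) Configure the regionservers file
Edit $HBASE_HOME/conf/regionservers and enter the host names that will run a regionserver, one per line

    hd2
    hd3
    hd4

(5) Configure the backup-masters file (backup master nodes)
HBase supports running multiple master nodes, so the master is not a single point of failure, but only one of them can be the active master; the others are backup masters. Edit $HBASE_HOME/conf/backup-masters and list the host names of the backup masters

    hd2

(6) Configure hbase-env.sh
Edit $HBASE_HOME/conf/hbase-env.sh to set the environment variables. Because this experiment uses a separately configured Zookeeper, set HBASE_MANAGES_ZK to false

    export HBASE_MANAGES_ZK=false

HBase is now configured.
6. Start Apache HBase
The whole cluster can be started with $HBASE_HOME/bin/start-hbase.sh. To use that command, the cluster nodes must have passwordless ssh login set up, so that the script can reach the other nodes and start the services there.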
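Whether passwordless ssh really works for every host can be probed before relying on start-hbase.sh. The sketch below only prints one probe command per host listed in a file such as $HBASE_HOME/conf/regionservers, so the commands can be reviewed before they are run; `ssh_probe_commands` is a hypothetical helper name.

```shell
# Sketch: print one passwordless-ssh probe command per host listed in a file.
# Blank lines and lines starting with "#" are skipped.
# ssh_probe_commands is a hypothetical helper name.
ssh_probe_commands() {
  hosts_file=$1
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in
      ''|'#'*) continue ;;
    esac
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    printf 'ssh -o BatchMode=yes %s true\n' "$host"
  done < "$hosts_file"
}
```

Piping the output to `sh`, as in `ssh_probe_commands "$HBASE_HOME/conf/regionservers" | sh`, runs the probes; any host that still asks for a password shows up as a failure.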
To understand the HBase startup process better, this experiment starts the processes on each node one by one. The start-hbase.sh script starts them in the following order

    if [ "$distMode" == 'false' ]
    then
      "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master $@
    else
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" $commandToRun zookeeper
      "$bin"/hbase-daemon.sh --config "${HBASE_CONF_DIR}" $commandToRun master
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
        --hosts "${HBASE_REGIONSERVERS}" $commandToRun regionserver
      "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
        --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup
    fi

That is, the script uses hbase-daemon.sh and hbase-daemons.sh to start zookeeper, master, regionserver, and master-backup in turn.
So we start the processes on each node in the same order.
Hadoop must be started before HBase, so that HBase can initialize and read the data stored on HDFS.
(1) Start zookeeper (nodes hd1, hd2, hd3)

    zkServer.sh start &

(2) Start the Hadoop distributed cluster (for the cluster configuration and node plan, see my other blog post)

    # Start the journalnode (hd1, hd2, hd3)
    hdfs journalnode &
    # Start the active namenode (hd1)
    hdfs namenode &
    # Start the standby namenode (hd2)
    hdfs namenode &
    # Start the ZookeeperFailoverController (hd1, hd2)
    hdfs zkfc &
    # Start the datanode (hd2, hd3, hd4)
    hdfs datanode &

(3) Start the hbase master (hd1)

    hbase-daemon.sh start master &

(4) Start the hbase regionserver (hd2, hd3, hd4)

    hbase-daemon.sh start regionserver &

(5) Start the hbase backup-master (hd2)

    hbase-daemon.sh start master --backup &

Something looks odd here: $HBASE_HOME/bin/start-hbase.sh starts the backup master with

    "$bin"/hbase-daemons.sh --config "${HBASE_CONF_DIR}" \
      --hosts "${HBASE_BACKUP_MASTERS}" $commandToRun master-backup

but passing master-backup to hbase-daemon.sh fails with an error saying the class master-backup cannot be loaded

    [ahadoop@1620d6ed305d ~]$ hbase-daemon.sh start master-backup &
    [5] 1113
    [ahadoop@1620d6ed305d ~]$ starting master-backup, logging to /home/ahadoop/hbase-1.2.6/logs/hbase-ahadoop-master-backup-1620d6ed305d.out
    Error: Could not find or load main class master-backup

Note that start-hbase.sh calls hbase-daemons.sh (plural), whereas the command above runs hbase-daemon.sh (singular), which apparently treats master-backup as a class name; this is likely the cause of the error. After some research, the backup master was started with the following command instead

    hbase-daemon.sh start master --backup &
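The per-node commands used in this walkthrough can be collected in one place, which makes the difference between the role name and the actual arguments explicit. This is only a sketch; `daemon_command` is a hypothetical helper, and the commands it prints are the ones shown above.

```shell
# Sketch: print the per-node start command used in this walkthrough for a
# given role. daemon_command is a hypothetical helper name.
daemon_command() {
  case $1 in
    master)        echo 'hbase-daemon.sh start master' ;;
    regionserver)  echo 'hbase-daemon.sh start regionserver' ;;
    master-backup) echo 'hbase-daemon.sh start master --backup' ;;
    *)             echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

For example, `daemon_command master-backup` prints the command to run on hd2.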
After these steps the hbase cluster is up. Run jps on each node to check the hbase processes that were started.
After startup, the /hbase directories in hdfs and in zookeeper have both been initialized and populated, as shown below

    [ahadoop@ee8319514df6 ~]$ hadoop fs -ls /hbase
    17/07/02 13:14:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 7 items
    drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/.tmp
    drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/MasterProcWALs
    drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 13:03 /hbase/WALs
    drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/data
    -rw-r--r--   3 ahadoop supergroup         42 2017-07-02 12:55 /hbase/hbase.id
    -rw-r--r--   3 ahadoop supergroup          7 2017-07-02 12:55 /hbase/hbase.version
    drwxr-xr-x   - ahadoop supergroup          0 2017-07-02 12:55 /hbase/oldWALs

    [ahadoop@31d48048cb1e ~]$ zkCli.sh -server hd1:2181
    Connecting to hd1:2181
    2017-07-05 11:31:44,663 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
    2017-07-05 11:31:44,667 [myid:] - INFO [main:Environment@100] - Client environment:host.name=31d48048cb1e
    2017-07-05 11:31:44,668 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.8.0_131
    2017-07-05 11:31:44,672 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
    2017-07-05 11:31:44,673 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_131/jre
    2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/ahadoop/zookeeper-3.4.10/bin/../build/classes:/home/ahadoop/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/ahadoop/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/ahadoop/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/ahadoop/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/ahadoop/zookeeper-3.4.10/bin/../conf:.:/usr/java/jdk1.8.0_131/lib:/usr/java/jdk1.8.0_131/lib/dt.jar:/usr/java/jdk1.8.0_131/lib/tools.jar:/home/ahadoop/apache-ant-1.10.1/lib
    2017-07-05 11:31:44,674 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
    2017-07-05 11:31:44,675 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
    2017-07-05 11:31:44,678 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
    2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
    2017-07-05 11:31:44,679 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.10.105-1.el6.elrepo.x86_64
    2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.name=ahadoop
    2017-07-05 11:31:44,680 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/home/ahadoop
    2017-07-05 11:31:44,681 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/ahadoop
    2017-07-05 11:31:44,686 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=hd1:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@799f7e29
    Welcome to ZooKeeper!
    2017-07-05 11:31:44,724 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server 31d48048cb1e/172.17.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2017-07-05 11:31:44,884 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@876] - Socket connection established to 31d48048cb1e/172.17.0.1:2181, initiating session
    [zk: hd1:2181(CONNECTED) 0] 2017-07-05 11:31:44,912 [myid:] - INFO [main-SendThread(31d48048cb1e:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server 31d48048cb1e/172.17.0.1:2181, sessionid = 0x15d10c18fc70002, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null
    [zk: hd1:2181(CONNECTED) 1] ls /hbase
    [replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, running, recovering-regions, draining, hbaseid, table]

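The bracketed list printed by `ls /hbase` can also be checked mechanically, for instance to confirm that znodes such as rs and meta-region-server exist after startup. A minimal sketch, with `has_znode` as a hypothetical helper that takes the listing as a string:

```shell
# Sketch: succeed if a znode name appears in the bracketed list printed by
# `ls /hbase` in zkCli.sh. has_znode is a hypothetical helper name.
has_znode() {
  listing=$1
  wanted=$2
  # Turn "[a, b, c]" into one name per line, then look for an exact match.
  printf '%s\n' "$listing" | tr -d '[] ' | tr ',' '\n' | grep -qxF -- "$wanted"
}
```

For example, `has_znode "$listing" meta-region-server || echo "meta region not registered yet"`, where `$listing` holds the line copied from the zkCli.sh output.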
7. Test HBase
Run hbase shell to enter HBase's interactive command line, where the cluster can be tested

    hbase shell

(1) Check the cluster status and the number of nodes

    hbase(main):001:0> status
    1 active master, 1 backup masters, 4 servers, 0 dead, 0.5000 average load
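For scripted health checks, the live server count can be pulled out of that one-line summary. The sketch below assumes the "N servers" wording shown in the output above; `live_servers` is a hypothetical helper name.

```shell
# Sketch: extract the live server count from the one-line summary printed by
# the HBase shell `status` command. Assumes the "N servers" wording shown
# above. live_servers is a hypothetical helper name.
live_servers() {
  printf '%s\n' "$1" | sed -n 's/.*, \([0-9][0-9]*\) servers,.*/\1/p'
}
```

The result can then be compared with the number of hosts in the regionservers file.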
(2) Create a table

    hbase(main):002:0> create 'testtable','c1','c2'
    0 row(s) in 1.4850 seconds

    => Hbase::Table - testtable

The create command takes the table name followed by one or more column family names: table name, family 1, family 2, family 3, and so on.
(3) List the table

    hbase(main):003:0> list 'testtable'
    TABLE
    testtable
    1 row(s) in 0.0400 seconds

    => ["testtable"]

(4) Insert data

    hbase(main):004:0> put 'testtable','row1','c1','row1_c1_value'
    0 row(s) in 0.2230 seconds

    hbase(main):005:0> put 'testtable','row2','c2:s1','row1_c2_s1_value'
    0 row(s) in 0.0310 seconds

    hbase(main):006:0> put 'testtable','row2','c2:s2','row1_c2_s2_value'
    0 row(s) in 0.0170 seconds

The put command takes the table name, the row key, the column (a column family name, optionally followed by a colon and a column qualifier inside that family), and the cell value.
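The same put statements can be generated as text and fed to the shell in non-interactive mode, which is handy for loading a few test rows from a script. This is a sketch under assumptions: `put_statement` is a hypothetical helper that does no quoting or escaping, so none of its arguments may contain a single quote.

```shell
# Sketch: build one HBase shell `put` statement as text.
# put_statement is a hypothetical helper; it does no escaping, so the
# arguments must not contain a single quote.
put_statement() {
  table=$1
  row=$2
  column=$3
  value=$4
  printf "put '%s','%s','%s','%s'\n" "$table" "$row" "$column" "$value"
}
```

Assuming the `-n` (non-interactive) option of the HBase shell is available in your version, the output can be piped in with `put_statement testtable row3 c1 hello | hbase shell -n`.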
(5) Scan the whole table

    hbase(main):007:0> scan 'testtable'
    ROW                COLUMN+CELL
     row1              column=c1:, timestamp=1499225862922, value=row1_c1_value
     row2              column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
     row2              column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
    2 row(s) in 0.0820 seconds

(6) Query data by row key

    hbase(main):008:0> get 'testtable','row1'
    COLUMN             CELL
     c1:               timestamp=1499225862922, value=row1_c1_value
    1 row(s) in 0.0560 seconds

    hbase(main):009:0> get 'testtable','row2'
    COLUMN             CELL
     c2:s1             timestamp=1499225869471, value=row1_c2_s1_value
     c2:s2             timestamp=1499225870375, value=row1_c2_s2_value
    2 row(s) in 0.0350 seconds

(7) Disable a table
The disable command disables a table. A disabled table cannot be used; for example, a full table scan on it reports an error, as follows

    hbase(main):010:0> disable 'testtable'
    0 row(s) in 2.3090 seconds

    hbase(main):011:0> scan 'testtable'
    ROW                COLUMN+CELL

    ERROR: testtable is disabled.

    Here is some help for this command:
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications. Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, ROWPREFIXFILTER, TIMESTAMP,
    MAXLENGTH or COLUMNS, CACHE or RAW, VERSIONS, ALL_METRICS or METRICS

    If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family'.

    The filter can be specified in two ways:
    1. Using a filterString - more information on this is available in the
    Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.

    If you wish to see metrics regarding the execution of the scan, the
    ALL_METRICS boolean should be set to true. Alternatively, if you would
    prefer to see only a subset of the metrics, the METRICS array can be
    defined to include the names of only the metrics you care about.

    Some examples:

      hbase> scan 'hbase:meta'
      hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
      hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
      hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
      hbase> scan 't1', {REVERSED => true}
      hbase> scan 't1', {ALL_METRICS => true}
      hbase> scan 't1', {METRICS => ['RPC_RETRIES', 'ROWS_FILTERED']}
      hbase> scan 't1', {ROWPREFIXFILTER => 'row2', FILTER => "
        (QualifierFilter (>=, 'binary:xyz')) AND (TimestampsFilter ( 123, 456))"}
      hbase> scan 't1', {FILTER =>
        org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
      hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}

    For setting the Operation Attributes
      hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
      hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}

    For experts, there is an additional option -- CACHE_BLOCKS -- which
    switches block caching for the scanner on (true) or off (false). By
    default it is enabled. Examples:

      hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

    Also for experts, there is an advanced option -- RAW -- which instructs the
    scanner to return all cells (including delete markers and uncollected deleted
    cells). This option cannot be combined with requesting specific COLUMNS.
    Disabled by default. Example:

      hbase> scan 't1', {RAW => true, VERSIONS => 10}

    Besides the default 'toStringBinary' format, 'scan' supports custom formatting
    by column. A user can define a FORMATTER by adding it to the column name in
    the scan specification. The FORMATTER can be stipulated:

    1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
    2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

    Example formatting cf:qualifier1 and cf:qualifier2 both as Integers:
      hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
        'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }

    Note that you can specify a FORMATTER by column only (cf:qualifier). You cannot
    specify a FORMATTER for all columns of a column family.

    Scan can also be used directly from a table, by first getting a reference to a
    table, like such:

      hbase> t = get_table 't'
      hbase> t.scan

    Note in the above situation, you can still provide all the filtering, columns,
    options, etc as described above.

(8) Re-enable the table
The enable command makes the table usable again. Once it is enabled, operations on the table work again, for example a full table scan

    hbase(main):012:0> enable 'testtable'
    0 row(s) in 1.2800 seconds

    hbase(main):013:0> scan 'testtable'
    ROW                COLUMN+CELL
     row1              column=c1:, timestamp=1499225862922, value=row1_c1_value
     row2              column=c2:s1, timestamp=1499225869471, value=row1_c2_s1_value
     row2              column=c2:s2, timestamp=1499225870375, value=row1_c2_s2_value
    2 row(s) in 0.0590 seconds

(9) Drop the table
Use the drop command to delete a table. A table can only be dropped while it is disabled; otherwise an error is reported, as follows

    hbase(main):014:0> drop 'testtable'

    ERROR: Table testtable is enabled. Disable it first.

    Here is some help for this command:
    Drop the named table. Table must first be disabled:
      hbase> drop 't1'
      hbase> drop 'ns1:t1'

Disable the table first and then drop it, and the table is deleted successfully

    hbase(main):008:0> disable 'testtable'
    0 row(s) in 2.3170 seconds

    hbase(main):012:0> drop 'testtable'
    0 row(s) in 1.2740 seconds

(10) Exit hbase shell

    quit

That covers simple testing and use of HBase through hbase shell.
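8. HBase management page
HBase also provides a web management page that makes it easier to check the state of the cluster.
Enter http://172.17.0.1:16010 in a browser (the default port is 16010) to open the management page.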
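The page can also be probed from the command line, for example from a monitoring script. The sketch below only builds the address; `master_ui_url` is a hypothetical helper, and the `/master-status` path is an assumption about where the Master UI serves its main page.

```shell
# Sketch: build the Master web UI address from a host and an optional port
# (16010 is the default mentioned above). master_ui_url is a hypothetical
# helper; the /master-status path is an assumption.
master_ui_url() {
  host=$1
  port=${2:-16010}
  printf 'http://%s:%s/master-status\n' "$host" "$port"
}
```

A check such as `curl -s -o /dev/null -w '%{http_code}\n' "$(master_ui_url 172.17.0.1)"` prints the HTTP status code returned by the page.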
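To view the tables in HBase, click Table Details in the menu bar at the top; it lists information about all tables.
The Tables section of the home page also lists the table names; click one to view the details of that table.
In Tables, click System Tables to view the system tables, which are mainly the metadata and namespace tables.
That is the detailed process of configuring an Apache HBase cluster and testing it. Corrections and comments are welcome.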