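Quickstart will get you up and running on a single-node, standalone instance of HBase.
This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM persisting to the local filesystem. It is our most basic deploy profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.
Apart from downloading HBase, this procedure should take less than 10 minutes.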
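Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1, and this will cause problems for you. See Why does HBase care about /etc/hosts? for details. To run HBase 0.94.x and earlier on Ubuntu, use the following /etc/hosts file as a template:
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
This issue has been fixed in hbase-0.96.0 and beyond.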
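If you are stuck on an affected release, you can spot the problem by checking whether the machine's hostname is mapped to 127.0.1.1. The following is a minimal sketch, not an HBase tool. The function name check_loopback is hypothetical, and it only understands simple "IP name [aliases]" lines with # comments.

```shell
# Hypothetical helper (not part of HBase): warn if HOSTNAME is mapped to
# 127.0.1.1, the Ubuntu default that confuses HBase 0.94.x and earlier.
# Usage: check_loopback HOSTS_FILE HOSTNAME
check_loopback() {
  local hosts_file="$1" name="$2"
  # Drop comments, then look for a 127.0.1.1 line whose names include HOSTNAME.
  if sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
      $1 == "127.0.1.1" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }'; then
    echo "warning: $name resolves to 127.0.1.1; map it to 127.0.0.1 instead"
    return 1
  fi
  echo "ok"
}
```

For example, check_loopback /etc/hosts "$(hostname)" prints the warning on a default Ubuntu install.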
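HBase requires that a JDK be installed. See Java for the JDK versions supported by each HBase release.
Procedure: Download, Configure, and Start HBase in Standalone Mode
Steps 1 to 5:
1. Choose a download site from the list of Apache Download Mirrors.
2. Download the stable binary release (the file ending in .tar.gz, not the one ending in src.tar.gz).
3. Extract it.
4. Change to the newly created directory.
5. Edit conf/hbase-env.sh, uncomment the line starting with JAVA_HOME, and set it to the directory that contains the executable bin/java. On systems that switch Java versions through a mechanism such as /usr/bin/alternatives, this is usually /usr: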
JAVA_HOME=/usr
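JAVA_HOME must name the directory that contains bin/java, not the java binary itself. You can therefore derive it by stripping the trailing /bin/java from the binary's path. Below is a minimal sketch; the function name java_home_from is hypothetical.

```shell
# Hypothetical helper: given the path of a java executable, print the
# directory that contains bin/java, which is what JAVA_HOME expects.
java_home_from() {
  local java_bin="$1"
  case $java_bin in
    */bin/java) echo "${java_bin%/bin/java}" ;;
    *) echo "not a .../bin/java path: $java_bin" >&2; return 1 ;;
  esac
}
```

java_home_from "$(command -v java)" typically prints /usr on systems that use alternatives. With GNU readlink, java_home_from "$(readlink -f "$(command -v java)")" follows the symlinks and prints the real installation directory instead.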
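6. Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration stores HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.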
Example 1. Example hbase-site.xml for Standalone HBase
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///home/testuser/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/testuser/zookeeper</value>
</property>
</configuration>
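You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
The hbase.rootdir in the above example points to a directory in the local filesystem. The 'file:/' prefix is how we denote the local filesystem. To home HBase on an existing instance of HDFS, set hbase.rootdir to point at a directory on your instance, e.g. hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section below on Standalone HBase over HDFS.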
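If you prefer to script step 6, the sketch below writes the same two properties as Example 1. The function name write_standalone_site and its arguments are illustrative only. It overwrites hbase-site.xml in the given directory and, following the note above, deliberately does not create the data directories.

```shell
# Hypothetical helper: write the standalone hbase-site.xml from Example 1
# into CONF_DIR, keeping data under DATA_HOME (e.g. /home/testuser).
# Overwrites any existing hbase-site.xml; does NOT create the data dirs.
write_standalone_site() {
  local conf_dir="$1" data_home="$2"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file://$data_home/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>$data_home/zookeeper</value>
  </property>
</configuration>
EOF
}
```

From the HBase install directory, write_standalone_site conf /home/testuser reproduces Example 1.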
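7. The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.
Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit conf/hbase-env.sh and modify the JAVA_HOME setting to point to the directory that contains bin/java.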
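You can automate the jps check in step 7. In this sketch, the helper require_procs (a hypothetical name) reads jps output on standard input and fails if any named process is missing. It assumes jps prints one "PID Name" pair per line, as in the examples in this guide.

```shell
# Hypothetical helper: succeed only if every process name given as an
# argument appears in the jps output read from stdin.
# Usage: jps | require_procs HMaster
require_procs() {
  local listing name missing=0
  listing=$(awk '{print $2}')   # jps prints "PID Name" per line
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qxF "$name"; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```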
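Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.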
$ ./bin/hbase shell
hbase(main):001:0>
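Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.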
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
Use the list command to confirm that your table exists.
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Use the put command to insert data into your table.
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
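Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.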
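To make the family:qualifier rule concrete, here is a small sketch that splits a column name as described above. The function split_column is hypothetical. Treating everything after the first colon as the qualifier is an assumption of this sketch.

```shell
# Hypothetical helper: split an HBase column name such as "cf:a" into its
# column family (before the first colon) and qualifier (everything after).
split_column() {
  local column="$1"
  case $column in
    *:*) printf '%s %s\n' "${column%%:*}" "${column#*:}" ;;
    *) echo "no column family prefix in: $column" >&2; return 1 ;;
  esac
}
```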
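One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.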
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1421762485768, value=value1
row2 column=cf:b, timestamp=1421762491785, value=value2
row3 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
Use the get command to get a single row of data at a time.
hbase(main):007:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
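If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.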
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again if you tested the enable command above:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Use the drop command to drop (delete) the table.
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Use the quit command (or exit) to leave the HBase Shell and disconnect from HBase. HBase is still running in the background.
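You can replay the exercise above without typing each command, because hbase shell accepts a file of commands as its argument. The sketch below writes the commands from this section into such a file. The helper name write_exercise and the file path are illustrative. The file ends with exit so that the shell returns once the commands have run.

```shell
# Hypothetical helper: write the shell exercise above into a command file
# for `hbase shell`. The trailing `exit` makes the shell return afterwards.
write_exercise() {
  local out="$1"
  cat > "$out" <<'EOF'
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
exit
EOF
}
```

From the HBase install directory: write_exercise /tmp/exercise.txt && ./bin/hbase shell /tmp/exercise.txt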
Procedure: Stop HBase
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.
$ ./bin/stop-hbase.sh
stopping hbase....................
$
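After issuing the command, it can take several minutes for the processes to shut down. Use jps to be sure that the HMaster and HRegionServer processes are shut down.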
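Because shutdown can take several minutes, a polling loop is handier than re-running jps by hand. This is a minimal sketch with these assumptions:
- wait_for_hbase_stop is a hypothetical name.
- The 300-second default and the 5-second interval are arbitrary choices.
- The optional second argument (a command that prints jps-style output) exists only so the loop can be exercised without a running HBase.

```shell
# Hypothetical helper: poll until no HMaster or HRegionServer is listed,
# or give up after TIMEOUT seconds (default 300).
# Usage: wait_for_hbase_stop [TIMEOUT] [JPS_COMMAND]
wait_for_hbase_stop() {
  local timeout="${1:-300}" jps_cmd="${2:-jps}" waited=0
  while "$jps_cmd" | awk '{print $2}' | grep -qxE 'HMaster|HRegionServer'; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "HBase still running after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "HMaster and HRegionServer are stopped"
}
```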
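The above has shown you how to start and stop a standalone instance of HBase. In the next sections we give a quick overview of other modes of HBase deployment.
After working your way through quickstart standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process, whereas in standalone mode all daemons ran in one JVM instance. By default, unless you configure the hbase.rootdir property as described in quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.
Hadoop Configuration: This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.
If you have just finished quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.
Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.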
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
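Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs://// URI syntax. In this example, HDFS is running on the localhost at port 8020.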
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
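You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.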
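For reference, the two pseudo-distributed properties above can be written out as a complete file. In this sketch, write_pseudo_site is a hypothetical helper, and the NameNode host and port default to the localhost:8020 used in the example. It writes a fresh hbase-site.xml containing only these two properties. If you want to keep other settings such as hbase.zookeeper.property.dataDir, merge them in by hand.

```shell
# Hypothetical helper: write a pseudo-distributed hbase-site.xml into
# CONF_DIR (one JVM per daemon, data in hdfs://HOST:PORT/hbase).
# Overwrites the file, so other properties are not preserved.
write_pseudo_site() {
  local conf_dir="$1" nn_host="${2:-localhost}" nn_port="${3:-8020}"
  cat > "$conf_dir/hbase-site.xml" <<EOF
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://$nn_host:$nn_port/hbase</value>
  </property>
</configuration>
EOF
}
```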
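Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.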
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
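A scripted version of this check might look like the following sketch. check_hbase_root is a hypothetical name. The entries it looks for are simply taken from the example listing above, and other HBase versions may create a different set of files.

```shell
# Hypothetical helper: read `hadoop fs -ls /hbase` output on stdin and check
# that entries seen in the example listing are present.
check_hbase_root() {
  local listing entry rc=0
  listing=$(awk 'NF >= 8 {print $NF}')   # last field is the path
  for entry in /hbase/data /hbase/WALs /hbase/hbase.id /hbase/hbase.version; do
    printf '%s\n' "$listing" | grep -qxF "$entry" || { echo "missing $entry" >&2; rc=1; }
  done
  return "$rc"
}
```

Usage: ./bin/hadoop fs -ls /hbase | check_hbase_root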
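You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in shell exercises.
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh command. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.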
$ ./bin/local-master-backup.sh 2 3 5
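The port arithmetic for backup masters is simple enough to express directly. This sketch (backup_master_ports is a hypothetical name) assumes the default base ports 16010, 16020, and 16030 described above.

```shell
# Hypothetical helper: print the three ports a backup HMaster uses for a
# given offset (default ports 16010, 16020, 16030 plus the offset).
backup_master_ports() {
  local offset="$1"
  echo "$((16010 + offset)) $((16020 + offset)) $((16030 + offset))"
}
```

Running it for offsets 2, 3, and 5 lists the same three port triples as the command above.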
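To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only content of the file is the PID. You can use the kill -9 command to kill that PID. The following command kills the master with port offset 1, but leaves the cluster running: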
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
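You can wrap the PID-file convention above in a small helper. This sketch makes two assumptions: the PID files live in /tmp with the /tmp/hbase-USER-X-master.pid naming described above, and you really do want kill -9. Both function names are hypothetical.

```shell
# Hypothetical helper: PID file path for the backup master that USER started
# with port offset X, following /tmp/hbase-USER-X-master.pid.
backup_master_pidfile() {
  echo "/tmp/hbase-$1-$2-master.pid"
}

# Kill that backup master with kill -9, but only if its PID file exists.
kill_backup_master() {
  local pidfile
  pidfile=$(backup_master_pidfile "$1" "$2")
  if [ -r "$pidfile" ]; then
    kill -9 "$(cat "$pidfile")"
  else
    echo "no PID file at $pidfile" >&2
    return 1
  fi
}
```

For example, kill_backup_master testuser 2 stops the backup master started with offset 2.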
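The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to local-master-backup.sh, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports, since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).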
$ ./bin/local-regionservers.sh start 2 3 4 5
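The same offset rule applies to additional RegionServers, but it starts from the base ports 16200 and 16300. Here is a sketch; the function name is hypothetical.

```shell
# Hypothetical helper: print the two ports used by an additional
# RegionServer started with local-regionservers.sh (16200/16300 + offset).
regionserver_ports() {
  local offset="$1"
  echo "$((16200 + offset)) $((16300 + offset))"
}
```

for o in 2 3 4 5; do regionserver_ports "$o"; done lists the ports used by the four RegionServers started above.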
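To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.
$ ./bin/local-regionservers.sh stop 3
You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.
In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemons. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows: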
Table 1. Distributed Cluster Demo Architecture
Node Name | Master | ZooKeeper | RegionServer |
---|---|---|---|
node-a.example.com | yes | yes | no |
node-b.example.com | backup | yes | yes |
node-c.example.com | no | yes | yes |
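This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart and Pseudo-Distributed Local Install, assuming that the system you configured there is now node-a. Stop HBase on node-a before continuing.
Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
node-a needs to be able to log into node-b and node-c in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts and configure password-less SSH login.
While logged in as the user who will run HBase, generate an SSH key pair using the following command: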
$ ssh-keygen -t rsa
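If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.
On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.
Securely copy the public key from node-a to each of the nodes, using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.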
$ cat id_rsa.pub >> ~/.ssh/authorized_keys
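If you performed the procedure correctly, when you SSH from node-a to either of the other nodes using the same username, you should not be prompted for a password.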
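The key-distribution steps above can be made repeatable. The sketch below shows one way to do the append on each node. add_authorized_key is a hypothetical helper that:
- appends the key only if it is not already present;
- always uses >> so that existing keys are kept;
- sets 700/600 permissions, which satisfy sshd's default StrictModes checks.

It assumes id_rsa.pub has already been copied to the node, for example with scp.

```shell
# Hypothetical helper: append PUBKEY_FILE to AUTH_FILE (default
# ~/.ssh/authorized_keys) unless that exact key line is already present.
add_authorized_key() {
  local pubkey_file="$1" auth_file="${2:-$HOME/.ssh/authorized_keys}" dir
  dir=$(dirname "$auth_file")
  mkdir -p "$dir" && chmod 700 "$dir"
  touch "$auth_file" && chmod 600 "$auth_file"
  # Append with >> (never >) so keys already in the file are preserved.
  if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
}
```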
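node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a: edit conf/regionservers, remove the line which contains localhost, and add lines with the hostnames or IP addresses for node-b and node-c.
Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.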
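Configure HBase to use node-b as a backup master: create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.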
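Both conf/regionservers (the RegionServer hosts) and conf/backup-masters (the backup master hosts) are plain lists with one hostname per line. The sketch below writes both files for the layout in Table 1; write_node_lists is a hypothetical name. Overwriting conf/regionservers also drops its default localhost entry.

```shell
# Hypothetical helper: write the host lists for the Table 1 layout into
# CONF_DIR (RegionServers on node-b and node-c, backup master on node-b).
# Existing files are overwritten, removing the default localhost entry.
write_node_lists() {
  local conf_dir="$1"
  printf '%s\n' node-b.example.com node-c.example.com > "$conf_dir/regionservers"
  printf '%s\n' node-b.example.com > "$conf_dir/backup-masters"
}
```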
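In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.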
On node-a, edit conf/hbase-site.xml and add the following properties.
<property>
<name>hbase.zookeeper.quorum</name>
<value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
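As the quorum grows, it is easier to build the comma-separated hbase.zookeeper.quorum value from a host list than to edit it by hand. Here is a minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: print an hbase.zookeeper.quorum <property> element
# whose value is the given hostnames joined with commas.
quorum_property() {
  local IFS=,
  printf '<property>\n  <name>hbase.zookeeper.quorum</name>\n  <value>%s</value>\n</property>\n' "$*"
}
```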
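node-b will run a backup master server and a ZooKeeper instance.
Download and unpack HBase on node-b, just as you did for the standalone and pseudo-distributed quickstarts.
Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.
Be sure HBase is not running on any node. If you forgot to stop HBase from previous testing, you will have errors. Use the jps command to check whether HBase is running. Look for the processes HMaster, HRegionServer, and HQuorumPeer, and if they exist, kill them.
On node-a, issue the start-hbase.sh command. Your output will be similar to that below: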
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
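ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.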
Example 2. node-a jps Output
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
Example 3. node-b jps Output
$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
Example 4. node-c jps Output
$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
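ZooKeeper Process Name: The HQuorumPeer process is a ZooKeeper instance that is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper.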
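A small lookup helps you compare each node's jps output against the roles in Table 1. In this sketch, both function names are hypothetical and node names are matched by prefix. Only the HBase processes are checked: HMaster, HRegionServer, and the HBase-managed HQuorumPeer.

```shell
# Hypothetical helper: HBase processes jps should list on each demo node,
# following Table 1.
expected_procs() {
  case $1 in
    node-a*) echo "HQuorumPeer HMaster" ;;
    node-b*) echo "HQuorumPeer HMaster HRegionServer" ;;
    node-c*) echo "HQuorumPeer HRegionServer" ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Succeed if the jps output on stdin contains every process expected for NODE.
check_node() {
  local expected listing p rc=0
  expected=$(expected_procs "$1") || return 1
  listing=$(awk '{print $2}')
  for p in $expected; do
    printf '%s\n' "$listing" | grep -qxF "$p" || { echo "$1: missing $p" >&2; rc=1; }
  done
  return "$rc"
}
```

On each node, jps | check_node "$(hostname)" reports any missing process.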
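Web UI Port Changes
In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.
If everything is set up correctly, you should be able to connect to the UI for the Master at http://node-a.example.com:16010/ or the backup master at http://node-b.example.com:16010/ using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.
With a three-node cluster like the one you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.
The original English text follows.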
Quickstart will get you up and running on a single-node, standalone instance of HBase.
This section describes the setup of a single-node standalone HBase. A standalone instance has all HBase daemons (the Master, RegionServers, and ZooKeeper) running in a single JVM persisting to the local filesystem. It is our most basic deploy profile. We will show you how to create a table in HBase using the hbase shell CLI, insert rows into the table, perform put and scan operations against the table, enable or disable the table, and start and stop HBase.
Apart from downloading HBase, this procedure should take less than 10 minutes.
Prior to HBase 0.94.x, HBase expected the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions default to 127.0.1.1 and this will cause problems for you. See Why does HBase care about /etc/hosts? for details. The following /etc/hosts file works correctly for HBase 0.94.x and earlier, on Ubuntu. Use this as a template if you run into trouble.
127.0.0.1 localhost
127.0.0.1 ubuntu.ubuntu-domain ubuntu
This issue has been fixed in hbase-0.96.0 and beyond.
HBase requires that a JDK be installed. See Java for information about supported JDK versions.
Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Do not download the file ending in src.tar.gz for now.
Extract the downloaded file, and change to the newly-created directory.
$ tar xzvf hbase-2.0.0-SNAPSHOT-bin.tar.gz
$ cd hbase-2.0.0-SNAPSHOT/
You are required to set the JAVA_HOME environment variable before starting HBase. You can set the variable via your operating system's usual mechanism, but HBase provides a central mechanism, conf/hbase-env.sh. Edit this file, uncomment the line starting with JAVA_HOME, and set it to the appropriate location for your operating system. The JAVA_HOME variable should be set to a directory which contains the executable file bin/java. Most modern Linux operating systems provide a mechanism, such as /usr/bin/alternatives on RHEL or CentOS, for transparently switching between versions of executables such as Java. In this case, you can set JAVA_HOME to the directory containing the symbolic link to bin/java, which is usually /usr.
JAVA_HOME=/usr
Edit conf/hbase-site.xml, which is the main HBase configuration file. At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data. By default, a new directory is created under /tmp. Many servers are configured to delete the contents of /tmp upon reboot, so you should store the data elsewhere. The following configuration will store HBase's data in the hbase directory, in the home directory of the user called testuser. Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///home/testuser/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/testuser/zookeeper</value>
</property>
</configuration>
You do not need to create the HBase data directory. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
The hbase.rootdir in the above example points to a directory in the local filesystem. The 'file:/' prefix is how we denote the local filesystem. To home HBase on an existing instance of HDFS, set the hbase.rootdir to point at a directory up on your instance, e.g. hdfs://namenode.example.org:8020/hbase. For more on this variant, see the section below on Standalone HBase over HDFS.
The bin/start-hbase.sh script is provided as a convenient way to start HBase. Issue the command, and if all goes well, a message is logged to standard output showing that HBase started successfully. You can use the jps command to verify that you have one running process called HMaster. In standalone mode HBase runs all daemons within this single JVM, i.e. the HMaster, a single HRegionServer, and the ZooKeeper daemon. Go to http://localhost:16010 to view the HBase Web UI.
Java needs to be installed and available. If you get an error indicating that Java is not installed, but it is on your system, perhaps in a non-standard location, edit the conf/hbase-env.sh file and modify the JAVA_HOME setting to point to the directory that contains bin/java on your system.
Connect to HBase.
Connect to your running instance of HBase using the hbase shell command, located in the bin/ directory of your HBase install. In this example, some usage and version information that is printed when you start HBase Shell has been omitted. The HBase Shell prompt ends with a > character.
$ ./bin/hbase shell
hbase(main):001:0>
Display HBase Shell Help Text.
Type help and press Enter to display some basic usage information for HBase Shell, as well as several example commands. Notice that table names, rows, and columns must all be enclosed in quote characters.
Create a table.
Use the create command to create a new table. You must specify the table name and the ColumnFamily name.
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.4170 seconds
=> Hbase::Table - test
List Information About your Table
Use the list command to confirm that your table exists.
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
=> ["test"]
Put data into your table.
To put data into your table, use the put command.
hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0850 seconds
hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0110 seconds
hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0100 seconds
Here, we insert three values, one at a time. The first insert is at row1, column cf:a, with a value of value1. Columns in HBase are comprised of a column family prefix, cf in this example, followed by a colon and then a column qualifier suffix, a in this case.
Scan the table for all data at once.
One of the ways to get data from HBase is to scan. Use the scan command to scan the table for data. You can limit your scan, but for now, all data is fetched.
hbase(main):006:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1421762485768, value=value1
row2 column=cf:b, timestamp=1421762491785, value=value2
row3 column=cf:c, timestamp=1421762496210, value=value3
3 row(s) in 0.0230 seconds
Get a single row of data.
To get a single row of data at a time, use the get command.
hbase(main):007:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1421762485768, value=value1
1 row(s) in 0.0350 seconds
Disable a table.
If you want to delete a table or change its settings, as well as in some other situations, you need to disable the table first, using the disable command. You can re-enable it using the enable command.
hbase(main):008:0> disable 'test'
0 row(s) in 1.1820 seconds
hbase(main):009:0> enable 'test'
0 row(s) in 0.1770 seconds
Disable the table again if you tested the enable command above:
hbase(main):010:0> disable 'test'
0 row(s) in 1.1820 seconds
Drop the table.
To drop (delete) a table, use the drop command.
hbase(main):011:0> drop 'test'
0 row(s) in 0.1370 seconds
Exit the HBase Shell.
To exit the HBase Shell and disconnect from your cluster, use the quit command. HBase is still running in the background.
In the same way that the bin/start-hbase.sh script is provided to conveniently start all HBase daemons, the bin/stop-hbase.sh script stops them.
$ ./bin/stop-hbase.sh
stopping hbase....................
$
After issuing the command, it can take several minutes for the processes to shut down. Use the jps command to be sure that the HMaster and HRegionServer processes are shut down.
The above has shown you how to start and stop a standalone instance of HBase. In the next sections we give a quick overview of other modes of HBase deployment.
After working your way through quickstart standalone mode, you can re-configure HBase to run in pseudo-distributed mode. Pseudo-distributed mode means that HBase still runs completely on a single host, but each HBase daemon (HMaster, HRegionServer, and ZooKeeper) runs as a separate process: in standalone mode all daemons ran in one JVM process/instance. By default, unless you configure the hbase.rootdir property as described in quickstart, your data is still stored in /tmp/. In this walk-through, we store your data in HDFS instead, assuming you have HDFS available. You can skip the HDFS configuration to continue storing your data in the local filesystem.
Hadoop Configuration
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote system, and that they are running and available. It also assumes you are using Hadoop 2. The guide on Setting up a Single Node Cluster in the Hadoop documentation is a good starting point.
Stop HBase if it is running.
If you have just finished quickstart and HBase is still running, stop it. This procedure will create a totally new directory where HBase will store its data, so any databases you created before will be lost.
Configure HBase.
Edit the hbase-site.xml configuration. First, add the following property, which directs HBase to run in distributed mode, with one JVM instance per daemon.
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
Next, change the hbase.rootdir from the local filesystem to the address of your HDFS instance, using the hdfs://// URI syntax. In this example, HDFS is running on the localhost at port 8020.
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
You do not need to create the directory in HDFS. HBase will do this for you. If you create the directory, HBase will attempt to do a migration, which is not what you want.
Start HBase.
Use the bin/start-hbase.sh command to start HBase. If your system is configured correctly, the jps command should show the HMaster and HRegionServer processes running.
Check the HBase directory in HDFS.
If everything worked correctly, HBase created its directory in HDFS. In the configuration above, it is stored in /hbase/ on HDFS. You can use the hadoop fs command in Hadoop's bin/ directory to list this directory.
$ ./bin/hadoop fs -ls /hbase
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
Create a table and populate it with data.
You can use the HBase Shell to create a table, populate it with data, scan and get values from it, using the same procedure as in shell exercises.
Start and stop a backup HBase Master (HMaster) server.
Running multiple HMaster instances on the same hardware does not make sense in a production environment, in the same way that running a pseudo-distributed cluster does not make sense for production. This step is offered for testing and learning purposes only.
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, which makes 10 total HMasters, counting the primary. To start a backup HMaster, use the local-master-backup.sh command. For each backup master you want to start, add a parameter representing the port offset for that master. Each HMaster uses three ports (16010, 16020, and 16030 by default). The port offset is added to these ports, so using an offset of 2, the backup HMaster would use ports 16012, 16022, and 16032. The following command starts 3 backup servers using ports 16012/16022/16032, 16013/16023/16033, and 16015/16025/16035.
$ ./bin/local-master-backup.sh 2 3 5
To kill a backup master without killing the entire cluster, you need to find its process ID (PID). The PID is stored in a file with a name like /tmp/hbase-USER-X-master.pid. The only content of the file is the PID. You can use the kill -9 command to kill that PID. The following command will kill the master with port offset 1, but leave the cluster running:
$ cat /tmp/hbase-testuser-1-master.pid |xargs kill -9
Start and stop additional RegionServers
The HRegionServer manages the data in its StoreFiles as directed by the HMaster. Generally, one HRegionServer runs per node in the cluster. Running multiple HRegionServers on the same system can be useful for testing in pseudo-distributed mode. The local-regionservers.sh command allows you to run multiple RegionServers. It works in a similar way to the local-master-backup.sh command, in that each parameter you provide represents the port offset for an instance. Each RegionServer requires two ports, and the default ports are 16020 and 16030. However, the base ports for additional RegionServers are not the default ports, since the default ports are used by the HMaster, which is also a RegionServer since HBase version 1.0.0. The base ports are 16200 and 16300 instead. You can run 99 additional RegionServers that are not an HMaster or backup HMaster on a server. The following command starts four additional RegionServers, running on sequential ports starting at 16202/16302 (base ports 16200/16300 plus 2).
$ ./bin/local-regionservers.sh start 2 3 4 5
To stop a RegionServer manually, use the local-regionservers.sh command with the stop parameter and the offset of the server to stop.
$ ./bin/local-regionservers.sh stop 3
Stop HBase.
You can stop HBase the same way as in the quickstart procedure, using the bin/stop-hbase.sh command.
In reality, you need a fully-distributed configuration to fully test HBase and to use it in real-world scenarios. In a distributed configuration, the cluster contains multiple nodes, each of which runs one or more HBase daemon. These include primary and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced quickstart adds two more nodes to your cluster. The architecture will be as follows:
Node Name | Master | ZooKeeper | RegionServer |
---|---|---|---|
node-a.example.com | yes | yes | no |
node-b.example.com | backup | yes | yes |
node-c.example.com | no | yes | yes |
This quickstart assumes that each node is a virtual machine and that they are all on the same network. It builds upon the previous quickstart, Pseudo-Distributed Local Install, assuming that the system you configured in that procedure is now node-a. Stop HBase on node-a before continuing.
Be sure that all the nodes have full access to communicate, and that no firewall rules are in place which could prevent them from talking to each other. If you see any errors like no route to host, check your firewall.
node-a needs to be able to log into node-b and node-c (and to itself) in order to start the daemons. The easiest way to accomplish this is to use the same username on all hosts, and configure password-less SSH login from node-a to each of the others.
On node-a, generate a key pair.
While logged in as the user who will run HBase, generate an SSH key pair, using the following command:
$ ssh-keygen -t rsa
If the command succeeds, the location of the key pair is printed to standard output. The default name of the public key is id_rsa.pub.
Create the directory that will hold the shared keys on the other nodes.
On node-b and node-c, log in as the HBase user and create a .ssh/ directory in the user's home directory, if it does not already exist. If it already exists, be aware that it may already contain other keys.
Copy the public key to the other nodes.
Securely copy the public key from node-a to each of the nodes, by using scp or some other secure means. On each of the other nodes, create a new file called .ssh/authorized_keys if it does not already exist, and append the contents of the id_rsa.pub file to the end of it. Note that you also need to do this for node-a itself.
$ cat id_rsa.pub >> ~/.ssh/authorized_keys
Test password-less login.
If you performed the procedure correctly, when you SSH from node-a to either of the other nodes using the same username, you should not be prompted for a password.
Since node-b will run a backup Master, repeat the procedure above, substituting node-b everywhere you see node-a. Be sure not to overwrite your existing .ssh/authorized_keys files, but concatenate the new key onto the existing file using the >> operator rather than the > operator.
node-a
node-a will run your primary master and ZooKeeper processes, but no RegionServers. Stop the RegionServer from starting on node-a.
Edit conf/regionservers and remove the line which contains localhost. Add lines with the hostnames or IP addresses for node-b and node-c.
Even if you did want to run a RegionServer on node-a, you should refer to it by the hostname the other servers would use to communicate with it. In this case, that would be node-a.example.com. This enables you to distribute the configuration to each node of your cluster without any hostname conflicts. Save the file.
Configure HBase to use node-b as a backup master.
Create a new file in conf/ called backup-masters, and add a new line to it with the hostname for node-b. In this demonstration, the hostname is node-b.example.com.
Configure ZooKeeper
In reality, you should carefully consider your ZooKeeper configuration. You can find out more about configuring ZooKeeper in zookeeper. This configuration will direct HBase to start and manage a ZooKeeper instance on each node of the cluster.
On node-a, edit conf/hbase-site.xml and add the following properties.
<property>
<name>hbase.zookeeper.quorum</name>
<value>node-a.example.com,node-b.example.com,node-c.example.com</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/zookeeper</value>
</property>
Everywhere in your configuration that you have referred to node-a as localhost, change the reference to point to the hostname that the other nodes will use to refer to node-a. In these examples, the hostname is node-a.example.com.
node-b and node-c
node-b will run a backup master server and a ZooKeeper instance.
Download and unpack HBase.
Download and unpack HBase to node-b, just as you did for the standalone and pseudo-distributed quickstarts.
Copy the configuration files from node-a to node-b and node-c.
Each node of your cluster needs to have the same configuration information. Copy the contents of the conf/ directory to the conf/ directory on node-b and node-c.
Be sure HBase is not running on any node.
If you forgot to stop HBase from previous testing, you will have errors. Check to see whether HBase is running on any of your nodes by using the jps command. Look for the processes HMaster, HRegionServer, and HQuorumPeer. If they exist, kill them.
Start the cluster.
On node-a, issue the start-hbase.sh command. Your output will be similar to that below.
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
ZooKeeper starts first, followed by the master, then the RegionServers, and finally the backup masters.
Verify that the processes are running.
On each node of the cluster, run the jps command and verify that the correct processes are running on each server. You may see additional Java processes running on your servers as well, if they are used for other purposes.
node-a jps Output
$ jps
20355 Jps
20071 HQuorumPeer
20137 HMaster
node-b jps Output
$ jps
15930 HRegionServer
16194 Jps
15838 HQuorumPeer
16010 HMaster
node-c jps Output
$ jps
13901 Jps
13639 HQuorumPeer
13737 HRegionServer
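ZooKeeper Process Name
The HQuorumPeer process is a ZooKeeper instance which is controlled and started by HBase. If you use ZooKeeper this way, it is limited to one instance per cluster node and is appropriate for testing only. If ZooKeeper is run outside of HBase, the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper instance with HBase, see zookeeper.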
Browse to the Web UI.
Web UI Port Changes
In HBase newer than 0.98.x, the HTTP ports used by the HBase Web UI changed from 60010 for the Master and 60030 for each RegionServer to 16010 for the Master and 16030 for the RegionServer.
If everything is set up correctly, you should be able to connect to the UI for the Master at http://node-a.example.com:16010/ or the secondary master at http://node-b.example.com:16010/ using a web browser. If you can connect via localhost but not from another host, check your firewall rules. You can see the web UI for each of the RegionServers at port 16030 of their IP addresses, or by clicking their links in the web UI for the Master.
Test what happens when nodes or services disappear.
With a three-node cluster like you have configured, things will not be very resilient. Still, you can test what happens when the primary Master or a RegionServer disappears, by killing the processes and watching the logs.