Configuring HBase on Ubuntu 18.04

On top of HDFS, HBase provides a highly reliable, column-oriented, scalable database system. Data can only be retrieved by row key or by a range of row keys, so HBase is mainly used to store loosely structured (unstructured and semi-structured) data. Like Hadoop, HBase scales out: computing and storage capacity grow by continually adding cheap commodity servers. Tables that suit HBase have the following characteristics:

  • Huge volume: a single table can store hundreds of millions of rows and millions of columns
  • Column-oriented: storage and access control are organized by column (family), and column families are retrieved independently
  • Sparse: null columns consume no storage space, so very sparse tables are cheap to store

Row Key
The row key is the primary key used to look up records. There are only three ways to access the rows of a table:

  • By a single row key
  • By a row key range
  • By a full table scan

A row key can be an arbitrary string, up to 64KB long; in practice 10 ~ 100 bytes is typical.
Internally, HBase stores row keys as byte arrays, and rows are stored sorted by the row keys' byte order (lexicographic order). Design keys to exploit this, so that rows which are often read together are stored together. Note that lexicographic order sorts integers as 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21...; to make row keys sort by integer value, left-pad them with zeros.
A read or write of one row is atomic, however many columns it touches. This design decision makes it easy for users to reason about the behavior of concurrent updates to the same row.
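
A minimal HBase shell sketch of the zero-padding point (it reuses the test table with family cf that is created in the shell section later):

put 'test', '00000002', 'cf:n', 'v2'
put 'test', '00000010', 'cf:n', 'v10'
scan 'test'   # '00000002' sorts before '00000010'
# with unpadded keys '2' and '10', the scan order would be '10' first, then '2'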

Column Family (CF)
Every column in an HBase table belongs to a column family. Column families are part of the table schema (individual columns are not) and must be defined before the table can be used. Column names are prefixed with their family name: cf:username and cf:code both belong to the family cf. Access control and disk/memory accounting are all done at the column-family level. In practice, per-family permissions help manage different kinds of applications: some applications may add new base data, some may read base data and create derived column families, and some may only browse data (and possibly not even all of it, for privacy reasons).

Timestamp
The storage unit identified in HBase by a row key and a column is called a cell. Each cell holds multiple versions of the same data, indexed by timestamp. Timestamps are 64-bit integers. HBase can assign them automatically at write time, in which case the timestamp is the current system time in milliseconds; a client can also set them explicitly, and an application that must avoid version conflicts has to generate unique timestamps itself. Within a cell, versions are sorted in reverse chronological order, so the newest data comes first.
To keep the management overhead (storage and indexing) of many versions bounded, HBase offers two version-retention schemes: keep the last n versions, or keep the versions written within a recent window (say, the last ten days). Either can be configured per column family.
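
For example (a sketch; the table name versioned is made up, and TTL is in seconds), keeping at most 3 versions and discarding anything older than ten days:

create 'versioned', {NAME => 'cf', VERSIONS => 3, TTL => 864000}
put 'versioned', 'r1', 'cf:v', 'first'
put 'versioned', 'r1', 'cf:v', 'second'
get 'versioned', 'r1', {COLUMN => 'cf:v', VERSIONS => 3}   # both versions, newest first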

Cell
The unique unit identified by {row key, column (= <family> + <label>), version}. The data in a cell is untyped and stored entirely as raw bytes.

System setup

Install ntp

This avoids clock drift between the servers.

Set ulimit

Reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_ig_hbase_config.html

For the users running HDFS and HBase, the open-file limit and the process limit can be viewed and set with ulimit -n and ulimit -u. To apply them automatically at login, they can be written into the user's .bashrc.
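
A sketch of the .bashrc approach (the values are examples and cannot exceed the user's hard limits):

ulimit -n 32768
ulimit -u 2048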

Another way to configure this is through PAM (Pluggable Authentication Modules).
Edit /etc/security/limits.conf and add two lines for each user that needs adjusting, for example:

hdfs  -       nofile  32768
hdfs  -       nproc   2048
hbase -       nofile  32768
hbase -       nproc   2048

For the configuration to take effect, edit /etc/pam.d/common-session and add this line:

session required  pam_limits.so
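
The limits apply to new PAM sessions, so they can be verified from a fresh login, e.g.:

su - hbase -c 'ulimit -n -u'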

ZooKeeper configuration

ZooKeeper ensemble size and configuration
A single node will run, but production environments usually run 3 ~ 7 nodes (an odd number); the more nodes, the more single-node failures the ensemble tolerates. Odd counts are used because an even count requires a higher election quorum for the same fault tolerance: a 4-node and a 5-node ensemble both need a quorum of 3. For sizing, give each node 1GB of memory and, if possible, a dedicated disk per node. For heavily loaded clusters, run the ZooKeeper nodes on machines separate from the RegionServers (DataNodes and TaskTrackers).
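
A minimal zoo.cfg sketch for the three-node ensemble used in this post (vm151 ~ vm153, with the non-default client port 2222 that hbase-site.xml references below; dataDir is an example path):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2222
server.1=vm151:2888:3888
server.2=vm152:2888:3888
server.3=vm153:2888:3888

Each node additionally needs a myid file under dataDir containing its own server number (1, 2 or 3).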

HBase setup

Master node

Unpack the distribution and edit conf/regionservers: delete localhost and add the hostnames of the slave nodes. These hosts start and stop together with the master:

vm149
vm150

If backup masters are needed, add the configuration file backup-masters under conf/ with the corresponding hostnames.
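
For example (vm149 is chosen purely for illustration):

echo vm149 > conf/backup-masters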

Edit conf/hbase-env.sh:

export JAVA_HOME=/opt/jdk/latest
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=/home/tomcat/run/hbase/logs

HBASE_MANAGES_ZK=false means an externally managed ZooKeeper is used.
HBASE_LOG_DIR: if logs should not live under the install directory's logs folder, the log path must be set here; otherwise HBase may be unable to write its logs at startup.

Edit conf/hbase-site.xml:

<configuration>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://vm148:9000/hbase</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2222</value>
      <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>  
      <value>vm151,vm152,vm153</value>
      <description>For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.</description>
    </property>
</configuration>

The default ZooKeeper client port is 2181; a non-standard port must be reflected in the configuration.
The hbase.zookeeper.quorum parameter is mandatory and lists every node of the ZooKeeper ensemble. Because an independently managed ensemble is used here, no other ZooKeeper parameters are needed.

After starting, you can connect with ./zkCli.sh and run ls /hbase inside it to check that HBase is registered correctly.
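
For example, pointing zkCli.sh at one ensemble member on the non-default port:

./zkCli.sh -server vm151:2222
ls /hbase

If HBase registered correctly, the listing typically contains znodes such as master, rs and meta-region-server.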

Slave nodes

Copy the configured directory from the master node directly to the slave nodes.
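
For example with rsync (assuming the same /opt/hbase/latest install path on every node):

rsync -a /opt/hbase/latest/ vm149:/opt/hbase/latest/
rsync -a /opt/hbase/latest/ vm150:/opt/hbase/latest/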

Startup

Startup order

start-dfs.sh (on the master)
start-yarn.sh (on the master)
zkServer.sh start (on each ZooKeeper node)
start-hbase.sh (on the master)

After startup, the HBase web UI is reachable on port 16010 of the master node, e.g. http://vm148:16010/

Other configuration

Set dfs.datanode.max.transfer.threads

dfs.datanode.max.transfer.threads is an HDFS parameter that replaces the deprecated dfs.datanode.max.xcievers. It caps the number of files an HDFS datanode serves at the same time. Edit etc/hadoop/conf/hdfs-site.xml and add the following entry:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
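
After restarting the datanodes, the effective value can be checked with:

hdfs getconf -confKey dfs.datanode.max.transfer.threads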

Configure the HBase BlockCache

By default, HBase uses a single on-heap cache. If a BucketCache is configured, the on-heap cache holds only Bloom filters and index blocks, while the off-heap BucketCache holds the data blocks; this arrangement is known as the combined BlockCache configuration. It allows a much larger memory cache and reduces the impact of JVM GC.
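
A sketch of the relevant hbase-site.xml entries (the size is an example, in MB; a matching direct-memory reservation, e.g. HBASE_OFFHEAPSIZE in hbase-env.sh, is also required):

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>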

Command-line reference

Enter the shell environment:

./bin/hbase shell

List all tables: list (adding a 'table name' after list can be used to check whether the table exists)

hbase(main):001:0> list
TABLE                                                                                           
users                                                                                     
1 row(s)
Took 0.5786 seconds                                                                             
=> ["users"]

Show table details: describe 'table name'

hbase(main):003:0> describe 'users'
Table users is ENABLED                                                                    
users                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                     
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false
', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',
 TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_IN
DEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS
_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}         
1 row(s)
Took 0.2581 seconds          

Disable / enable a table: disable / enable 'table name'

hbase(main):004:0> disable 'users'
Took 0.5861 seconds                                                                             
hbase(main):005:0> enable 'users'
Took 0.7949 seconds

Create a table: create 'table name', 'cf name', ... (more than one column family may be given)

hbase(main):006:0> create 'test','cf'
Created table test
Took 1.3285 seconds                                                                             
=> Hbase::Table - test

hbase(main):008:0> create 'test2','cf1','cf2','cf3'
Created table test2
Took 1.2728 seconds 
=> Hbase::Table - test2

Delete a table: drop 'table name'. The table must be disabled before it can be dropped.
Note: disk space is not released immediately after the drop; by default, the space is reclaimed after 5 minutes.

hbase(main):011:0> disable 'test2'
Took 0.4568 seconds                                                                             
hbase(main):012:0> drop 'test2'
Took 0.5034 seconds 

List the records of a table: scan 'table name'

hbase(main):013:0> scan 'test'
ROW                       COLUMN+CELL                                                           
0 row(s)
Took 0.1512 seconds 

Insert a record: put 'table name', 'row id', 'cf:column', 'value'
A put writes one column value of one row id at a time; after putting different columns of the same row id one by one, a scan actually displays them as multiple lines:

hbase(main):026:0> put 'test','row001','cf:a','001'
Took 0.0884 seconds                                                                             
hbase(main):027:0> put 'test','row002','cf:a','002'
Took 0.0076 seconds                                                                             
hbase(main):028:0> put 'test','row003','cf:b','001'
Took 0.0086 seconds                                                                             
hbase(main):029:0> scan 'test'
ROW                       COLUMN+CELL                                                           
 row001                   column=cf:a, timestamp=1548510719243, value=001                       
 row002                   column=cf:a, timestamp=1548510724943, value=002                       
 row003                   column=cf:b, timestamp=1548510733680, value=001                       
3 row(s)
Took 0.0477 seconds    

Read all columns of one row id: get 'table name', 'row id'

hbase(main):032:0> get 'test', 'row001'
COLUMN                    CELL                                                                  
 cf:a                     timestamp=1548510719243, value=001                                    
 cf:b                     timestamp=1548510892749, value=003                                    
1 row(s)
Took 0.0491 seconds               

Delete one row id's value in a given column: delete 'table name', 'row id', 'cf:column'

hbase(main):033:0> delete 'test', 'row001', 'cf:b'
Took 0.0298 seconds                                                                             
hbase(main):034:0> get 'test', 'row001'
COLUMN                    CELL                                                                  
 cf:a                     timestamp=1548510719243, value=001                                    
1 row(s)
Took 0.0323 seconds

To delete an entire row id, use deleteall:

hbase(main):045:0> deleteall 'test', 'row004'
Took 0.0081 seconds

Count the row ids: count 'table name'

hbase(main):039:0> scan 'test'
ROW                       COLUMN+CELL                                                           
 row001                   column=cf:a, timestamp=1548510719243, value=001                       
 row001                   column=cf:b, timestamp=1548511393583, value=003                       
 row002                   column=cf:a, timestamp=1548510724943, value=002                       
 row002                   column=cf:b, timestamp=1548511400007, value=002                       
 row003                   column=cf:b, timestamp=1548510733680, value=001                       
3 row(s)
Took 0.0409 seconds                                                                             
hbase(main):040:0> count 'test'
3 row(s)
Took 0.0178 seconds                                                                             
=> 3

Truncate a table: truncate 'table name'
This command actually runs three steps: disable, drop, recreate.

hbase(main):047:0> truncate 'test'
Truncating 'test' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1415 seconds

Importing a CSV file into HBase

Assume the CSV file is in the local filesystem (not HDFS) and is comma-separated, and the import target table is test:

$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv

Because the tool defaults to the TSV format, the separator must be explicitly set to ',' for CSV files.
The target columns are specified with the importtsv.columns parameter, which maps the CSV columns, in order, to the row key and the cf columns of the HBase table.
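
A quick way to sanity-check the column mapping is a one-line file with made-up values:

printf 'row001,3.5,4.0,1000,5,0.8\n' > sample.csv
/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test sample.csv

Afterwards, get 'test', 'row001' should show the five cf columns with the values from that line.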

The complete output of the import process was:

$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
2019-01-26 14:35:52,566 WARN  [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:host.name=vm148
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_192
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.home=/opt/jdk/jdk1.8.0_192/jre
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: opt/hbase/latest/bin/../lib/protobuf-java-2.5.0.jar:/opt/hbase/latest/bin/../lib/snappy-java-1.0.5.jar:/opt/hbase/latest/bin/../lib/spymemcached-2.12.2.jar:/opt/hbase/latest/bin/../lib/validation-api-1.1.0.Final.jar:/opt/hbase/latest/bin/../lib/xmlenc-0.52.jar:/opt/hbase/latest/bin/../lib/xz-1.0.jar:/opt/hbase/latest/bin/../lib/zookeeper-3.4.10.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.version=4.15.0-43-generic
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.name=tomcat
2019-01-26 14:35:52,949 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.home=/home/tomcat
2019-01-26 14:35:52,950 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.dir=/home/tomcat
2019-01-26 14:35:52,951 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:52,969 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:52,974 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:52,986 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0002, negotiated timeout = 40000
2019-01-26 14:35:54,071 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Session: 0x3002261518a0002 closed
2019-01-26 14:35:54,074 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0002
2019-01-26 14:35:54,095 INFO  [main] Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2019-01-26 14:35:54,096 INFO  [main] jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-01-26 14:35:54,126 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:54,130 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:54,134 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:54,138 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0003, negotiated timeout = 40000
2019-01-26 14:35:54,416 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Session: 0x3002261518a0003 closed
2019-01-26 14:35:54,416 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0003
2019-01-26 14:35:54,579 INFO  [main] input.FileInputFormat: Total input paths to process : 1
2019-01-26 14:35:54,615 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2019-01-26 14:35:54,752 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_local98574210_0001
2019-01-26 14:35:55,026 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,084 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop2-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,686 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar <- /home/tomcat/jackson-core-2.9.2.jar
2019-01-26 14:35:55,693 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-core-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,713 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar <- /home/tomcat/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,722 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,744 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar <- /home/tomcat/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-common-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar <- /home/tomcat/zookeeper-3.4.10.jar
2019-01-26 14:35:55,754 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/zookeeper-3.4.10.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,755 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar <- /home/tomcat/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-shaded-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar <- /home/tomcat/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-client-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar <- /home/tomcat/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-mapreduce-client-core-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar <- /home/tomcat/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-netty-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar <- /home/tomcat/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/commons-lang3-3.6.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar <- /home/tomcat/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-mapreduce-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar <- /home/tomcat/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/metrics-core-3.2.1.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar <- /home/tomcat/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-common-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar <- /home/tomcat/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar <- /home/tomcat/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,778 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-zookeeper-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,779 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar <- /home/tomcat/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,780 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-miscellaneous-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,781 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar <- /home/tomcat/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/protobuf-java-2.5.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar <- /home/tomcat/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-annotations-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar <- /home/tomcat/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-server-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar <- /home/tomcat/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,787 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-api-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,788 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar <- /home/tomcat/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,789 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-databind-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,790 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar <- /home/tomcat/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO  [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar <- /home/tomcat/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,799 INFO  [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-protobuf-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,852 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,853 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,854 INFO  [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,858 INFO  [main] mapreduce.Job: The url to track the job: http://localhost:8080/
2019-01-26 14:35:55,858 INFO  [main] mapreduce.Job: Running job: job_local98574210_0001
2019-01-26 14:35:55,861 INFO  [Thread-55] mapred.LocalJobRunner: OutputCommitter set in config null
2019-01-26 14:35:55,892 INFO  [Thread-55] mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.hbase.mapreduce.TableOutputCommitter
2019-01-26 14:35:55,936 INFO  [Thread-55] mapred.LocalJobRunner: Waiting for map tasks
2019-01-26 14:35:55,938 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Starting task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:55,995 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2019-01-26 14:35:56,000 INFO  [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: file:/home/tomcat/output.csv:0+1703
2019-01-26 14:35:56,008 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,009 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,009 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,016 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420008, negotiated timeout = 40000
2019-01-26 14:35:56,021 INFO  [LocalJobRunner Map Task Executor #0] mapreduce.TableOutputFormat: Created table instance for test
2019-01-26 14:35:56,047 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,048 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,049 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,052 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420009, negotiated timeout = 40000
2019-01-26 14:35:56,116 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Session: 0x200226284420009 closed
2019-01-26 14:35:56,116 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420009
2019-01-26 14:35:56,138 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: 
2019-01-26 14:35:56,280 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Task:attempt_local98574210_0001_m_000000_0 is done. And is in the process of committing
2019-01-26 14:35:56,289 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Session: 0x200226284420008 closed
2019-01-26 14:35:56,289 INFO  [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420008
2019-01-26 14:35:56,296 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: map
2019-01-26 14:35:56,296 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Task 'attempt_local98574210_0001_m_000000_0' done.
2019-01-26 14:35:56,303 INFO  [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local98574210_0001_m_000000_0: Counters: 16
	File System Counters
		FILE: Number of bytes read=37574934
		FILE: Number of bytes written=38237355
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=30
		Map output records=30
		Input split bytes=93
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=8
		Total committed heap usage (bytes)=62849024
	ImportTsv
		Bad Lines=0
	File Input Format Counters 
		Bytes Read=1703
	File Output Format Counters 
		Bytes Written=0
2019-01-26 14:35:56,304 INFO  [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Finishing task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:56,304 INFO  [Thread-55] mapred.LocalJobRunner: map task executor complete.
2019-01-26 14:35:56,860 INFO  [main] mapreduce.Job: Job job_local98574210_0001 running in uber mode : false
2019-01-26 14:35:56,862 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2019-01-26 14:35:56,866 INFO  [main] mapreduce.Job: Job job_local98574210_0001 completed successfully
2019-01-26 14:35:56,899 INFO  [main] mapreduce.Job: Counters: 16
	File System Counters
		FILE: Number of bytes read=37574934
		FILE: Number of bytes written=38237355
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=30
		Map output records=30
		Input split bytes=93
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=8
		Total committed heap usage (bytes)=62849024
	ImportTsv
		Bad Lines=0
	File Input Format Counters 
		Bytes Read=1703
	File Output Format Counters 
		Bytes Written=0

Importing a TSV into HBase. Here the file was imported directly from the local filesystem; a 2.6GB file with 50 million records took a full 29 minutes. Would putting the file into HDFS first make the import faster? (One possible speed-up is sketched after the command.)

/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 worktable posts.txt
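
One commonly suggested speed-up (a sketch, not measured here) is the two-step bulk load: have ImportTsv write HFiles with -Dimporttsv.bulk.output, then load them into the table, which bypasses the RegionServer write path entirely. The loader class below is the one shipped with HBase 2.x:

hdfs dfs -put posts.txt /tmp/posts.txt
/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 -Dimporttsv.bulk.output=/tmp/hfiles worktable /tmp/posts.txt
/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmp/hfiles worktable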

Update 2019-01-28: on double quotes in TSV files and tab characters inside field values:

If the mysql -e export uses OPTIONALLY ENCLOSED BY '\"', every string-typed field in the exported TSV is wrapped in double quotes, and after importing with the command above those quotes end up inside the values stored in HBase. So the OPTIONALLY ENCLOSED BY '\"' option is not recommended with mysql -e.

If a field value in a MySQL record contains a tab, it is escaped automatically on export, as follows:

40	2	,\	[bot]	1528869876
41	2	[bot],	1528869876
42	2	t\	[bot]"	1528869876
43	2	t\	[bot]'	1528869876
44	2	't\	[bot]'	1528869876
45	2	"t\	[bot]"	1528869876
46	2	t\	[bot]	1528869876
47	2	tab\	\	[bot]	1528869876

Whether this happens depends on the OPTIONALLY ENCLOSED BY '\"' option: the output above is without it, the output below is with it:

40	2	",	[bot]"	1528869876
41	2	"[bot],"	1528869876
42	2	"t	[bot]\""	1528869876
43	2	"t	[bot]'"	1528869876
44	2	"'t	[bot]'"	1528869876
45	2	"\"t	[bot]\""	1528869876
46	2	"t	[bot]"	1528869876
47	2	"tab		[bot]"	1528869876

As the output shows, with the option enabled, tabs are no longer escaped; double quotes are escaped instead.

For ImportTsv, rows containing tabs in either kind of TSV file cannot be imported correctly; they are treated as Bad Lines, because ImportTsv splits lines by scanning for the single separator byte and never recognizes escaped tabs. See the public ParsedLine parse(byte[] lineBytes, int length) method in the source: https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java

So if field values may contain tabs, either switch to a separator that does not collide with the data, or replace the tabs with something else (e.g. spaces) when generating the TSV.
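
A sketch of the clean-at-export-time option (database, table and column names are hypothetical), stripping the tabs inside MySQL before they reach the TSV:

mysql --batch -e "SELECT id, topic_id, REPLACE(username, '\t', ' '), post_time FROM posts" mydb > posts.txt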

==

In the hbase shell, count 'worktable' is very slow: the full count took an hour.

Current count: 49458000, row: 9999791                                                           
49458230 row(s)
Took 3684.2802 seconds                                                                          
=> 49458230

get, by contrast, is fast:

hbase(main):056:0> get 'smth','1995'
COLUMN                    CELL                                                                  
 cf:post_time             timestamp=1548515983185, value=876546980                              
 cf:user_id               timestamp=1548515983185, value=554                                    
 cf:username              timestamp=1548515983185, value="aaa"                                 
1 row(s)
Took 0.0882 seconds                                                                             
hbase(main):057:0> get 'smth','49471229'
COLUMN                    CELL                                                                  
 cf:post_time             timestamp=1548515983185, value=1546941261                             
 cf:user_id               timestamp=1548515983185, value=161838                                 
 cf:username              timestamp=1548515983185, value="bbb"                          
1 row(s)
Took 0.0873 seconds

