HBase provides a highly reliable, column-oriented, scalable database system on top of HDFS. Data in HBase can only be retrieved by primary key (row key) or by a range of row keys, and it is mainly used for loosely structured (unstructured and semi-structured) data. Like Hadoop, HBase scales horizontally: compute and storage capacity grows by adding cheap commodity servers. Tables suited to HBase are typically very large (billions of rows, millions of columns), column-oriented, and sparse.
Row Key
The row key is the primary key used to retrieve records. There are only three ways to access rows in a table: by a single row key (get), by a row key range (scan), or by a full table scan.
A row key can be an arbitrary string, up to 64KB long; in practice 10 ~ 100 bytes is typical.
Internally, HBase stores row keys as byte arrays, and data is stored sorted by the row key's lexicographic (byte) order. Design keys to exploit this sorted storage: rows that are often read together should sort next to each other. Note: lexicographic order sorts integers as 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21... To make row keys sort in integer order, left-pad them with zeros.
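To see the byte order from the shell (a sketch, using the test table and cf column family created in the shell examples later in these notes):

put 'test', '1', 'cf:a', 'x'
put 'test', '2', 'cf:a', 'x'
put 'test', '10', 'cf:a', 'x'
scan 'test'    # rows come back in byte order: 1, 10, 2
# with keys left-padded to '01', '02', '10', the scan order matches numeric order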
A read or write of one row is atomic, no matter how many columns it touches. This design decision makes it easy for users to reason about program behavior when the same row is updated concurrently.
Column Family (CF)
Every column in an HBase table belongs to some CF. A CF is part of the table schema (individual columns are not) and must be defined before the table is used. Column names are prefixed with their CF; for example, cf:username and cf:code both belong to the CF cf. Access control as well as disk and memory accounting are done at the column-family level. In practice, column-family permissions help us manage different kinds of applications: some may add new base data, some may read base data and create derived column families, and some may only browse data (and, for privacy reasons, perhaps not even all of it).
Timestamp
The storage unit identified in HBase by a row key and a column is called a cell. Each cell holds multiple versions of the same data, indexed by timestamp. A timestamp is a 64-bit integer. It can be assigned automatically by HBase at write time, in which case it is the current system time in milliseconds, or assigned explicitly by the client. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a cell, versions are sorted by timestamp in descending order, so the newest data comes first.
To keep the management burden (storage and indexing) of many versions in check, HBase offers two version-reclamation policies: keep the last n versions, or keep only the versions from a recent time window (say, the last ten days). Both can be configured per column family.
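For instance, both policies can be set from the shell (a sketch against the users table from the describe example below; the row and column are illustrative, and TTL is in seconds, so ten days is 864000):

alter 'users', NAME => 'cf', VERSIONS => 3    # keep only the last 3 versions per cell
alter 'users', NAME => 'cf', TTL => 864000    # expire versions older than 10 days
get 'users', 'row001', {COLUMN => 'cf:code', VERSIONS => 3}    # read up to 3 versions of one cell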
Cell
A cell is the unique unit identified by {row key, column (= <family> + <label>), version}. Data in a cell is untyped and stored as raw bytes.
Install ntp
This keeps clocks in sync across the servers.
Set ulimit
Reference: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/cdh_ig_hbase_config.html
For the users that run hdfs and hbase, the open-file and process limits can be inspected and set with ulimit -n and ulimit -u. To apply them automatically at login, put the settings in that user's .bashrc.
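For example, using the same values as the limits.conf entries below (a sketch; append to the user's ~/.bashrc):

# raise per-session limits for the hdfs/hbase user at login
ulimit -n 32768    # max open file descriptors
ulimit -u 2048     # max user processes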
Another way to configure this is through PAM (Pluggable Authentication Modules).
Edit /etc/security/limits.conf and add two lines for each user that needs adjusting, for example:
hdfs - nofile 32768
hdfs - nproc 2048
hbase - nofile 32768
hbase - nproc 2048
For the settings to take effect, edit /etc/pam.d/common-session and add the line:
session required pam_limits.so
ZooKeeper cluster size and configuration
On size: a single node works, but production clusters usually run 3 ~ 7 nodes (an odd number). More nodes tolerate more single-node failures. Odd counts are preferred because an even count raises the quorum needed for election: 4 nodes and 5 nodes both need a quorum of 3. On configuration: give each node 1GB of memory and, if possible, a dedicated disk. For heavily loaded clusters, run the ZooKeeper nodes on machines separate from the RegionServers (DataNodes and TaskTrackers).
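A minimal conf/zoo.cfg for such an ensemble might look like this (a sketch; dataDir and the peer ports 2888/3888 are assumptions, while the client port matches the hbase-site.xml below):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2222
server.1=vm151:2888:3888
server.2=vm152:2888:3888
server.3=vm153:2888:3888

Each node also needs a myid file under dataDir whose content matches its server.N id (e.g. echo 1 > /var/lib/zookeeper/myid on vm151).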
Master node
Unpack the tarball and edit conf/regionservers: remove localhost and add the hostnames of the slave nodes. These hosts start and stop together with the master:
vm149
vm150
If backup masters are needed, add a backup-masters file under conf/ with the corresponding hostnames.
Edit conf/hbase-env.sh:
export JAVA_HOME=/opt/jdk/latest
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=/home/tomcat/run/hbase/logs
HBASE_MANAGES_ZK=false means an externally managed ZooKeeper is used.
HBASE_LOG_DIR: if logs should not go into the logs directory under the install path, specify the log path here; otherwise startup may fail for lack of write permission.
Edit conf/hbase-site.xml:
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://vm148:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>vm151,vm152,vm153</value>
    <description>For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.</description>
  </property>
</configuration>
The default client port is 2181; a non-standard port must be reflected in the configuration.
hbase.zookeeper.quorum is required and lists every node of the ZK ensemble. Because the ZK cluster is managed independently, no other ZK parameters are needed.
After startup, you can verify the connection by running ./zkCli.sh and executing ls /hbase inside it.
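A session might look like this (a sketch; the exact znode listing varies with the HBase version):

$ ./zkCli.sh -server vm151:2222
[zk: vm151:2222(CONNECTED) 0] ls /hbase
[backup-masters, master, meta-region-server, namespace, rs, table, ...]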
Slave nodes
Copy the configured directory from the master node directly to the slave nodes.
Startup order
start-dfs.sh (on the master)
start-yarn.sh (on the master)
zkServer.sh start (on each ZK node)
start-hbase.sh (on the master)
Once started, the HBase web UI is available on port 16010 of the master: http://vm148:16010/
Set dfs.datanode.max.transfer.threads
dfs.datanode.max.transfer.threads is an HDFS parameter that replaces the deprecated dfs.datanode.max.xcievers. It caps the number of files an HDFS datanode serves at any one time. Edit etc/hadoop/conf/hdfs-site.xml and add the following entry:
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
Configure the HBase BlockCache
By default, HBase uses a single on-heap cache. If a BucketCache is configured, the on-heap cache is used only for Bloom filters and indexes, while the off-heap BucketCache holds the data cache. This arrangement is called the BlockCache configuration. It allows a much larger cache and avoids the impact of JVM GC.
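A minimal BucketCache setup is sketched below (the 5G off-heap allocation and 4096MB cache size are illustrative values, not recommendations). In conf/hbase-env.sh, reserve off-heap memory:

export HBASE_OFFHEAPSIZE=5G

Then in conf/hbase-site.xml, enable the off-heap engine and size the cache (hbase.bucketcache.size is in MB here):

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>4096</value>
</property>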
Entering the shell
./bin/hbase shell
List all tables: list (with a 'table name' after list, it can check whether a table exists)
hbase(main):001:0> list
TABLE
users
1 row(s)
Took 0.5786 seconds
=> ["users"]
Show table details: describe 'table name'
hbase(main):003:0> describe 'users'
Table users is ENABLED
users
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
1 row(s)
Took 0.2581 seconds
Disable / enable a table: disable / enable 'table name'
hbase(main):004:0> disable 'users'
Took 0.5861 seconds
hbase(main):005:0> enable 'users'
Took 0.7949 seconds
Create a table: create 'table name', 'cf' (more than one CF is allowed)
hbase(main):006:0> create 'test','cf'
Created table test
Took 1.3285 seconds
=> Hbase::Table - test
hbase(main):008:0> create 'test2','cf1','cf2','cf3'
Created table test2
Took 1.2728 seconds
=> Hbase::Table - test2
Drop a table: drop 'table name'. The table must be disabled before it can be dropped.
Note: disk space is not released immediately after a drop; by default it is freed after 5 minutes.
hbase(main):011:0> disable 'test2'
Took 0.4568 seconds
hbase(main):012:0> drop 'test2'
Took 0.5034 seconds
List a table's records: scan 'table name'
hbase(main):013:0> scan 'test'
ROW    COLUMN+CELL
0 row(s)
Took 0.1512 seconds
Insert a record: put 'table name', 'row id', 'cf:field', 'value'
Each put writes a single field value for a row id; putting different fields for the same row id actually shows up as multiple lines in scan output.
hbase(main):026:0> put 'test','row001','cf:a','001'
Took 0.0884 seconds
hbase(main):027:0> put 'test','row002','cf:a','002'
Took 0.0076 seconds
hbase(main):028:0> put 'test','row003','cf:b','001'
Took 0.0086 seconds
hbase(main):029:0> scan 'test'
ROW    COLUMN+CELL
 row001    column=cf:a, timestamp=1548510719243, value=001
 row002    column=cf:a, timestamp=1548510724943, value=002
 row003    column=cf:b, timestamp=1548510733680, value=001
3 row(s)
Took 0.0477 seconds
Read all fields of a row id: get 'table name', 'row id'
hbase(main):032:0> get 'test', 'row001'
COLUMN    CELL
 cf:a    timestamp=1548510719243, value=001
 cf:b    timestamp=1548510892749, value=003
1 row(s)
Took 0.0491 seconds
Delete a row id's value in a given field: delete 'table name', 'row id', 'cf:field'
hbase(main):033:0> delete 'test', 'row001', 'cf:b'
Took 0.0298 seconds
hbase(main):034:0> get 'test', 'row001'
COLUMN    CELL
 cf:a    timestamp=1548510719243, value=001
1 row(s)
Took 0.0323 seconds
To delete an entire row id, use deleteall:
hbase(main):045:0> deleteall 'test', 'row004'
Took 0.0081 seconds
Count row ids: count 'table name'
hbase(main):039:0> scan 'test'
ROW    COLUMN+CELL
 row001    column=cf:a, timestamp=1548510719243, value=001
 row001    column=cf:b, timestamp=1548511393583, value=003
 row002    column=cf:a, timestamp=1548510724943, value=002
 row002    column=cf:b, timestamp=1548511400007, value=002
 row003    column=cf:b, timestamp=1548510733680, value=001
3 row(s)
Took 0.0409 seconds
hbase(main):040:0> count 'test'
3 row(s)
Took 0.0178 seconds
=> 3
Truncate a table: truncate 'table name'
This command actually performs three steps: disable, drop, and recreate.
hbase(main):047:0> truncate 'test'
Truncating 'test' table (it may take a while):
Disabling table...
Truncating table...
Took 2.1415 seconds
Importing a CSV file. Assume the csv file is on the local filesystem (not HDFS) and comma-separated; to import it into the target table test:
$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
Because TSV is the default format, for CSV the separator must be explicitly set to ','.
The target fields are given with the importtsv.columns parameter, mapping the csv columns, in order, to cf fields of the hbase table.
The complete output of the import:
$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns=HBASE_ROW_KEY,cf:interest,cf:future_interest,cf:quota_amount,cf:quota_count,cf:quota_extra_interest test output.csv
2019-01-26 14:35:52,566 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:host.name=vm148
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_192
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.home=/opt/jdk/jdk1.8.0_192/jre
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: opt/hbase/latest/bin/../lib/protobuf-java-2.5.0.jar:/opt/hbase/latest/bin/../lib/snappy-java-1.0.5.jar:/opt/hbase/latest/bin/../lib/spymemcached-2.12.2.jar:/opt/hbase/latest/bin/../lib/validation-api-1.1.0.Final.jar:/opt/hbase/latest/bin/../lib/xmlenc-0.52.jar:/opt/hbase/latest/bin/../lib/xz-1.0.jar:/opt/hbase/latest/bin/../lib/zookeeper-3.4.10.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar:/opt/hbase/latest/bin/../lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.name=Linux
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:os.version=4.15.0-43-generic
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.name=tomcat
2019-01-26 14:35:52,949 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.home=/home/tomcat
2019-01-26 14:35:52,950 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Client environment:user.dir=/home/tomcat
2019-01-26 14:35:52,951 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:52,969 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:52,974 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:52,986 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0002, negotiated timeout = 40000
2019-01-26 14:35:54,071 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18] zookeeper.ZooKeeper: Session: 0x3002261518a0002 closed
2019-01-26 14:35:54,074 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x08909f18-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0002
2019-01-26 14:35:54,095 INFO [main] Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2019-01-26 14:35:54,096 INFO [main] jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2019-01-26 14:35:54,126 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:54,130 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm150/192.168.31.150:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:54,134 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Socket connection established to vm150/192.168.31.150:2181, initiating session
2019-01-26 14:35:54,138 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-SendThread(vm150:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm150/192.168.31.150:2181, sessionid = 0x3002261518a0003, negotiated timeout = 40000
2019-01-26 14:35:54,416 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e] zookeeper.ZooKeeper: Session: 0x3002261518a0003 closed
2019-01-26 14:35:54,416 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x42f8285e-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x3002261518a0003
2019-01-26 14:35:54,579 INFO [main] input.FileInputFormat: Total input paths to process : 1
2019-01-26 14:35:54,615 INFO [main] mapreduce.JobSubmitter: number of splits:1
2019-01-26 14:35:54,752 INFO [main] mapreduce.JobSubmitter: Submitting tokens for job: job_local98574210_0001
2019-01-26 14:35:55,026 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,084 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop2-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,686 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar <- /home/tomcat/jackson-core-2.9.2.jar
2019-01-26 14:35:55,693 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-core-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,713 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar <- /home/tomcat/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,722 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,744 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar <- /home/tomcat/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-common-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,746 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar <- /home/tomcat/zookeeper-3.4.10.jar
2019-01-26 14:35:55,754 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/zookeeper-3.4.10.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,755 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar <- /home/tomcat/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-shaded-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,758 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar <- /home/tomcat/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-client-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,760 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar <- /home/tomcat/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hadoop-mapreduce-client-core-2.7.7.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,762 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar <- /home/tomcat/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-netty-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,763 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar <- /home/tomcat/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/commons-lang3-3.6.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,766 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar <- /home/tomcat/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-mapreduce-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,768 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar <- /home/tomcat/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/metrics-core-3.2.1.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,770 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar <- /home/tomcat/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-common-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,771 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar <- /home/tomcat/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,775 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar <- /home/tomcat/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-hadoop-compat-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,777 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar <- /home/tomcat/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,778 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-zookeeper-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,779 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar <- /home/tomcat/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,780 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-miscellaneous-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,781 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar <- /home/tomcat/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/protobuf-java-2.5.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,782 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar <- /home/tomcat/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-annotations-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,784 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar <- /home/tomcat/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-server-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,786 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar <- /home/tomcat/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,787 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-metrics-api-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,788 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar <- /home/tomcat/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,789 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/jackson-databind-2.9.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,790 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar <- /home/tomcat/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-protocol-2.1.2.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,791 INFO [main] mapred.LocalDistributedCacheManager: Creating symlink: /tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar <- /home/tomcat/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,799 INFO [main] mapred.LocalDistributedCacheManager: Localized file:/opt/hbase/hbase-2.1.2/lib/hbase-shaded-protobuf-2.1.0.jar as file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,852 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354849/hbase-hadoop2-compat-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354850/jackson-core-2.9.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354851/hbase-metrics-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354852/hadoop-common-2.7.7.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354853/zookeeper-3.4.10.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354854/hbase-protocol-shaded-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354855/hbase-client-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354856/hadoop-mapreduce-client-core-2.7.7.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354857/hbase-shaded-netty-2.1.0.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354858/commons-lang3-3.6.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354859/hbase-mapreduce-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354860/metrics-core-3.2.1.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354861/hbase-common-2.1.2.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354862/htrace-core4-4.2.0-incubating.jar
2019-01-26 14:35:55,853 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354863/hbase-hadoop-compat-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354864/hbase-zookeeper-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354865/hbase-shaded-miscellaneous-2.1.0.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354866/protobuf-java-2.5.0.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354867/jackson-annotations-2.9.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354868/hbase-server-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354869/hbase-metrics-api-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354870/jackson-databind-2.9.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354871/hbase-protocol-2.1.2.jar
2019-01-26 14:35:55,854 INFO [main] mapred.LocalDistributedCacheManager: file:/tmp/hadoop-tomcat/mapred/local/1548513354872/hbase-shaded-protobuf-2.1.0.jar
2019-01-26 14:35:55,858 INFO [main] mapreduce.Job: The url to track the job: http://localhost:8080/
2019-01-26 14:35:55,858 INFO [main] mapreduce.Job: Running job: job_local98574210_0001
2019-01-26 14:35:55,861 INFO [Thread-55] mapred.LocalJobRunner: OutputCommitter set in config null
2019-01-26 14:35:55,892 INFO [Thread-55] mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.hbase.mapreduce.TableOutputCommitter
2019-01-26 14:35:55,936 INFO [Thread-55] mapred.LocalJobRunner: Waiting for map tasks
2019-01-26 14:35:55,938 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Starting task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:55,995 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-01-26 14:35:56,000 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask: Processing split: file:/home/tomcat/output.csv:0+1703
2019-01-26 14:35:56,008 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,009 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,009 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,016 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420008, negotiated timeout = 40000
2019-01-26 14:35:56,021 INFO [LocalJobRunner Map Task Executor #0] mapreduce.TableOutputFormat: Created table instance for test
2019-01-26 14:35:56,047 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Initiating client connection, connectString=vm148:2181,vm149:2181,vm150:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$12/835683477@2b267d81
2019-01-26 14:35:56,048 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Opening socket connection to server vm149/192.168.31.149:2181. Will not attempt to authenticate using SASL (unknown error)
2019-01-26 14:35:56,049 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Socket connection established to vm149/192.168.31.149:2181, initiating session
2019-01-26 14:35:56,052 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-SendThread(vm149:2181)] zookeeper.ClientCnxn: Session establishment complete on server vm149/192.168.31.149:2181, sessionid = 0x200226284420009, negotiated timeout = 40000
2019-01-26 14:35:56,116 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95] zookeeper.ZooKeeper: Session: 0x200226284420009 closed
2019-01-26 14:35:56,116 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x3b9f7f95-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420009
2019-01-26 14:35:56,138 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner:
2019-01-26 14:35:56,280 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Task:attempt_local98574210_0001_m_000000_0 is done. And is in the process of committing
2019-01-26 14:35:56,289 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3] zookeeper.ZooKeeper: Session: 0x200226284420008 closed
2019-01-26 14:35:56,289 INFO [ReadOnlyZKClient-vm148:2181,vm149:2181,vm150:2181@0x53b1bab3-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x200226284420008
2019-01-26 14:35:56,296 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: map
2019-01-26 14:35:56,296 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Task 'attempt_local98574210_0001_m_000000_0' done.
2019-01-26 14:35:56,303 INFO [LocalJobRunner Map Task Executor #0] mapred.Task: Final Counters for attempt_local98574210_0001_m_000000_0: Counters: 16
	File System Counters
		FILE: Number of bytes read=37574934
		FILE: Number of bytes written=38237355
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=30
		Map output records=30
		Input split bytes=93
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=8
		Total committed heap usage (bytes)=62849024
	ImportTsv
		Bad Lines=0
	File Input Format Counters
		Bytes Read=1703
	File Output Format Counters
		Bytes Written=0
2019-01-26 14:35:56,304 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner: Finishing task: attempt_local98574210_0001_m_000000_0
2019-01-26 14:35:56,304 INFO [Thread-55] mapred.LocalJobRunner: map task executor complete.
2019-01-26 14:35:56,860 INFO [main] mapreduce.Job: Job job_local98574210_0001 running in uber mode : false
2019-01-26 14:35:56,862 INFO [main] mapreduce.Job: map 100% reduce 0%
2019-01-26 14:35:56,866 INFO [main] mapreduce.Job: Job job_local98574210_0001 completed successfully
2019-01-26 14:35:56,899 INFO [main] mapreduce.Job: Counters: 16
	File System Counters
		FILE: Number of bytes read=37574934
		FILE: Number of bytes written=38237355
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=30
		Map output records=30
		Input split bytes=93
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=8
		Total committed heap usage (bytes)=62849024
	ImportTsv
		Bad Lines=0
	File Input Format Counters
		Bytes Read=1703
	File Output Format Counters
		Bytes Written=0
Importing a TSV into HBase. The file here was imported directly from the local filesystem; 2.6GB with 50 million records took a full 29 minutes. Would it be faster to put the file into HDFS first and import from there?
/opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:key1,cf:key2,cf:key3,cf:key4,cf:key5 worktable posts.txt
Update 2019-01-28: on double quotes in TSV files, and tab characters inside field values:
If mysql -e exports with OPTIONALLY ENCLOSED BY '\"', every string field in the resulting tsv is wrapped in double quotes, and after importing with the command above the quotes end up inside the field values. So with mysql -e, the OPTIONALLY ENCLOSED BY '\"' parameter is not recommended.
If a field value in a mysql record contains a tab, it is automatically escaped on export, like this:
40 2 ,\ [bot] 1528869876
41 2 [bot], 1528869876
42 2 t\ [bot]" 1528869876
43 2 t\ [bot]' 1528869876
44 2 't\ [bot]' 1528869876
45 2 "t\ [bot]" 1528869876
46 2 t\ [bot] 1528869876
47 2 tab\ \ [bot] 1528869876
Whether this happens depends on OPTIONALLY ENCLOSED BY '\"': the output above is without the parameter; below is the output with it:
40 2 ", [bot]" 1528869876 41 2 "[bot]," 1528869876 42 2 "t [bot]\"" 1528869876 43 2 "t [bot]'" 1528869876 44 2 "'t [bot]'" 1528869876 45 2 "\"t [bot]\"" 1528869876 46 2 "t [bot]" 1528869876 47 2 "tab [bot]" 1528869876
With the parameter, tabs are no longer escaped; double quotes are escaped instead.
For ImportTsv, the tab-containing rows in both kinds of TSV file fail to import and are counted as Bad Lines: when ImportTsv splits a line it simply scans for the single separator byte, one character at a time, and does not recognize escaped tabs. See the source at https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java, in particular the public ParsedLine parse(byte[] lineBytes, int length) method.
So if field values may contain tabs, either switch to a separator that cannot collide with the data, or replace the tabs with something else (e.g. a space) when generating the TSV, as in the sketch below.
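For example, the tabs can be stripped at export time with MySQL's REPLACE function (a sketch; the database, table and column names here are hypothetical, and CHAR(9) is the tab character):

mysql -e "SELECT id, topic_id, REPLACE(username, CHAR(9), ' '), post_time FROM posts" dbname > posts.txt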
==
In the hbase shell, count 'worktable' is very slow; it took a whole hour to finish:
Current count: 49458000, row: 9999791
49458230 row(s)
Took 3684.2802 seconds
=> 49458230
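count scans from the client one batch at a time. HBase also ships a MapReduce-based row counter, which is usually a better fit at this scale (a sketch, reusing the install path from above; it runs as a MapReduce job on the cluster):

$ /opt/hbase/latest/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter worktable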
get, by contrast, is fast:
hbase(main):056:0> get 'smth','1995'
COLUMN    CELL
 cf:post_time    timestamp=1548515983185, value=876546980
 cf:user_id    timestamp=1548515983185, value=554
 cf:username    timestamp=1548515983185, value="aaa"
1 row(s)
Took 0.0882 seconds
hbase(main):057:0> get 'smth','49471229'
COLUMN    CELL
 cf:post_time    timestamp=1548515983185, value=1546941261
 cf:user_id    timestamp=1548515983185, value=161838
 cf:username    timestamp=1548515983185, value="bbb"
1 row(s)
Took 0.0873 seconds