Bulk Importing Data into HBase

HBase is commonly used for batch analysis of big data, so in many situations large volumes of data need to be imported into HBase from external sources. HBase ships with a tool designed for exactly this kind of bulk import: importtsv. Its usage is as follows:
 
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

Imports the given input directory of TSV data into the specified table.

The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used as the row key for each imported record. You must specify exactly one column to be the row key, and you must specify a column name for every column that exists in the input data. Another special column HBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify atmost one column as timestamp key for each imported record.
Record with invalid timestamps (blank, non-numeric) will be treated as bad record.
Note: if you use this option, then 'importtsv.timestamp' option will be ignored.

By default importtsv will load data directly into HBase. To instead generate HFiles of data to prepare for a bulk data load, pass the option: -Dimporttsv.bulk.output=/path/for/output
  Note: if you do not use this option, then the target table must already exist in HBase

Other options that may be specified with -D include:
  -Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line
  '-Dimporttsv.separator=|' - eg separate on pipes instead of tabs
  -Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import
  -Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of org.apache.hadoop.hbase.mapreduce.TsvImporterMapper
For performance consider the following options:
  -Dmapred.map.tasks.speculative.execution=false
  -Dmapred.reduce.tasks.speculative.execution=false
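
In practice the tool is launched through hadoop jar with the HBase jar, passing importtsv as the program name. A minimal invocation might look like the following sketch, where the table name, column family, and paths are illustrative placeholders:

bin/hadoop jar /path/to/hbase-VERSION.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
    mytable /path/to/input-dir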

HBase provides the importtsv tool to load data from TSV files into HBase. Loading text data with this tool is very efficient, because the import is carried out by a MapReduce job. Even when the data lives in an existing relational database, you can first export it to a text file and then import it with importtsv; for huge data sets this approach works well, because dumping the data is much faster than running SQL against the relational database. The importtsv tool can either load data directly into an HBase table or generate HBase's native storage files (HFiles), which you can then load into a running HBase cluster with the bulk load tool. This reduces the network traffic caused by data transfer and HBase writes during migration. The rest of this section describes how to use importtsv and the bulk load tool: first how to load data from a TSV file into an HBase table with importtsv, then how to generate HFiles directly and load those pre-generated files into HBase.
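
To make the walkthrough below concrete, the preparation steps could look roughly like this sketch. The sample row is taken from the data scanned later in this section; the local file name /path/to/iptable.tsv is only an illustrative placeholder. One line of the tab-separated source file contains the row key followed by the eight attribute fields (separated by tab characters):

125.111.251.118   CN   China   02   Zhejiang   Ningbo   29.878204   121.5495   Asia/Shanghai

Create the target table with its IPAddress column family in the HBase shell (required when loading directly into the table), then copy the file into the /input directory on HDFS so that importtsv can read it:

hbase(main):001:0> create 'HiddenIPInfo', 'IPAddress'

landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop fs -mkdir /input
landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop fs -copyFromLocal /path/to/iptable.tsv /input/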

The purpose of bulk load is to use MapReduce to load files on HDFS into HBase, which is very useful when loading massive amounts of data into HBase.

A test run looks like this:

landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop jar $HADOOP_HOME/lib/hbase-0.94.12.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,IPAddress:countrycode,IPAddress:countryname,IPAddress:region,IPAddress:regionname,IPAddress:city,IPAddress:latitude,IPAddress:longitude,IPAddress:timezone -Dimporttsv.bulk.output=/output HiddenIPInfo /input
Warning: $HADOOP_HOME is deprecated.

13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:host.name=Master
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_17
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.home=/home/landen/UntarFile/jdk1.7.0_17/jre
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/landen/UntarFile/hadoop-1.0.4/conf:/home/landen/UntarFile/jdk1.7.0_17/lib/tools.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/..:/home/landen/UntarFile/hadoop-1.0.4/libexec/../hadoop-core-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/asm-3.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/aspectjrt-1.6.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/aspectjtools-1.6.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/chukwa-0.5.0-client.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/chukwa-0.5.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-beanutils-1.7.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-cli-1.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-codec-1.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-collections-3.2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-configuration-1.6.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-daemon-1.0.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-digester-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-el-1.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-httpclient-3.0.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-io-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-lang-2.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-logging-1.1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-logging-api-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-math-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-net-1.4.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/core-3.1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/guava-11.0.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-capacity-scheduler-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-fairscheduler-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-thriftfs-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hbase-0.94.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hsqldb-1.8.0.10.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jasper-compiler-5.5.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jasper-runtime-5.5.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jdeb-0.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-core-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-json-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-server-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jets3t-0.6.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jetty-6.1.26.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jetty-util-6.1.26.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsch-0.1.42.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/json-simple-1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/junit-4.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/kfs-0.2.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/LoadJsonData.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/..
/lib/log4j-1.2.15.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/mockito-all-1.8.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/oro-2.0.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/protobuf-java-2.4.0a.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/servlet-api-2.5-20081211.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/slf4j-api-1.4.3.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/xmlenc-0.52.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/zookeeper-3.4.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/native/Linux-i386-32
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-24-generic-pae
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:user.name=landen
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/landen
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/landen/UntarFile/hadoop-1.0.4
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave1:2222,Master:2222,Slave2:2222 sessionTimeout=180000 watcher=hconnection
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave1/10.21.244.124:2222. Will not attempt to authenticate using SASL (unknown error)
13/12/09 21:52:28 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6809@Master
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Socket connection established to Slave1/10.21.244.124:2222, initiating session
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave1/10.21.244.124:2222, sessionid = 0x142cbdf535f0010, negotiated timeout = 180000
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave1:2222,Master:2222,Slave2:2222 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@821075
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave2/10.21.244.110:2222. Will not attempt to authenticate using SASL (unknown error)
13/12/09 21:52:28 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 6809@Master
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Socket connection established to Slave2/10.21.244.110:2222, initiating session
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave2/10.21.244.110:2222, sessionid = 0x242d5abedac0016, negotiated timeout = 180000
13/12/09 21:52:28 INFO zookeeper.ClientCnxn: EventThread shut down
13/12/09 21:52:28 INFO zookeeper.ZooKeeper: Session: 0x242d5abedac0016 closed
13/12/09 21:52:28 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.HTable@1ae6df8
13/12/09 21:52:28 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
13/12/09 21:52:28 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://Master:9000/user/landen/partitions_b0c3723c-85ea-4828-8521-52de201023f0
13/12/09 21:52:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/09 21:52:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/12/09 21:52:28 INFO compress.CodecPool: Got brand-new compressor
13/12/09 21:52:29 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
13/12/09 21:52:34 INFO input.FileInputFormat: Total input paths to process : 1
13/12/09 21:52:34 WARN snappy.LoadSnappy: Snappy native library not loaded
13/12/09 21:52:35 INFO mapred.JobClient: Running job: job_201312042044_0027
13/12/09 21:52:36 INFO mapred.JobClient:  map 0% reduce 0%
13/12/09 21:53:41 INFO mapred.JobClient:  map 100% reduce 0%
13/12/09 21:53:56 INFO mapred.JobClient:  map 100% reduce 100%
13/12/09 21:54:01 INFO mapred.JobClient: Job complete: job_201312042044_0027
13/12/09 21:54:01 INFO mapred.JobClient: Counters: 30
13/12/09 21:54:01 INFO mapred.JobClient:   Job Counters
13/12/09 21:54:01 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/09 21:54:01 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=42735
13/12/09 21:54:01 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/09 21:54:01 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/09 21:54:01 INFO mapred.JobClient:     Launched map tasks=1
13/12/09 21:54:01 INFO mapred.JobClient:     Data-local map tasks=1
13/12/09 21:54:01 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13878
13/12/09 21:54:01 INFO mapred.JobClient:   ImportTsv
13/12/09 21:54:01 INFO mapred.JobClient:     Bad Lines=0
13/12/09 21:54:01 INFO mapred.JobClient:   File Output Format Counters
13/12/09 21:54:01 INFO mapred.JobClient:     Bytes Written=2194
13/12/09 21:54:01 INFO mapred.JobClient:   FileSystemCounters
13/12/09 21:54:01 INFO mapred.JobClient:     FILE_BYTES_READ=1895
13/12/09 21:54:01 INFO mapred.JobClient:     HDFS_BYTES_READ=333
13/12/09 21:54:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=77323
13/12/09 21:54:01 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2194
13/12/09 21:54:01 INFO mapred.JobClient:   File Input Format Counters
13/12/09 21:54:01 INFO mapred.JobClient:     Bytes Read=233
13/12/09 21:54:01 INFO mapred.JobClient:   Map-Reduce Framework
13/12/09 21:54:01 INFO mapred.JobClient:     Map output materialized bytes=1742
13/12/09 21:54:01 INFO mapred.JobClient:     Map input records=3
13/12/09 21:54:01 INFO mapred.JobClient:     Reduce shuffle bytes=1742
13/12/09 21:54:01 INFO mapred.JobClient:     Spilled Records=6
13/12/09 21:54:01 INFO mapred.JobClient:     Map output bytes=1724
13/12/09 21:54:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=131731456
13/12/09 21:54:01 INFO mapred.JobClient:     CPU time spent (ms)=14590
13/12/09 21:54:01 INFO mapred.JobClient:     Combine input records=0
13/12/09 21:54:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=100
13/12/09 21:54:01 INFO mapred.JobClient:     Reduce input records=3
13/12/09 21:54:01 INFO mapred.JobClient:     Reduce input groups=3
13/12/09 21:54:01 INFO mapred.JobClient:     Combine output records=0
13/12/09 21:54:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=184393728
13/12/09 21:54:01 INFO mapred.JobClient:     Reduce output records=24
13/12/09 21:54:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=698474496
13/12/09 21:54:01 INFO mapred.JobClient:     Map output records=3
landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop fs -ls /output
Warning: $HADOOP_HOME is deprecated.

Found 3 items
drwxr-xr-x   - landen supergroup          0 2013-12-09 21:53 /output/IPAddress
-rw-r--r--   1 landen supergroup          0 2013-12-09 21:53 /output/_SUCCESS
drwxr-xr-x   - landen supergroup          0 2013-12-09 21:52 /output/_logs

The completebulkload tool reads the generated files, determines which region each of them belongs to, and then contacts the appropriate region server. The region server moves the HFile into its own storage directory and brings the data online for clients.

landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop jar $HADOOP_HOME/lib/hbase-0.94.12.jar completebulkload /output HiddenIPInfo    (HiddenIPInfo is the target HBase table name)
Warning: $HADOOP_HOME is deprecated.

13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:host.name=Master
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_17
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/home/landen/UntarFile/jdk1.7.0_17/jre
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/landen/UntarFile/hadoop-1.0.4/conf:/home/landen/UntarFile/jdk1.7.0_17/lib/tools.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/..:/home/landen/UntarFile/hadoop-1.0.4/libexec/../hadoop-core-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/asm-3.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/aspectjrt-1.6.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/aspectjtools-1.6.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/chukwa-0.5.0-client.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/chukwa-0.5.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-beanutils-1.7.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-cli-1.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-codec-1.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-collections-3.2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-configuration-1.6.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-daemon-1.0.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-digester-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-el-1.0.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-httpclient-3.0.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-io-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-lang-2.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-logging-1.1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-logging-api-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-math-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/commons-net-1.4.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/core-3.1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/guava-11.0.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-capacity-scheduler-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-fairscheduler-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hadoop-thriftfs-1.0.4.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hbase-0.94.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/hsqldb-1.8.0.10.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jasper-compiler-5.5.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jasper-runtime-5.5.12.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jdeb-0.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-core-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-json-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jersey-server-1.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jets3t-0.6.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jetty-6.1.26.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jetty-util-6.1.26.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsch-0.1.42.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/json-simple-1.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/junit-4.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/kfs-0.2.2.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/LoadJsonData.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/..
/lib/log4j-1.2.15.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/mockito-all-1.8.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/oro-2.0.8.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/protobuf-java-2.4.0a.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/servlet-api-2.5-20081211.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/slf4j-api-1.4.3.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/xmlenc-0.52.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/zookeeper-3.4.5.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/landen/UntarFile/hadoop-1.0.4/libexec/../lib/native/Linux-i386-32
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-24-generic-pae
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:user.name=landen
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/landen
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/landen/UntarFile/hadoop-1.0.4
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave1:2222,Master:2222,Slave2:2222 sessionTimeout=180000 watcher=hconnection
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave1/10.21.244.124:2222. Will not attempt to authenticate using SASL (unknown error)
13/12/09 22:00:00 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 7168@Master
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Socket connection established to Slave1/10.21.244.124:2222, initiating session
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave1/10.21.244.124:2222, sessionid = 0x142cbdf535f0011, negotiated timeout = 180000
13/12/09 22:00:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave1:2222,Master:2222,Slave2:2222 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@a13b90
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave1/10.21.244.124:2222. Will not attempt to authenticate using SASL (unknown error)
13/12/09 22:00:00 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 7168@Master
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Socket connection established to Slave1/10.21.244.124:2222, initiating session
13/12/09 22:00:00 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave1/10.21.244.124:2222, sessionid = 0x142cbdf535f0012, negotiated timeout = 180000
13/12/09 22:00:01 INFO zookeeper.ZooKeeper: Session: 0x142cbdf535f0012 closed
13/12/09 22:00:01 INFO zookeeper.ClientCnxn: EventThread shut down
13/12/09 22:00:01 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://Master:9000/output/_SUCCESS
13/12/09 22:00:01 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 222.2m
13/12/09 22:00:01 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
13/12/09 22:00:01 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://Master:9000/output/IPAddress/b29b74ad57ff4be1a62968229b7e23d4 first=125.111.251.118 last=60.180.248.201
landen@Master:~/UntarFile/hadoop-1.0.4$
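
After the load, one way to confirm that the region server adopted (moved, rather than copied) the HFile is to list the bulk output directory and the table's directory under HBase's root directory again; this sketch assumes the default hbase.rootdir of /hbase:

landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop fs -ls /output/IPAddress
landen@Master:~/UntarFile/hadoop-1.0.4$ bin/hadoop fs -lsr /hbase/HiddenIPInfo

The first listing should no longer show the HFile b29b74ad57ff4be1a62968229b7e23d4, while the second should show it under one of the table's region/column-family directories.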

Query the data bulk-imported into the HBase table HiddenIPInfo from the HBase shell:

hbase(main):045:0> scan 'HiddenIPInfo'
ROW                            COLUMN+CELL                                                                             
 125.111.251.118               column=IPAddress:city, timestamp=1386597147615, value=Ningbo                            
 125.111.251.118               column=IPAddress:countrycode, timestamp=1386597147615, value=CN                         
 125.111.251.118               column=IPAddress:countryname, timestamp=1386597147615, value=China                      
 125.111.251.118               column=IPAddress:latitude, timestamp=1386597147615, value=29.878204                     
 125.111.251.118               column=IPAddress:longitude, timestamp=1386597147615, value=121.5495                     
 125.111.251.118               column=IPAddress:region, timestamp=1386597147615, value=02                              
 125.111.251.118               column=IPAddress:regionname, timestamp=1386597147615, value=Zhejiang                    
 125.111.251.118               column=IPAddress:timezone, timestamp=1386597147615, value=Asia/Shanghai                 
 221.12.10.218                 column=IPAddress:city, timestamp=1386597147615, value=Hangzhou                          
 221.12.10.218                 column=IPAddress:countrycode, timestamp=1386597147615, value=CN                         
 221.12.10.218                 column=IPAddress:countryname, timestamp=1386597147615, value=China                      
 221.12.10.218                 column=IPAddress:latitude, timestamp=1386597147615, value=30.293594                     
 221.12.10.218                 column=IPAddress:longitude, timestamp=1386597147615, value=120.16141                    
 221.12.10.218                 column=IPAddress:region, timestamp=1386597147615, value=02                              
 221.12.10.218                 column=IPAddress:regionname, timestamp=1386597147615, value=Zhejiang                    
 221.12.10.218                 column=IPAddress:timezone, timestamp=1386597147615, value=Asia/Shanghai                 
 60.180.248.201                column=IPAddress:city, timestamp=1386597147615, value=Wenzhou                           
 60.180.248.201                column=IPAddress:countrycode, timestamp=1386597147615, value=CN                         
 60.180.248.201                column=IPAddress:countryname, timestamp=1386597147615, value=China                      
 60.180.248.201                column=IPAddress:latitude, timestamp=1386597147615, value=27.999405                     
 60.180.248.201                column=IPAddress:longitude, timestamp=1386597147615, value=120.66681                    
 60.180.248.201                column=IPAddress:region, timestamp=1386597147615, value=02                              
 60.180.248.201                column=IPAddress:regionname, timestamp=1386597147615, value=Zhejiang                    
 60.180.248.201                column=IPAddress:timezone, timestamp=1386597147615, value=Asia/Shanghai                 
3 row(s) in 0.2640 seconds

Note:

1> HBASE_ROW_KEY does not have to be the first column; if it is specified as the second column, then the second field of each line is used as the row key;

2> The mapping between the field positions in the TSV file and the columns of the HBase table is configured with the -Dimporttsv.columns parameter;

3> If the output directory -Dimporttsv.bulk.output is set, the HiddenIPInfo table is not created yet; the HFiles are simply written to the output folder (HiddenIPInfo is only created once the completebulkload import has been performed). Then run bin/hadoop jar hbase-VERSION.jar completebulkload /output (the directory holding the HFiles) HiddenIPInfo (the target HBase table name) to move the HFiles from that output directory into the corresponding regions; since this step is essentially just a move (mv), it is quite fast;

4> If the data is very large and the table already has existing regions, the HFiles are split as needed, and each piece is matched to its corresponding region and loaded there;

5> For the command bin/hadoop jar $HADOOP_HOME/lib/hbase-0.94.12.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,IPAddress:countrycode,IPAddress:countryname,IPAddress:region,IPAddress:regionname,IPAddress:city,IPAddress:latitude,IPAddress:longitude,IPAddress:timezone (-Dimporttsv.bulk.output=/output) HiddenIPInfo /input, when the -Dimporttsv.bulk.output parameter is not specified:

1. The table must already have been created before running the command;

2. In this mode data is written to HBase with Put operations, which gives lower performance; the map stage uses TableOutputFormat. By specifying the -Dimporttsv.bulk.output parameter, the importtsv tool instead uses HFileOutputFormat to generate HBase's native storage files (HFiles) in HDFS, and the CompleteBulkLoad tool is then used to load the generated files into a running cluster, which performs much better. If the table does not exist, the CompleteBulkLoad tool creates it automatically;

6> The importtsv tool only reads data from HDFS, so the TSV file must first be copied from the local file system into HDFS. importtsv expects the source file to be in TSV format (see http://en.wikipedia.org/wiki/Tab-separated_values for details on the format). After preparing the source file, copy it into HDFS with hadoop dfs -copyFromLocal file:///path/to/source-file hdfs:///path/to/source-file. By default the source file is split on the tab character ("\t"); to use a different separator, add, for example, -Dimporttsv.separator="," to the command, and the fields will then be split on "," instead (see the example command after these notes).
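
For instance, a direct (non-bulk) import of a comma-separated file could look like the sketch below. It assumes a source file under /input whose lines contain exactly three comma-separated fields (row key, country code, country name) and a HiddenIPInfo table that already exists; because -Dimporttsv.bulk.output is not given, the data is written with Put operations:

bin/hadoop jar $HADOOP_HOME/lib/hbase-0.94.12.jar importtsv \
    '-Dimporttsv.separator=,' \
    -Dimporttsv.columns=HBASE_ROW_KEY,IPAddress:countrycode,IPAddress:countryname \
    HiddenIPInfo /input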
