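Recently the boss of a friend from one of my chat groups asked him to look into HBase and get its ingest rate above 50,000 records/s. That was a real headache. On a cluster of four PCs, multithreaded ingestion topped out at around 10,000/s even after a lot of tuning, nowhere near the target. Then I thought of writing HFiles directly and bulk loading them. The examples online mostly do this with MapReduce, but the requirement here is real-time ingestion with no intermediate files produced first, so I had to implement it in my own code. I searched a lot and found nothing. In the end, following a pointer from another user, I read the source code and found how HFiles are actually generated.

Once it was working, single-threaded ingestion only reached about 14,000/s, roughly the same as the earlier multithreaded run at full speed. I was stuck on this for a while, then changed the code so that the Bytes.toBytes(cols) call for the column names is pulled out and done only once. The speed immediately jumped to 30,000/s, a very noticeable improvement. That figure is from a PC, so it will probably be somewhat faster on his cluster. The code is shared below.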
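    // Reconstructed from a damaged listing. Values marked "assumed" were lost
    // in the original and are filled in for the older StoreFile.WriterBuilder API.
    import java.text.DecimalFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.hfile.CacheConfig;
    import org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoder;
    import org.apache.hadoop.hbase.io.hfile.NoOpDataBlockEncoder;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.regionserver.StoreFile;
    import org.apache.hadoop.hbase.util.Bytes;

    String tableName = "taglog";
    byte[] family = Bytes.toBytes("logs");

    // Cluster configuration
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.master", "192.168.1.133:60000");
    conf.set("hbase.zookeeper.quorum", "192.168.1.135");
    conf.set("hbase.metrics.showTableName", "false");

    // Bulk load expects one subdirectory per column family under the output directory
    String outputdir = "hdfs://hadoop.Master:8020/user/SEA/hfiles/";
    Path dir = new Path(outputdir);
    Path familydir = new Path(outputdir, Bytes.toString(family));
    FileSystem fs = familydir.getFileSystem(conf);
    BloomType bloomType = BloomType.NONE;                            // assumed
    HFileDataBlockEncoder encoder = NoOpDataBlockEncoder.INSTANCE;   // assumed
    int blockSize = 64000;
    Configuration tempConf = new Configuration(conf);
    tempConf.set("hbase.metrics.showTableName", "false");
    tempConf.setFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY, 1.0f);

    // StoreFile.Writer is a thin wrapper around the HFile writer.
    // Every argument and builder call after "conf" is assumed.
    StoreFile.Writer writer = new StoreFile.WriterBuilder(conf, new CacheConfig(tempConf), fs, blockSize)
            .withOutputDir(familydir)
            .withBloomType(bloomType)
            .withComparator(KeyValue.COMPARATOR)
            .withDataBlockEncoder(encoder)
            .build();

    long start = System.currentTimeMillis();   // start time, for measuring throughput
    DecimalFormat df = new DecimalFormat("0000000");
    KeyValue kv1, kv2, kv3, kv4, kv5, kv6, kv7, kv8;

    // Converting the column names is expensive, so do it once, outside the loop
    byte[] cn = Bytes.toBytes("cn");
    byte[] dt = Bytes.toBytes("dt");
    byte[] ic = Bytes.toBytes("ic");
    byte[] ifs = Bytes.toBytes("if");
    byte[] ip = Bytes.toBytes("ip");
    byte[] le = Bytes.toBytes("le");
    byte[] mn = Bytes.toBytes("mn");
    byte[] pi = Bytes.toBytes("pi");

    int maxLength = 3000000;
    for (int i = 0; i < maxLength; i++) {
        // Row key: millisecond timestamp plus a zero-padded counter
        String currentTime = "" + System.currentTimeMillis() + df.format(i);
        byte[] row = Bytes.toBytes(currentTime);
        long current = System.currentTimeMillis();
        // Row keys and columns must be written in ascending byte order
        kv1 = new KeyValue(row, family, cn, current, KeyValue.Type.Put, Bytes.toBytes("3"));
        kv2 = new KeyValue(row, family, dt, current, KeyValue.Type.Put, Bytes.toBytes("6"));
        kv3 = new KeyValue(row, family, ic, current, KeyValue.Type.Put, Bytes.toBytes("8"));
        kv4 = new KeyValue(row, family, ifs, current, KeyValue.Type.Put, Bytes.toBytes("7"));
        kv5 = new KeyValue(row, family, ip, current, KeyValue.Type.Put, Bytes.toBytes("4"));
        kv6 = new KeyValue(row, family, le, current, KeyValue.Type.Put, Bytes.toBytes("2"));
        kv7 = new KeyValue(row, family, mn, current, KeyValue.Type.Put, Bytes.toBytes("5"));
        kv8 = new KeyValue(row, family, pi, current, KeyValue.Type.Put, Bytes.toBytes("1"));
        // The append calls were lost in the original listing; restored here
        writer.append(kv1);
        writer.append(kv2);
        writer.append(kv3);
        writer.append(kv4);
        writer.append(kv5);
        writer.append(kv6);
        writer.append(kv7);
        writer.append(kv8);
    }
    writer.close();

    // Load the generated HFile into HBase
    HTable table = new HTable(conf, tableName);
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(dir, table);

Finally, I am also attaching the way to inspect an HFile. Dump a known-good HFile and the one you generated yourself and compare the two, which makes problems much easier to track down.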
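One thing the loop above depends on is ordering: an HFile writer expects cells in ascending byte order, first by row key and then by column qualifier, and an out-of-order append is rejected. The following is a minimal sketch, without any HBase dependency, that checks the eight qualifiers and the timestamp-plus-counter row keys from the post against that rule. The class and method names (`HFileOrderCheck`, `compareBytes`, `isStrictlySorted`, `rowKey`) are hypothetical and exist only for illustration, and the comparison is my own re-implementation of unsigned lexicographic ordering, not a call into HBase.

```java
import java.nio.charset.StandardCharsets;
import java.text.DecimalFormat;
import java.util.Arrays;
import java.util.List;

final class HFileOrderCheck {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True if each name is strictly greater than the one before it.
    static boolean isStrictlySorted(List<String> names) {
        for (int i = 1; i < names.size(); i++) {
            byte[] prev = names.get(i - 1).getBytes(StandardCharsets.UTF_8);
            byte[] cur = names.get(i).getBytes(StandardCharsets.UTF_8);
            if (compareBytes(prev, cur) >= 0) {
                return false;
            }
        }
        return true;
    }

    // Row key as built in the post: millisecond timestamp plus a 7-digit zero-padded counter.
    static String rowKey(long millis, int counter) {
        return millis + new DecimalFormat("0000000").format(counter);
    }

    public static void main(String[] args) {
        List<String> qualifiers = Arrays.asList("cn", "dt", "ic", "if", "ip", "le", "mn", "pi");
        System.out.println("qualifiers sorted: " + isStrictlySorted(qualifiers));

        long t = 1400000000000L; // fixed example timestamp
        List<String> keys = Arrays.asList(rowKey(t, 9), rowKey(t, 10), rowKey(t + 1, 11));
        System.out.println("row keys sorted: " + isStrictlySorted(keys));
    }
}
```

The zero padding is what keeps the row keys ordered: without it, a counter of 10 would sort before a counter of 9. The padding holds as long as the counter stays below 10,000,000, which covers the 3,000,000 rows written in the post.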