HBase Study Notes (Part 1)

Date: 2019-12-01

1. A brief introduction to and comparison of OLAP and OLTP databases

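　　1. OLTP (On-Line Transaction Processing) is used mostly in traditional relational databases to run everyday, basic transactions such as inserting, deleting, updating, and querying records. A single bank transaction is a typical example. OLTP emphasizes memory efficiency, the hit ratios of the various memory metrics, bind variables, and concurrent operations. Its characteristics are: strict real-time requirements; relatively small data volumes; transactions that are generally well defined; high concurrency; and ACID compliance.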

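　　OLTP systems are generally highly available online systems dominated by small transactions and small queries. When evaluating such a system, you usually look at the number of transactions and executed SQL statements per second. A single database in such a system often processes hundreds or even thousands of transactions per second, and executes thousands or even tens of thousands of SELECT statements per second. Typical OLTP systems include e-commerce, banking, and securities systems. eBay's business database in the US is a classic OLTP database.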

　　The most common bottlenecks in an OLTP system are the CPU and the disk subsystem.

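　　(1) A CPU bottleneck usually shows up in the total number of logical reads and in computation-heavy functions or procedures. Total logical reads equal the logical reads of a single statement multiplied by its execution count. Even a statement that runs very fast can therefore produce a huge total if it is executed very often. The remedy, in both design and tuning, is to reduce either the logical reads per statement or the number of executions. Frequent use of computational functions such as user-defined functions and decode also consumes a lot of CPU time and raises system load. The right approach is to avoid computation as far as possible, for example by saving computed results in a statistics table.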

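　　(2) In an OLTP environment, the capacity of the disk subsystem generally depends on its IOPS. Physical reads in OLTP are usually db file sequential read, i.e. single-block reads, but they happen very often. If they become so frequent that the disk subsystem cannot sustain the IOPS, serious performance problems follow. Common OLTP design and tuning techniques are caching and B-tree indexes. Caching means many statements never have to fetch data from the disk subsystem, so the Web cache and the Oracle data buffer are very important to OLTP systems. For indexing, simpler statements are better because their execution plans stay stable. Always use bind variables to reduce statement parsing. Minimize table joins and distributed transactions, and generally avoid partitioning, materialized views (MV), parallel execution, and bitmap indexes. Because concurrency is high, batch updates should be committed quickly in small batches to avoid blocking.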
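To make the bind-variable point concrete, here is a toy Java model of statement reuse. It is an assumption for illustration, not Oracle's actual mechanism: a shared cache keyed by the exact SQL text, where every text seen for the first time counts as a hard parse. The table name accounts and the class ParseCacheDemo are hypothetical. Embedding a literal value produces a new text for every value, while a bind placeholder keeps a single text.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: a shared SQL cache keyed by exact statement text.
// Each text seen for the first time counts as one hard parse.
final class ParseCacheDemo {
    private final Set<String> sharedCache = new HashSet<>();
    private int hardParses = 0;

    void execute(String sqlText) {
        if (sharedCache.add(sqlText)) {   // add() returns true only for a new text
            hardParses++;
        }
    }

    int hardParses() {
        return hardParses;
    }

    public static void main(String[] args) {
        ParseCacheDemo literals = new ParseCacheDemo();
        ParseCacheDemo binds = new ParseCacheDemo();
        for (int id = 1; id <= 1000; id++) {
            // Literal value baked into the text: every statement looks different.
            literals.execute("SELECT name FROM accounts WHERE id = " + id);
            // Bind placeholder: the value travels separately, so the text never changes.
            binds.execute("SELECT name FROM accounts WHERE id = :id");
        }
        System.out.println(literals.hardParses() + " vs " + binds.hardParses()); // 1000 vs 1
    }
}
```

In JDBC, the usual way to get bind variables is a PreparedStatement with ? placeholders instead of string concatenation.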

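　　In an OLTP system, data blocks change very frequently and SQL statements are submitted very frequently. Data blocks should stay in memory as much as possible. SQL should use bind variables as much as possible so that statements can be reused; this reduces physical I/O and repeated SQL parsing and greatly improves database performance. Besides bind variables, hot blocks can also hurt performance. When many users read the same block at once, Oracle must use a latch to serialize their operations and keep the data consistent. Once one user holds the latch, the others can only wait, and the more users want the block, the longer the waits. This is the hot block problem. A hot block may be a data block or a rollback segment block. For data blocks, the cause is usually uneven data distribution. If the hot block is an index block, consider creating a reverse key index to redistribute the data. For rollback segment blocks, adding a few more rollback segments helps avoid the contention.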
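Why does a reverse key index help with hot index blocks? Sequential keys sort next to each other, so inserts pile into the same leaf block. Reversing the key's bytes scatters them. The Java sketch below only models that effect and is not Oracle's implementation. It reverses the characters of sequential numeric keys (for ASCII digits this equals reversing bytes) and buckets keys by their first character, a hypothetical stand-in for adjacent index blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative model: how reversing key bytes spreads monotonically increasing keys
// across "blocks" (here, buckets keyed by the first character of the key).
final class ReverseKeyDemo {
    // Reverse the characters of a key, as a reverse key index reverses its bytes.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Count how many keys land in each bucket, keyed by the first character.
    static Map<Character, Integer> bucketByFirstChar(List<String> keys) {
        Map<Character, Integer> buckets = new TreeMap<>();
        for (String k : keys) {
            buckets.merge(k.charAt(0), 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> reversed = new ArrayList<>();
        for (int id = 1000; id < 1010; id++) {        // ten sequential keys
            plain.add(Integer.toString(id));
            reversed.add(reverse(Integer.toString(id)));
        }
        System.out.println("plain:    " + bucketByFirstChar(plain));    // {1=10}
        System.out.println("reversed: " + bucketByFirstChar(reversed)); // one key per bucket 0..9
    }
}
```

HBase row key design has the same problem: monotonically increasing row keys all hit one region.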

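　　2. OLAP (On-Line Analytical Processing) is the main application of data warehouse systems. It supports complex analytical operations, focuses on decision support, and provides intuitive, easy-to-understand query results. The typical application is a complex, dynamic reporting system. OLAP has the following characteristics: real-time requirements are not very strict, and data volumes are large. Because the focus of an OLAP system is decision support through data, queries are generally dynamic and user-defined. The concept of dimensions is therefore especially important in OLAP, and all the dimension data users care about is usually stored in the corresponding data platform.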

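　　OLAP is sometimes also called a DSS (decision support system), which is what we call a data warehouse. In such systems the number of statements executed is not the metric, because a single statement may run for a very long time and read a great deal of data. The metric is usually the throughput (bandwidth) of the disk subsystem, for example how many MB/s it can sustain. Disk subsystem throughput usually depends on the number of disks. Caching is essentially ineffective here, and database reads and writes are mostly db file scattered read and direct path read/write. Use as many disks and as much bandwidth as possible, such as 4Gb Fibre Channel interfaces. OLAP systems commonly use partitioning and parallel execution.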

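　　1) In OLAP systems, partitioning matters mainly for database management. Data loading can be done by partition exchange, backups by backing up partition tablespaces, and deletes by dropping partitions. For performance, partitioning can make scans of some large tables very fast, because only one partition is scanned. Combined with parallel execution, it can also make full-table scans much faster. In short, the main benefit of partitioning is manageability; it does not guarantee better query performance, and can improve it or degrade it. Besides working with partitioning, parallel execution in Oracle 10g can be combined with RAC to scan on several nodes at once, which works very well: a task such as a full-table scan for a SELECT can be spread evenly across multiple RAC nodes. OLAP systems do not need bind (BIND) variables. The overall number of executions is small, so parse time is negligible compared with execution time, and avoiding bind variables also avoids bad execution plans. OLAP can, however, make heavy use of bitmap indexes and materialized views. For large transactions, aim for raw speed. There is no need for the fast commits OLTP requires, and execution may even be deliberately slowed down. The real purpose of bind variables is OLTP, where there are typically very many concurrent users, very dense requests, and SQL that can mostly be reused. In an OLAP system the database mostly runs reporting jobs, which are basically aggregate SQL operations such as GROUP BY. Setting the optimizer mode to all_rows is appropriate there, while for website databases that do a lot of paging, first_rows works better. When an OLAP system does need paging, consider adding a hint to each SQL statement, for example: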

　　　SELECT /*+ FIRST_ROWS(10) */ a.* FROM t a;

　　2) Design and tune the two separately

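　　Take special care at design time: in a highly available OLTP environment, do not blindly borrow OLAP techniques. Take partitioning. If queries do not broadly use the partition key but filter on other columns in the WHERE clause, a local index forces scans of multiple index partitions and performance gets even worse, while a global index defeats the purpose of partitioning. Parallel execution is similar and is generally used only for large tasks. In real life, to translate a book you can assign several people, each translating different chapters, which speeds things up. But to translate a single page, splitting it line by line among several people and merging the results is pointless, because one person would probably have finished in the time it takes to hand out the work. Bitmap indexes are the same. In an OLTP environment they easily cause blocking and deadlocks, but in an OLAP environment their special properties can speed up queries. Materialized views (MV) and triggers behave much the same way. On an OLTP system with frequent DML they easily become a bottleneck, even causing Library Cache waits, while in an OLAP environment, used properly, they can speed up queries. For OLAP systems there is little room for memory tuning. Faster CPUs and faster disk I/O are the most direct ways to improve database performance, which of course also raises system cost. For example, when aggregating hundreds of millions or billions of rows, holding all of that data in memory is hard and unnecessary. These data blocks are rarely reused, so caching them gains nothing and still causes a great deal of physical I/O. The bottleneck of such systems is therefore usually disk I/O. SQL tuning is very important for OLAP systems: with such large data volumes, the performance difference between a full-table scan and an index access is huge.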

　　3. Comparison of the two:

2. Features of HBase:

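　1. Introduction: HBase is a typical key/value system developed on the Google BigTable model. Built on top of HDFS, it is a NoSQL database system that is highly reliable, high-performance, column-oriented, and scalable, with real-time reads and writes. It is used mainly to store massive amounts of structured and semi-structured data. It sits between NoSQL and RDBMS. Data can only be retrieved by primary key (row key) or by a range of row keys, and only single-row transactions are supported; complex operations such as multi-table joins can be achieved through Hive. HBase's query features are simple: no joins or other complex operations, and no complex transactions (only row-level transactions). Like Hadoop, HBase relies mainly on horizontal scaling, adding inexpensive commodity servers to increase compute and storage capacity.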

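　2. Characteristics of HBase tables:

　　1) Large: a table can have billions of rows and millions of columns.

　　2) Schema-free: each row has a sortable row key and any number of columns. Columns can be added dynamically as needed, and different rows in the same table can have completely different columns.

　　3) Column-oriented: storage and access control are organized by column (family), and each column (family) is retrieved independently.

　　4) Sparse: null (empty) columns take up no storage space, so tables can be designed to be very sparse.

　　5) Multi-versioned: the data in each cell can have multiple versions. By default the version number is assigned automatically and is the timestamp at which the cell was inserted.

　　6) Single data type: all data in HBase is stored as byte arrays (byte[]).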

　3. HBase logical structure:

　　HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.

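　　1) Row Key

　　　As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

　　　1. By a single row key

　　　2. By a range of row keys

　　　3. By a full table scan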


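　　A row key can be any string. The hard limit is Short.MAX_VALUE, i.e. 32,767 bytes, though it is often quoted as 64KB; in practice keys are usually 10-100 bytes. Internally, HBase stores the row key as a byte array and keeps a table's data sorted by row key in lexicographic (byte) order, both logically and on disk. When designing keys, take full advantage of this sorted storage and place rows that are often read together next to each other.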

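　　Note on locality: lexicographic order sorts integers as 1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,…,9,91,92,93,94,95,96,97,98,99. To preserve the natural order of integers, row keys must be left-padded with zeros. A read or write of a single row is atomic, however many columns it touches. This design decision makes it easy to reason about a program's behavior when the same row is updated concurrently.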
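The sorting rule and the zero-padding fix can be checked with a few lines of plain Java. The sketch below compares keys as unsigned bytes, which is how HBase orders row keys (similar in spirit to org.apache.hadoop.hbase.util.Bytes.compareTo, though this is not that code). The class RowKeyOrder and its helpers are hypothetical names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorts row keys by unsigned lexicographic byte order, the order HBase stores rows in.
final class RowKeyOrder {
    // Compare two byte arrays lexicographically, treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;   // a key that is a prefix of another sorts first
    }

    // Sort string keys by their UTF-8 bytes, as HBase would.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(
                (String k) -> k.getBytes(StandardCharsets.UTF_8), RowKeyOrder::compareBytes));
        return sorted;
    }

    // Left-pad a non-negative int with zeros to a fixed width.
    static String pad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        List<String> raw = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (int i : new int[] {1, 2, 9, 10, 11, 100}) {
            raw.add(Integer.toString(i));
            padded.add(pad(i, 3));
        }
        System.out.println(sortAsRowKeys(raw));    // [1, 10, 100, 11, 2, 9]
        System.out.println(sortAsRowKeys(padded)); // [001, 002, 009, 010, 011, 100]
    }
}
```

The padding width must cover the largest value you expect, otherwise the natural order breaks again once the numbers outgrow it.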

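　　2) Column families

　　Every column in a table belongs to a column family. Column families are part of the table schema (individual columns are not) and must be defined before the table is used. Column names are prefixed with their column family; for example, courses:history and courses:math both belong to the courses column family. Access control and disk and memory usage accounting are all done per column family. The more column families a table has, the more files must take part in I/O and seeking when a row is read. Do not create more column families than you need; 2-3 is generally reasonable.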

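　　3) Timestamp

　　In HBase, the storage unit identified by a row and a column is called a cell. Each cell holds multiple versions of the same piece of data, indexed by timestamp. A timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds. It can also be set explicitly by the client. An application that must avoid version conflicts has to generate unique timestamps itself. Within a cell, versions are sorted in reverse chronological order, newest first. Too many versions would create a management burden (storage and indexing), so HBase offers two ways to reclaim old versions: keep only the last n versions, or keep only the versions within a recent time window (by setting a time to live, TTL).

　　4) Cell: the unit uniquely identified by {row key, column (= <family> + <label>), version}. Data in a cell has no type; it is all stored as raw bytes.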
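One way to picture this logical model is as nested sorted maps: row key → "family:qualifier" → timestamp (newest first) → value. The Java sketch below is a hypothetical in-memory illustration, not HBase's storage format, and it uses Strings in place of byte[] for readability. Rows hold only the columns that were written (sparse, schema-free), and reads apply the two retention rules from 3) Timestamp: at most n versions, and nothing older than the TTL. Real HBase enforces those rules on reads and during compactions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical picture of the logical model: row -> "family:qualifier" -> ts (newest first) -> value.
final class LogicalTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Write one cell version; a row only holds the columns actually written (sparse).
    void put(String row, String family, String qualifier, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(family + ":" + qualifier,
                    c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(ts, value);
    }

    // Up to maxVersions values of one cell, newest first, ignoring versions older than ttlMillis.
    List<String> get(String row, String family, String qualifier,
                     int maxVersions, long ttlMillis, long now) {
        List<String> out = new ArrayList<>();
        NavigableMap<String, NavigableMap<Long, String>> cols = rows.get(row);
        if (cols == null) return out;
        NavigableMap<Long, String> cell = cols.get(family + ":" + qualifier);
        if (cell == null) return out;                          // never written: nothing stored
        for (Map.Entry<Long, String> e : cell.entrySet()) {    // newest first
            if (out.size() == maxVersions) break;              // rule 1: keep the last n versions
            if (now - e.getKey() > ttlMillis) break;           // rule 2: the rest are even older
            out.add(e.getValue());
        }
        return out;
    }

    // Columns stored for a row; different rows may have different columns.
    List<String> columnsOf(String row) {
        NavigableMap<String, NavigableMap<Long, String>> cols = rows.get(row);
        return cols == null ? new ArrayList<>() : new ArrayList<>(cols.keySet());
    }

    public static void main(String[] args) {
        LogicalTable t = new LogicalTable();
        t.put("row1", "f1", "name", 1000L, "tom");
        t.put("row1", "f1", "name", 2000L, "tomas");
        t.put("row1", "f1", "age", 1000L, "23");
        t.put("row2", "f1", "id", 1500L, "100");
        System.out.println(t.columnsOf("row1"));                            // [f1:age, f1:name]
        System.out.println(t.columnsOf("row2"));                            // [f1:id]
        System.out.println(t.get("row1", "f1", "name", 3, 10_000L, 2500L)); // [tomas, tom]
        System.out.println(t.get("row1", "f1", "name", 1, 10_000L, 2500L)); // [tomas]
        System.out.println(t.get("row1", "f1", "name", 3, 1_000L, 2500L));  // [tomas]
    }
}
```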

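　4. HBase cluster overview:

　　Components:

　　Client: contains the interfaces for accessing HBase and maintains a cache, for example of region locations, to speed up access.

　　HMaster: the master node of the HBase cluster. Several can be configured to provide HA. It assigns regions to RegionServers, balances load across RegionServers, and detects failed RegionServers and reassigns their regions.

　　RegionServer: maintains regions and handles I/O requests for them. It also splits regions that grow too large while running.

　　Region: the smallest unit of distributed storage.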
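A table's row key space is cut into regions. Each region owns a contiguous, half-open key range, and a split cuts one range in two at a split key. The Java sketch below is a hypothetical model of that mapping, the kind of lookup a client-side region location cache performs, and not HBase's actual code. It assumes plain ASCII keys, so String order matches byte order.

```java
import java.util.TreeMap;

// Hypothetical model: each region owns [its start key, the next region's start key).
// The first region starts at "", which sorts before every row key.
final class RegionMap {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    RegionMap(String firstRegion) {
        regionsByStartKey.put("", firstRegion);
    }

    // The region holding rowKey is the one with the greatest start key <= rowKey.
    String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    // Split the region containing splitKey: rows >= splitKey now belong to newRegion.
    void split(String splitKey, String newRegion) {
        regionsByStartKey.put(splitKey, newRegion);
    }

    public static void main(String[] args) {
        RegionMap t = new RegionMap("region-A");
        System.out.println(t.locate("row0001") + " " + t.locate("row9999")); // region-A region-A
        t.split("row5000", "region-B");
        System.out.println(t.locate("row4999") + " " + t.locate("row5000")); // region-A region-B
    }
}
```

Sorted row keys are what make this lookup a simple floor search.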

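　Role of ZooKeeper:
　　Through election, it ensures there is only one active HMaster in the cluster at any time; the HMaster and the RegionServers register with ZooKeeper when they start.
　　Stores the addressing entry point for all regions.
　　Monitors RegionServers going online and offline in real time and notifies the HMaster.
　　Stores HBase schema and table metadata.
　　With ZooKeeper in place, the HMaster is no longer a single point of failure.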

3. Basic HBase shell commands

　　1. $>hbase shell　　// enter the HBase shell

　　2. $hbase>help　　// show help

　　3. $hbase>help 'create_namespace'　　// show help for the create_namespace command

　　4. $hbase>list_namespace　　// list namespaces

　　5. $hbase>list_namespace_tables 'default'　　// list the tables in the default namespace

　　6. $hbase>list_namespace_tables 'hbase'　　// list the tables in the hbase namespace

　　7. $hbase>create_namespace 'ns3'　　// create a namespace

　　8. $hbase>put 'ns4:t1','row1','f1:id',100　　// insert the id column into row1

　　9. $hbase>put 'ns4:t1','row1','f1:name','tom'　　// insert the name column into row1

　　10. $hbase>put 'ns4:t1','row1','f1:age',23　　// insert the age column into row1

　　11. $hbase>get 'ns4:t1','row1'　　// get one row

　　12. $hbase>scan 'ns4:t1'　　// scan the table

　　13. $hbase>disable 'ns1:t1'　　// disable the table

　　14. $hbase>drop 'ns1:t1'　　// drop the table; a table must be disabled before it can be dropped

　　15. $hbase>flush 'ns4:t1'　　// flush in-memory data to disk

　　16. $hbase>scan 'hbase:meta'　　// view the metadata table

　　17. $hbase>split 'ns4:t1'　　// split the table

　　18. $hbase>split 'regionName','splitKey'　　// split a region at a split key

　　19. $hbase>split 'ns4:t1,row5184,1531798673543.8990374fdac33a552623b6886bf57b7e.','row8888'　　// split the region at the given row key

　　20. $hbase>move '0293961e341eabe080e63ca2fd0d09dd','s203,16020,1531457868571'　　// move a region to another RegionServer (server given as host,port,startcode)

　　21. $hbase>merge_region '0293961e341eabe080e63ca2fd0d09dd','01cbc34b4048a2586ca171cf31046a30'　　// merge two regions

　　22. $hbase>desc 'ns2:t1'　　// describe a table in the given namespace

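4. HBase code development

　　1. Accessing HBase through the Java API, with analysis of some of the source code

　　　　1) Create the HBase module and add the dependencies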

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.it18zhang</groupId>
    <artifactId>HBaseDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.6</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.6</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>
</project>

　　　2. Testing CRUD through the API

　　　　1. Testing put()

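public void put() throws IOException {
        // Create the Configuration object (loads hbase-default.xml and hbase-site.xml)
        Configuration conf = HBaseConfiguration.create();
        // Table name in namespace:table form
        TableName tname = TableName.valueOf("ns1:t1");
        // Create the connection through the connection factory and get the Table handle;
        // try-with-resources closes the table, then the connection
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname)) {
            // Use the Bytes utility class to convert the row key string to a byte array
            byte[] rowid = Bytes.toBytes("row3");
            // Create the Put object for that row
            Put put = new Put(rowid);
            // Column family f1, qualifier id, value 101 (stored as a 4-byte int)
            byte[] f1 = Bytes.toBytes("f1");
            byte[] id = Bytes.toBytes("id");
            byte[] value = Bytes.toBytes(101);
            put.addColumn(f1, id, value);
            // Send the Put to the table, performing the insert
            table.put(put);
        }
    }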


Code walkthrough:
First comes the configuration: Configuration conf = HBaseConfiguration.create(). Here is the relevant part of the HBaseConfiguration source:

package org.apache.hadoop.hbase;

/**
 * Adds HBase configuration files to a Configuration
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class HBaseConfiguration extends Configuration {
  private static final Log LOG = LogFactory.getLog(HBaseConfiguration.class);

  /**
   * Instantiating HBaseConfiguration() is deprecated. Please use
   * HBaseConfiguration#create() to construct a plain Configuration
   */
  @Deprecated
  public HBaseConfiguration() {
    //TODO:replace with private constructor, HBaseConfiguration should not extend Configuration
    super();
    addHbaseResources(this);
    LOG.warn("instantiating HBaseConfiguration() is deprecated. Please use"
        + " HBaseConfiguration#create() to construct a plain Configuration");
  }

As you can see, HBaseConfiguration lives in the org.apache.hadoop.hbase package and extends Hadoop's Configuration class, which implements Iterable<Map.Entry<String,String>> and Writable.

The HBaseConfiguration() constructor is deprecated; use the static create() method instead.

public static Configuration create() {

  Configuration conf = new Configuration();
  // In case HBaseConfiguration is loaded from a different classloader than
  // Configuration, conf needs to be set with appropriate class loader to resolve
  // HBase resources.
  conf.setClassLoader(HBaseConfiguration.class.getClassLoader());
  return addHbaseResources(conf);
}

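As the comment explains, HBaseConfiguration may be loaded by a different class loader than Configuration, so conf is given HBaseConfiguration's class loader so that it can find the HBase resources. The method then returns addHbaseResources(conf):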

public static Configuration addHbaseResources(Configuration conf) {
  conf.addResource("hbase-default.xml");
  conf.addResource("hbase-site.xml");

  checkDefaultsVersion(conf);
  HeapMemorySizeUtil.checkForClusterFreeMemoryLimit(conf);
  return conf;
}
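This method adds two configuration files from the classpath: hbase-default.xml, which ships inside the library and needs no setup, and hbase-site.xml. That is why we have to copy the hbase-site.xml configured on our cluster onto the project's classpath (for a Maven project, for example, src/main/resources).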
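Adding hbase-site.xml after hbase-default.xml matters because Hadoop's Configuration lets a later resource override an earlier one, unless a property is marked final. The JDK-only sketch below models just that precedence rule. LayeredConfig is a hypothetical class, not Hadoop's code; it ignores final properties, variable expansion, and XML parsing, and the host names s201-s203 are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model: resources are applied in the order they are added,
// so a value in a later resource overrides the same key from an earlier one.
final class LayeredConfig {
    private final Map<String, String> props = new HashMap<>();

    LayeredConfig addResource(Map<String, String> resource) {
        props.putAll(resource);   // later resources win
        return this;
    }

    String get(String key) {
        return props.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();   // stands in for hbase-default.xml
        defaults.put("hbase.zookeeper.quorum", "localhost");
        defaults.put("hbase.zookeeper.property.clientPort", "2181");

        Map<String, String> site = new HashMap<>();       // stands in for hbase-site.xml
        site.put("hbase.zookeeper.quorum", "s201,s202,s203");

        LayeredConfig conf = new LayeredConfig().addResource(defaults).addResource(site);
        System.out.println(conf.get("hbase.zookeeper.quorum"));              // s201,s202,s203
        System.out.println(conf.get("hbase.zookeeper.property.clientPort")); // 2181
    }
}
```

If hbase-site.xml is missing from the classpath, the client silently keeps the defaults, for example a ZooKeeper quorum of localhost.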

Next, let's look at:

       Connection conn =  ConnectionFactory.createConnection(conf);

/**
 * A non-instantiable class that manages creation of {@link Connection}s.
 * Managing the lifecycle of the {@link Connection}s to the cluster is the responsibility of
 * the caller.
 * From a {@link Connection}, {@link Table} implementations are retrieved
 * with {@link Connection#getTable(TableName)}. Example:
 * <pre>
 * Connection connection = ConnectionFactory.createConnection(config);
 * Table table = connection.getTable(TableName.valueOf("table1"));
 * try {
 *   // Use the table as needed, for a single operation and a single thread
 * } finally {
 *   table.close();
 *   connection.close();
 * }
 * </pre>
 */
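ConnectionFactory cannot be instantiated. Connections are created through the static call ConnectionFactory.createConnection(), which can throw an IOException, and the caller is responsible for closing the Connection it gets back.
connection.getTable(TableName.valueOf("table1")) does not create a table. It returns a Table object for accessing an existing table, which, as the Javadoc says, is meant for a single operation and a single thread.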

Next, let's look at:

        byte[] rowid = Bytes.toBytes("row3");
        // Create the Put object
        Put put = new Put(rowid);

/**
 * Create a Put operation for the specified row.
 * @param row row key
 */
public Put(byte [] row) {
  this(row, HConstants.LATEST_TIMESTAMP);
}
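The Put constructor takes the row key as a byte array, so the content must first be converted to bytes. HBase's Bytes utility class handles these conversions. Next comes addColumn():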

public Put addColumn(byte[] family, byte[] qualifier, long ts, byte[] value) {
    if(ts < 0L) {
        throw new IllegalArgumentException("Timestamp cannot be negative. ts=" + ts);
    } else {
        List list = this.getCellList(family);
        KeyValue kv = this.createPutKeyValue(family, qualifier, ts, value);
        list.add(kv);
        this.familyMap.put(CellUtil.cloneFamily(kv), list);
        return this;
    }
}
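This adds the column family, the column qualifier, and the value (wrapped in a KeyValue) to the Put's family map. The three-argument addColumn(family, qualifier, value) used in put() delegates to this overload with the Put's own timestamp. That timestamp defaults to HConstants.LATEST_TIMESTAMP, as in the constructor above, which means the server assigns the current time.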
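Because every cell stores raw bytes, the writer and the reader must agree on the encoding. Bytes.toBytes(int) produces 4 bytes in big-endian order, and Bytes.toBytes(String) produces UTF-8 bytes. The JDK-only sketch below reproduces those two encodings; BytesSketch is a hypothetical name, not HBase's Bytes class. It shows why the value 101 written in put() must be read back with Bytes.toInt, and why the 3-byte string "101" cannot be decoded as an int. This sketch rejects any length other than 4.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// JDK-only sketch of the encodings behind Bytes.toBytes(int) and Bytes.toBytes(String).
final class BytesSketch {
    // int -> 4 bytes, most significant byte first (ByteBuffer is big-endian by default).
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // 4 bytes -> int; any other length is rejected.
    static int bytesToInt(byte[] b) {
        if (b.length != Integer.BYTES) {
            throw new IllegalArgumentException("expected 4 bytes but got " + b.length);
        }
        return ByteBuffer.wrap(b).getInt();
    }

    // String <-> UTF-8 bytes.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String bytesToString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intToBytes(101)));      // [0, 0, 0, 101]
        System.out.println(bytesToInt(intToBytes(101)));           // 101
        System.out.println(Arrays.toString(stringToBytes("101"))); // [49, 48, 49]
    }
}
```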

    @Test
    // Query
    public void get() throws IOException {
        // Create the conf object
        Configuration conf = HBaseConfiguration.create();
        // Create the connection through the connection factory (passing conf) and get the table
        TableName tname = TableName.valueOf("ns1:t1");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname)) {
            // Use the Bytes utility to build the row key byte array
            byte[] rowid = Bytes.toBytes("row3");
            Get get = new Get(rowid);
            Result r = table.get(get);
            byte[] idvalue = r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("id"));
            // getValue returns null if the cell does not exist
            if (idvalue != null) {
                System.out.println(Bytes.toInt(idvalue)); // convert the byte array to an int and print it
            }
        }
    }

First, let's look at:

　　Get get = new Get(rowid);

/**
 * Create a Get operation for the specified row.
 * <p>
 * If no further operations are done, this will get the latest version of
 * all columns in all families of the specified row.
 * @param row row key
 */
public Get(byte [] row) {
  Mutation.checkRow(row);
  this.row = row;
}
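This creates a Get operation for the specified row. If no further restrictions are added (for example with addFamily() or addColumn()), it returns the latest version of every column in every column family of that row. So here we only need to specify the row key row3.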

Next:

        Result r = table.get(get);

/**
 * Extracts certain cells from a given row.
 * @param get The object that specifies what data to fetch and from which row.
 * @return The data coming from the specified row, if it exists.  If the row
 * specified doesn't exist, the {@link Result} instance returned won't
 * contain any {@link org.apache.hadoop.hbase.KeyValue}, as indicated by {@link Result#isEmpty()}.
 * @throws IOException if a remote or network exception occurs.
 * @since 0.20.0
 */
Result get(Get get) throws IOException;
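This method extracts cells from a given row. The get argument specifies which data to fetch and from which row. If the row exists, its data is returned; otherwise the returned Result contains no cells, which Result#isEmpty() reports.

Finally, the retrieved byte array is converted to an int and printed.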

　　3) Bulk-inserting data into HBase through the Java API (the loop below writes 10,000 rows; raise the bound for a million)

@Test
    public void testBigInsert() throws Exception {
        // Zero-pad row keys to 7 digits so lexicographic order matches numeric order
        DecimalFormat format = new DecimalFormat();
        format.applyPattern("0000000");
        long start = System.currentTimeMillis();
        // Create the configuration object
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns4:t1");
        // Create the connection; BufferedMutator buffers Puts on the client and sends them in
        // batches (it replaces the deprecated HTable.setAutoFlush(false)/flushCommits())
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(tname)) {
            for (int i = 0; i < 10000; i++) {
                // Set the row key, formatted with zero padding
                Put put = new Put(Bytes.toBytes("row" + format.format(i)));
                // Skip the write-ahead log for speed; edits still in memory are lost if a RegionServer crashes
                put.setDurability(Durability.SKIP_WAL);
                // Add the id, name and age columns to column family f1
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("id"), Bytes.toBytes(i));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("tom" + i));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(i % 100));
                mutator.mutate(put);
                if (i % 2000 == 0) {
                    mutator.flush(); // push the buffer to the servers every 2000 rows
                }
            }
            mutator.flush(); // send whatever is still buffered
        }
        System.out.println(System.currentTimeMillis() - start);
    }
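The speed-up above comes from client-side write buffering: Puts are queued locally and sent in batches, so 10,000 rows need far fewer round trips than 10,000 individual calls. The generic WriteBuffer below is a hypothetical, count-based sketch of that idea, with a Consumer standing in for one batched RPC. The real client sizes its buffer in bytes (hbase.client.write.buffer) and also groups mutations by RegionServer, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer: queue items, send them in batches.
final class WriteBuffer<T> {
    private final int maxBuffered;
    private final Consumer<List<T>> sender;   // stands in for one batched RPC
    private final List<T> pending = new ArrayList<>();

    WriteBuffer(int maxBuffered, Consumer<List<T>> sender) {
        this.maxBuffered = maxBuffered;
        this.sender = sender;
    }

    // Queue one mutation; send automatically once the buffer is full.
    void add(T mutation) {
        pending.add(mutation);
        if (pending.size() >= maxBuffered) {
            flush();
        }
    }

    // Send everything queued so far; an empty buffer sends nothing.
    void flush() {
        if (pending.isEmpty()) return;
        sender.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        WriteBuffer<Integer> buffer = new WriteBuffer<>(2000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 10000; i++) {
            buffer.add(i);
        }
        buffer.flush();   // final flush; nothing is left here since 10000 is a multiple of 2000
        System.out.println(batchSizes); // [2000, 2000, 2000, 2000, 2000]
    }
}
```

The final flush() mirrors the last mutator.flush() in testBigInsert: without it, a partly filled buffer would never be sent.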

　　4) Using the Java API to create, disable, and delete namespaces, create and delete tables, scan, and more

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

import java.io.IOException;
import java.text.DecimalFormat;
import java.util.Iterator;

/**
 * Created by Administrator on 2018/7/17 0017.
 */
public class HBaseTest1 {
    // Create an HBase namespace
    @Test
    public void testCreateNameSpace() throws Exception {
        // Configure the conf object
        Configuration conf = HBaseConfiguration.create();
        // Create the connection and the Admin used for DDL; both are closed automatically
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            NamespaceDescriptor nd = NamespaceDescriptor.create("ns3").build();
            admin.createNamespace(nd);
        }
    }

    // Delete a namespace (it must not contain any tables)
    @Test
    public void testDropNameSpace() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            admin.deleteNamespace("ns3");
        }
    }

    // List all namespaces
    @Test
    public void testReadNameSpace() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            NamespaceDescriptor[] nsd = admin.listNamespaceDescriptors();
            for (NamespaceDescriptor n : nsd) {
                System.out.println(n.getName());
            }
        }
    }

    // Create a table in a namespace
    @Test
    public void testCreateTable() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName tname = TableName.valueOf("ns3:t1");
            // Table descriptor with a single column family, f1
            HTableDescriptor htd = new HTableDescriptor(tname);
            HColumnDescriptor hcd = new HColumnDescriptor("f1");
            htd.addFamily(hcd);
            admin.createTable(htd);
        }
    }

    // Insert a row into the table
    @Test
    public void testPut() throws Exception {
        // Configure the Configuration
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns3:t1");
        // Open the Connection and get the Table handle
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname)) {
            // One Put for row1 with three columns in family f1
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("id"), Bytes.toBytes(1));
            put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("tom"));
            put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(13));
            table.put(put);
        }
    }
    // Read a value from the table
    @Test
    public void testRead() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns3:t1");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname)) {
            // Bytes.toBytes encodes as UTF-8, unlike String.getBytes(), which uses the platform charset
            Get get = new Get(Bytes.toBytes("row1"));
            Result r = table.get(get);
            byte[] idarr = r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("id"));
            if (idarr != null) {   // null when the cell does not exist
                System.out.println(Bytes.toInt(idarr));
            }
        }
    }
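    // Disable the table
    @Test
    public void testDisable() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // disableTable takes a TableName; disableTables(String) treats its argument as a regex
            admin.disableTable(TableName.valueOf("ns3:t1"));
        }
    }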
    // Delete the table
    @Test
    public void testDroptable() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns3:t1");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // A table must be disabled before it can be deleted; skip if it already is
            if (admin.isTableEnabled(tname)) {
                admin.disableTable(tname);
            }
            admin.deleteTable(tname);
        }
    }
    // Bulk insert (this loop writes rows row0001 to row9999)
    @Test
    public void testBigDataInsert() throws Exception {
        // Zero-pad row keys to 4 digits so lexicographic order matches numeric order
        DecimalFormat format = new DecimalFormat();
        format.applyPattern("0000");
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns3:t1");
        // BufferedMutator buffers Puts client-side and sends them in batches
        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(tname)) {
            for (int i = 1; i < 10000; i++) {
                // Row key for this Put
                Put put = new Put(Bytes.toBytes("row" + format.format(i)));
                // Skip the write-ahead log (faster, but edits still in memory are lost on a crash)
                put.setDurability(Durability.SKIP_WAL);
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("id"), Bytes.toBytes(i));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("tom" + i));
                put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("age"), Bytes.toBytes(i % 100));
                mutator.mutate(put);
                if (i % 2000 == 0) {
                    mutator.flush();   // push the buffer every 2000 rows
                }
            }
            mutator.flush();   // send whatever is still buffered
        }
    }
    // Scan a row range: the start row is included and the stop row is excluded
    @Test
    public void testScan() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns3:t1");
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row5000"));
        scan.setStopRow(Bytes.toBytes("row8888"));
        // ResultScanner is Iterable and must be closed to release server-side resources
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname);
             ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
                System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name"))));
            }
        }
    }
}
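The comment on testScan says the range includes the start row and excludes the stop row. On sorted row keys that is exactly a half-open subMap, [startRow, stopRow). The JDK-only sketch below shows the rule on a TreeMap. RangeScanDemo is a hypothetical name, and the real scan runs server-side region by region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a Scan's row range on sorted row keys: [startRow, stopRow).
final class RangeScanDemo {
    static List<String> scan(NavigableMap<String, String> rows, String startRow, String stopRow) {
        // inclusive start, exclusive stop
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        for (String k : new String[] {"row4999", "row5000", "row8887", "row8888", "row9000"}) {
            rows.put(k, "tom");
        }
        System.out.println(scan(rows, "row5000", "row8888")); // [row5000, row8887]
    }
}
```

To include row8888 itself, the stop row has to be the next possible key. For these zero-padded keys, row8889 works.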

    // Query a row and return multiple versions of a column.
    // Needs java.util.List in addition to the imports above. The column family must keep
    // more than one version (e.g. VERSIONS => 3); by default only the latest is kept.
    @Test
    public void getWithVersions() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        TableName tname = TableName.valueOf("ns2:t1");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tname)) {
            Get get = new Get(Bytes.toBytes("00001"));
            // Retrieve all versions
            get.setMaxVersions();
            Result r = table.get(get);
            List<Cell> list = r.getColumnCells(Bytes.toBytes("f1"), Bytes.toBytes("name"));
            for (Cell c : list) {
                String f = Bytes.toString(CellUtil.cloneFamily(c));
                String col = Bytes.toString(CellUtil.cloneQualifier(c));
                long time = c.getTimestamp(); // get the timestamp
                String val = Bytes.toString(CellUtil.cloneValue(c));
                System.out.println(f + "/" + col + "/" + time + " = " + val);
            }
        }
    }
