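Preface: The previous article, an overview of HBase filters, briefly introduced how HBase filters are composed and how their class hierarchy is organized. This article supplements it with the comparators those filters rely on, an essential entry-level skill for anyone learning HBase Filter. The source code quoted here comes from HBase 1.1.2.2.6.5.0-292 (the HDP distribution).
Every HBase comparator class extends the parent class ByteArrayComparable, which in turn implements the Comparable interface. Comparators with different purposes differ only in how they override the parent's compareTo() method.
The seven comparators that HBase Filter ships by default are introduced one by one below.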
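Introduction (BinaryComparator): a binary comparator that compares the given byte array against the stored one in lexicographic order.
Here is a small example first: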
```java
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class BinaryComparatorDemo {
    public static void main(String[] args) {
        // The comparator stores "bbb"; each call compares the stored value against the argument.
        BinaryComparator bc = new BinaryComparator(Bytes.toBytes("bbb"));

        int code1 = bc.compareTo(Bytes.toBytes("bbb"), 0, 3);
        System.out.println(code1); // 0

        int code2 = bc.compareTo(Bytes.toBytes("aaa"), 0, 3);
        System.out.println(code2); // 1

        int code3 = bc.compareTo(Bytes.toBytes("ccc"), 0, 3);
        System.out.println(code3); // -1

        int code4 = bc.compareTo(Bytes.toBytes("bbf"), 0, 3);
        System.out.println(code4); // -4

        int code5 = bc.compareTo(Bytes.toBytes("bbbedf"), 0, 6);
        System.out.println(code5); // -3
    }
}
```
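The comparison rules can be read off these results: bytes are compared pairwise from the start, as unsigned values. At the first position where they differ, the result is the stored byte minus the given byte (the optimized implementation returns -1 or 1 instead when it finds the difference inside a multi-byte word). If every byte in the common range is equal, the result is the stored value's length minus the given value's length.
Here is the source of the underlying compareTo() method that implements these rules:
Implementation 1: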
```java
static enum UnsafeComparer implements Bytes.Comparer<byte[]> {
    INSTANCE;
    // ....
    public int compareTo(byte[] buffer1, int offset1, int length1, byte[] buffer2, int offset2, int length2) {
        if (buffer1 == buffer2 && offset1 == offset2 && length1 == length2) {
            return 0;
        } else {
            int minLength = Math.min(length1, length2);
            int minWords = minLength / 8;
            long offset1Adj = (long)(offset1 + BYTE_ARRAY_BASE_OFFSET);
            long offset2Adj = (long)(offset2 + BYTE_ARRAY_BASE_OFFSET);
            int j = minWords << 3;

            int offset;
            for(offset = 0; offset < j; offset += 8) {
                long lw = theUnsafe.getLong(buffer1, offset1Adj + (long)offset);
                long rw = theUnsafe.getLong(buffer2, offset2Adj + (long)offset);
                long diff = lw ^ rw;
                if (diff != 0L) {
                    return lessThanUnsignedLong(lw, rw) ? -1 : 1;
                }
            }

            offset = j;
            int b;
            int a;
            if (minLength - j >= 4) {
                a = theUnsafe.getInt(buffer1, offset1Adj + (long)j);
                b = theUnsafe.getInt(buffer2, offset2Adj + (long)j);
                if (a != b) {
                    return lessThanUnsignedInt(a, b) ? -1 : 1;
                }
                offset = j + 4;
            }

            if (minLength - offset >= 2) {
                short sl = theUnsafe.getShort(buffer1, offset1Adj + (long)offset);
                short sr = theUnsafe.getShort(buffer2, offset2Adj + (long)offset);
                if (sl != sr) {
                    return lessThanUnsignedShort(sl, sr) ? -1 : 1;
                }
                offset += 2;
            }

            if (minLength - offset == 1) {
                a = buffer1[offset1 + offset] & 255;
                b = buffer2[offset2 + offset] & 255;
                if (a != b) {
                    return a - b;
                }
            }

            return length1 - length2;
        }
    }
}
```
Implementation 2:
```java
static enum PureJavaComparer implements Bytes.Comparer<byte[]> {
    INSTANCE;

    private PureJavaComparer() {
    }

    public int compareTo(byte[] buffer1, int offset1, int length1, byte[] buffer2, int offset2, int length2) {
        if (buffer1 == buffer2 && offset1 == offset2 && length1 == length2) {
            return 0;
        } else {
            int end1 = offset1 + length1;
            int end2 = offset2 + length2;
            int i = offset1;

            for(int j = offset2; i < end1 && j < end2; ++j) {
                int a = buffer1[i] & 255;
                int b = buffer2[j] & 255;
                if (a != b) {
                    return a - b;
                }
                ++i;
            }

            return length1 - length2;
        }
    }
}
```
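Implementation 1 is an optimized version of implementation 2, and both live in the Bytes class. HBase prefers implementation 1: when the holder class is initialized it tries to load UnsafeComparer reflectively, and only if that throws does it fall back to implementation 2. The choice is made once and cached in BEST_COMPARER, as shown here: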
```java
public static int compareTo(byte[] buffer1, int offset1, int length1, byte[] buffer2, int offset2, int length2) {
    return Bytes.LexicographicalComparerHolder.BEST_COMPARER.compareTo(buffer1, offset1, length1, buffer2, offset2, length2);
}

// ... ...

static final String UNSAFE_COMPARER_NAME = Bytes.LexicographicalComparerHolder.class.getName() + "$UnsafeComparer";
static final Bytes.Comparer<byte[]> BEST_COMPARER = getBestComparer();

static Bytes.Comparer<byte[]> getBestComparer() {
    try {
        Class<?> theClass = Class.forName(UNSAFE_COMPARER_NAME);
        Bytes.Comparer<byte[]> comparer = (Bytes.Comparer)theClass.getEnumConstants()[0];
        return comparer;
    } catch (Throwable var2) {
        return Bytes.lexicographicalComparerJavaImpl();
    }
}
```
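To make the rules concrete without an HBase dependency, here is a minimal standalone sketch that mirrors the pure-Java implementation above. The class name `LexCompareSketch` and its helper `ascii` are made up for illustration and are not part of HBase. It reproduces the outputs of the BinaryComparator demo and shows why each byte is masked to an unsigned value before comparing.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of HBase.
final class LexCompareSketch {

    /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
    static int compare(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xFF;   // mask so that 0x80..0xFF sort after 0x00..0x7F
            int b = right[i] & 0xFF;
            if (a != b) {
                return a - b;         // first differing byte decides
            }
        }
        return left.length - right.length; // common range equal: shorter array sorts first
    }

    /** Helper for the examples: encodes an ASCII string as bytes. */
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] stored = ascii("bbb");
        System.out.println(compare(stored, ascii("bbb")));    // 0
        System.out.println(compare(stored, ascii("aaa")));    // 1
        System.out.println(compare(stored, ascii("ccc")));    // -1
        System.out.println(compare(stored, ascii("bbf")));    // -4
        System.out.println(compare(stored, ascii("bbbedf"))); // -3

        // Without the 0xFF mask, (byte) 0x80 would be read as -128 and sort before 0x01.
        System.out.println(compare(new byte[] {(byte) 0x80}, new byte[] {0x01})); // 127
    }
}
```

In a filter, only the sign of the result usually matters, which is why the optimized path can return plain -1 or 1 for differences found inside a word.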
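Introduction (BinaryPrefixComparator): a binary comparator that only checks whether the given value starts with the stored byte array.
Here is a small example first: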
```java
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class BinaryPrefixComparatorDemo {
    public static void main(String[] args) {
        // The stored prefix is "b"; only the first byte of each argument is compared.
        BinaryPrefixComparator bc = new BinaryPrefixComparator(Bytes.toBytes("b"));

        int code1 = bc.compareTo(Bytes.toBytes("bbb"), 0, 3);
        System.out.println(code1); // 0

        int code2 = bc.compareTo(Bytes.toBytes("aaa"), 0, 3);
        System.out.println(code2); // 1

        int code3 = bc.compareTo(Bytes.toBytes("ccc"), 0, 3);
        System.out.println(code3); // -1

        int code4 = bc.compareTo(Bytes.toBytes("bbf"), 0, 3);
        System.out.println(code4); // 0

        int code5 = bc.compareTo(Bytes.toBytes("bbbedf"), 0, 6);
        System.out.println(code5); // 0

        int code6 = bc.compareTo(Bytes.toBytes("ebbedf"), 0, 6);
        System.out.println(code6); // -3
    }
}
```
This comparator is just BinaryComparator with one small change, which the following code makes obvious:

```java
public int compareTo(byte[] value, int offset, int length) {
    return Bytes.compareTo(this.value, 0, this.value.length, value, offset,
        this.value.length <= length ? this.value.length : length);
}
```

Compare it with the same method in BinaryComparator:

```java
public int compareTo(byte[] value, int offset, int length) {
    return Bytes.compareTo(this.value, 0, this.value.length, value, offset, length);
}
```
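The only difference is the last argument: the length passed down is min(this.value.length, length), the smaller of the stored prefix length and the length of the given range. The byte-by-byte comparison that follows therefore covers at most the first this.value.length bytes of the given value.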
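The effect of clamping the length can be sketched in plain Java. The class `PrefixCompareSketch` below is hypothetical and only imitates the call shown above. It also covers a case the demo does not: when the given value is shorter than the stored prefix, the result is positive rather than 0, so such a value does not count as a match.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of HBase.
final class PrefixCompareSketch {

    /** Compares the stored prefix against at most prefix.length bytes of value[offset, offset + length). */
    static int comparePrefix(byte[] prefix, byte[] value, int offset, int length) {
        int compared = Math.min(prefix.length, length); // the clamped last argument
        for (int i = 0; i < compared; i++) {
            int a = prefix[i] & 0xFF;
            int b = value[offset + i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        // Equivalent to length1 - length2 in Bytes.compareTo after clamping.
        return prefix.length - compared;
    }

    /** Helper for the examples: encodes an ASCII string as bytes. */
    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] prefix = ascii("b");
        System.out.println(comparePrefix(prefix, ascii("bbbedf"), 0, 6)); // 0
        System.out.println(comparePrefix(prefix, ascii("ebbedf"), 0, 6)); // -3

        // Value shorter than the prefix: the common bytes match, but the result is not 0.
        System.out.println(comparePrefix(ascii("bbb"), ascii("bb"), 0, 2)); // 1
    }
}
```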
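Introduction (BitComparator): a bitwise comparator that combines the stored bytes with the given bytes using one of the operations defined by BitwiseOp: AND, OR, or XOR. The result is always either 1 or 0, so only EQUAL and NOT_EQUAL are supported.
Here is a small example first: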
```java
import org.apache.hadoop.hbase.filter.BitComparator;

public class BitComparatorDemo {
    public static void main(String[] args) {
        // Same length, OR: walk the bytes from the last index to the first;
        // if every byte ORs to 0 the result is 1, otherwise 0.
        BitComparator bc1 = new BitComparator(new byte[]{0,0,0,0}, BitComparator.BitwiseOp.OR);
        int i = bc1.compareTo(new byte[]{0,0,0,0}, 0, 4);
        System.out.println(i); // 1

        // Same length, AND: if every byte ANDs to 0 the result is 1, otherwise 0.
        BitComparator bc2 = new BitComparator(new byte[]{1,0,1,0}, BitComparator.BitwiseOp.AND);
        int j = bc2.compareTo(new byte[]{0,1,0,1}, 0, 4);
        System.out.println(j); // 1

        // Same length, XOR: if every byte XORs to 0 the result is 1, otherwise 0.
        BitComparator bc3 = new BitComparator(new byte[]{1,0,1,0}, BitComparator.BitwiseOp.XOR);
        int x = bc3.compareTo(new byte[]{1,0,1,0}, 0, 4);
        System.out.println(x); // 1

        // Different lengths: the result is 1 without any bitwise comparison.
        BitComparator bc4 = new BitComparator(new byte[]{1,0,1,0}, BitComparator.BitwiseOp.XOR);
        int y = bc4.compareTo(new byte[]{1,0,1}, 0, 3);
        System.out.println(y); // 1
    }
}
```
The rules described in the comments above correspond to the following code:
```java
public int compareTo(byte[] value, int offset, int length) {
    if (length != this.value.length) {
        return 1;
    } else {
        int b = 0;

        for(int i = length - 1; i >= 0 && b == 0; --i) {
            switch(this.bitOperator) {
            case AND:
                b = this.value[i] & value[i + offset] & 255;
                break;
            case OR:
                b = (this.value[i] | value[i + offset]) & 255;
                break;
            case XOR:
                b = (this.value[i] ^ value[i + offset]) & 255;
            }
        }

        return b == 0 ? 1 : 0;
    }
}
```
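The core idea: starting from the last byte, apply the operation one byte at a time, and leave the loop as soon as b != 0. The method returns 0 if some byte produced a non-zero result and 1 if none did.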
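Because 0 is the value a filter treats as "equal", the demo cases above (all returning 1) are all non-matches. The following standalone sketch imitates the method above and adds a matching case in the style of a flag test. `BitCompareSketch` and its `Op` enum are hypothetical names used for illustration only.

```java
// Hypothetical illustration class; not part of HBase.
final class BitCompareSketch {

    enum Op { AND, OR, XOR }

    /** Returns 0 if any byte pair yields a non-zero result under op, otherwise 1. */
    static int compare(byte[] stored, Op op, byte[] value, int offset, int length) {
        if (length != stored.length) {
            return 1; // different lengths never match
        }
        int b = 0;
        // Walk from the last byte to the first, stopping at the first non-zero result.
        for (int i = length - 1; i >= 0 && b == 0; i--) {
            switch (op) {
                case AND: b = (stored[i] & value[i + offset]) & 0xFF; break;
                case OR:  b = (stored[i] | value[i + offset]) & 0xFF; break;
                case XOR: b = (stored[i] ^ value[i + offset]) & 0xFF; break;
            }
        }
        return b == 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        byte[] mask = {0x04}; // test whether bit 2 is set

        System.out.println(compare(mask, Op.AND, new byte[] {0x06}, 0, 1)); // 0: bit is set
        System.out.println(compare(mask, Op.AND, new byte[] {0x03}, 0, 1)); // 1: bit is clear

        // XOR yields 0 (a match) only when the arrays differ somewhere.
        System.out.println(compare(new byte[] {1, 0}, Op.XOR, new byte[] {1, 0}, 0, 2)); // 1
        System.out.println(compare(new byte[] {1, 0}, Op.XOR, new byte[] {1, 1}, 0, 2)); // 0
    }
}
```

Used with EQUAL, an AND comparator therefore selects values that share at least one set bit with the stored mask.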
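Introduction (LongComparator): a comparator dedicated to long values; it returns 0, -1, or 1. The previous overview did not mention it, so it is covered here.
Here is a small example first: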
```java
import org.apache.hadoop.hbase.filter.LongComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class LongComparatorDemo {
    public static void main(String[] args) {
        // The stored value is 1000; each argument is an 8-byte encoded long.
        LongComparator longComparator = new LongComparator(1000L);

        int i = longComparator.compareTo(Bytes.toBytes(1000L), 0, 8);
        System.out.println(i); // 0

        int i2 = longComparator.compareTo(Bytes.toBytes(1001L), 0, 8);
        System.out.println(i2); // -1

        int i3 = longComparator.compareTo(Bytes.toBytes(998L), 0, 8);
        System.out.println(i3); // 1
    }
}
```
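The implementation is very simple: it decodes the bytes into a Long and compares the stored value against it:

```java
public int compareTo(byte[] value, int offset, int length) {
    Long that = Bytes.toLong(value, offset, length);
    return this.longValue.compareTo(that);
}
```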
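One reason a dedicated comparator is useful: comparing encoded longs as raw bytes gives the wrong order once negative numbers are involved. The sketch below assumes the value was written as 8 big-endian bytes (which is how I understand `Bytes.toBytes(long)` to encode it) and uses `java.nio.ByteBuffer` in place of the HBase helpers. `LongCompareSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of HBase.
final class LongCompareSketch {

    /** Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    /** Decodes 8 bytes starting at offset and compares the stored long against them. */
    static int compareAsLong(long stored, byte[] value, int offset) {
        long that = ByteBuffer.wrap(value, offset, Long.BYTES).getLong();
        return Long.compare(stored, that);
    }

    /** Unsigned lexicographic comparison of equal-length arrays, for contrast. */
    static int compareAsBytes(byte[] left, byte[] right) {
        for (int i = 0; i < left.length; i++) {
            int a = left[i] & 0xFF;
            int b = right[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(compareAsLong(1000L, toBytes(1000L), 0)); // 0
        System.out.println(compareAsLong(1000L, toBytes(1001L), 0)); // -1
        System.out.println(compareAsLong(1000L, toBytes(998L), 0));  // 1

        // Numerically 1 > -1, but the bytes of -1 are all 0xFF and sort last.
        System.out.println(compareAsLong(1L, toBytes(-1L), 0) > 0);          // true
        System.out.println(compareAsBytes(toBytes(1L), toBytes(-1L)) < 0);   // true
    }
}
```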
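Introduction (NullComparator): a null-value comparator that checks whether the current value is null. It returns 0 for null and 1 for non-null, so only EQUAL and NOT_EQUAL are supported.
Here is a small example first: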
```java
import org.apache.hadoop.hbase.filter.NullComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class NullComparatorDemo {
    public static void main(String[] args) {
        NullComparator nc = new NullComparator();

        int i1 = nc.compareTo(Bytes.toBytes("abc"));
        int i2 = nc.compareTo(Bytes.toBytes("")); // an empty array is still non-null
        int i3 = nc.compareTo(null);

        System.out.println(i1); // 1
        System.out.println(i2); // 1
        System.out.println(i3); // 0
    }
}
```
The implementation is a single null check:

```java
public int compareTo(byte[] value) {
    return value != null ? 1 : 0;
}
```
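Introduction (RegexStringComparator): a comparator that matches the value against a regular expression. It returns 0 when the match succeeds and 1 when it fails, so only EQUAL and NOT_EQUAL are supported.
Here is a small example first: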
```java
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegexStringComparatorDemo {
    public static void main(String[] args) {
        RegexStringComparator rsc = new RegexStringComparator("abc");

        // Only the first 3 bytes ("abc") of "abcd" are examined.
        int abc = rsc.compareTo(Bytes.toBytes("abcd"), 0, 3);
        System.out.println(abc); // 0

        int bcd = rsc.compareTo(Bytes.toBytes("bcd"), 0, 3);
        System.out.println(bcd); // 1

        // A pattern for e-mail addresses.
        String check = "^([a-z0-9A-Z]+[-|\\.]?)+[a-z0-9A-Z]@([a-z0-9A-Z]+(-[a-z0-9A-Z]+)?\\.)+[a-zA-Z]{2,}$";
        RegexStringComparator rsc2 = new RegexStringComparator(check);

        int code = rsc2.compareTo(Bytes.toBytes("zpb@163.com"), 0, "zpb@163.com".length());
        System.out.println(code); // 0

        int code2 = rsc2.compareTo(Bytes.toBytes("zpb#163.com"), 0, "zpb#163.com".length());
        System.out.println(code2); // 1
    }
}
```
Its compareTo() method delegates to one of two engine implementations, each with its own regular-expression syntax: JAVA and JONI (the latter geared toward JRuby). The default is RegexStringComparator.EngineType.JAVA:

```java
public int compareTo(byte[] value, int offset, int length) {
    return this.engine.compareTo(value, offset, length);
}

public static enum EngineType {
    JAVA,
    JONI;

    private EngineType() {
    }
}
```
Both implementations are simple and just call the engine's regex matching. Here is the JAVA EngineType implementation:

```java
public int compareTo(byte[] value, int offset, int length) {
    String tmp;
    if (length < value.length / 2) {
        tmp = new String(Arrays.copyOfRange(value, offset, offset + length), this.charset);
    } else {
        tmp = new String(value, offset, length, this.charset);
    }

    return this.pattern.matcher(tmp).find() ? 0 : 1;
}
```
The JONI EngineType implementation:

```java
public int compareTo(byte[] value, int offset, int length) {
    Matcher m = this.pattern.matcher(value);
    return m.search(offset, length, this.pattern.getOptions()) < 0 ? 1 : 0;
}
```
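Both are easy to follow: the JAVA engine decodes the bytes into a String and calls find(), while the JONI engine searches the byte array directly.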
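One detail worth noticing in the JAVA engine is that it calls `find()` rather than `matches()`, so an unanchored pattern succeeds if it occurs anywhere in the value. The standalone sketch below uses only `java.util.regex`. `RegexCompareSketch` is a hypothetical name, and UTF-8 is assumed as the charset for the sake of the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of HBase.
final class RegexCompareSketch {

    /** Returns 0 if the pattern is found anywhere in value[offset, offset + length), otherwise 1. */
    static int compare(Pattern pattern, byte[] value, int offset, int length) {
        String text = new String(value, offset, length, StandardCharsets.UTF_8); // assumed charset
        return pattern.matcher(text).find() ? 0 : 1;
    }

    /** Helper for the examples: encodes a string as UTF-8 bytes. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = utf8("xxabcxx");

        // Unanchored: found in the middle of the value.
        System.out.println(compare(Pattern.compile("abc"), value, 0, value.length));   // 0

        // Anchored: the whole examined range must be exactly "abc".
        System.out.println(compare(Pattern.compile("^abc$"), value, 0, value.length)); // 1

        // The offset and length select the range that the anchors apply to.
        System.out.println(compare(Pattern.compile("^abc$"), value, 2, 3));            // 0
    }
}
```

If a whole-value match is what you want, anchoring the pattern with `^` and `$`, as the e-mail example does, is the safer choice.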
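Introduction (SubstringComparator): checks whether the stored substring occurs in the value, ignoring case. It returns 0 if the value contains the substring and 1 if it does not, so only EQUAL and NOT_EQUAL are supported.
Here is a small example first: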
```java
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class SubstringComparatorDemo {
    public static void main(String[] args) {
        String value = "aslfjllkabcxxljsl";

        SubstringComparator sc = new SubstringComparator("abc");
        int i = sc.compareTo(Bytes.toBytes(value), 0, value.length());
        System.out.println(i); // 0

        SubstringComparator sc2 = new SubstringComparator("abd");
        int i2 = sc2.compareTo(Bytes.toBytes(value), 0, value.length());
        System.out.println(i2); // 1

        // Upper-case substring still matches: the comparison ignores case.
        SubstringComparator sc3 = new SubstringComparator("ABC");
        int i3 = sc3.compareTo(Bytes.toBytes(value), 0, value.length());
        System.out.println(i3); // 0
    }
}
```
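This comparator is also very simple: it lower-cases the decoded value and checks whether it contains the stored substring:

```java
public int compareTo(byte[] value, int offset, int length) {
    return Bytes.toString(value, offset, length).toLowerCase().contains(this.substr) ? 0 : 1;
}
```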
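The method above lower-cases only the value, so for the "ABC" case in the demo to return 0 the stored substring must already be lower-case. The sketch below assumes the constructor lower-cases it; the excerpt does not show the constructor, so treat that as an assumption. `SubstringCompareSketch` is a hypothetical name, and `Locale.ROOT` is used here only to keep the example locale-independent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical illustration class; not part of HBase.
final class SubstringCompareSketch {

    private final String needle;

    /** Assumption: the substring is normalized to lower case once, at construction time. */
    SubstringCompareSketch(String substr) {
        this.needle = substr.toLowerCase(Locale.ROOT);
    }

    /** Returns 0 if the lower-cased value contains the stored substring, otherwise 1. */
    int compareTo(byte[] value, int offset, int length) {
        String text = new String(value, offset, length, StandardCharsets.UTF_8);
        return text.toLowerCase(Locale.ROOT).contains(needle) ? 0 : 1;
    }

    /** Helper for the examples: encodes a string as UTF-8 bytes. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = utf8("aslfjllkabcxxljsl");

        System.out.println(new SubstringCompareSketch("abc").compareTo(value, 0, value.length)); // 0
        System.out.println(new SubstringCompareSketch("abd").compareTo(value, 0, value.length)); // 1
        System.out.println(new SubstringCompareSketch("ABC").compareTo(value, 0, value.length)); // 0

        // Upper-case data matches a lower-case substring as well.
        System.out.println(new SubstringCompareSketch("abc").compareTo(utf8("XXABCXX"), 0, 7));  // 0
    }
}
```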
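That covers all seven comparators. Even if you are not interested in the source code, do read through the small examples to get familiar with each comparator's constructor and output, because you will use them often when working with HBase filters. Beyond these seven, you can of course also define your own comparators.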
Please credit the source when reposting! You are welcome to follow my WeChat official account 【HBase工做筆記】.