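TCompactProtocol is an upgraded, more compact counterpart of TBinaryProtocol. Both are binary wire encodings, but TCompactProtocol additionally borrows the variable-length integer encoding used in MIDI files (see Wikipedia for the details). Two ideas underlie it:
1: ZigZag encoding of signed integers, as the table shows:
Before encoding | After encoding |
0 | 0 |
-1 | 1 |
1 | 2 |
-2 | 3 |
2 | 4 |
-3 | 5 |
In effect, a non-negative n maps to 2n and a negative n maps to 2|n| - 1, so every value with a small absolute value becomes a small non-negative number.
For a 32-bit int: (i << 1) ^ (i >> 31); for a 64-bit long: (l << 1) ^ (l >> 63), where >> is the arithmetic (sign-extending) right shift.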
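The table and the two formulas can be checked with a few lines of plain Java. This is only a sketch: the class name ZigZagSketch and the method names are made up for illustration, and decode32 (the inverse mapping) is not part of the text above.

```java
// Hypothetical demo class; encode32/encode64 are the two formulas from the text.
class ZigZagSketch {
    static int encode32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static long encode64(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Inverse mapping (an addition for illustration): drop the sign bit
    // from position 0 and re-apply it to the whole value.
    static int decode32(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        // Reproduces the rows of the table: 0, -1, 1, -2, 2, -3
        for (int n : new int[] {0, -1, 1, -2, 2, -3}) {
            System.out.println(n + " -> " + encode32(n));
        }
    }
}
```

Small magnitudes stay small after encoding, which is what makes the value a good input for the variable-length encoding described next.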
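2: VLQ (variable-length quantity) encoding:
The most significant bit (MSB) of each byte is a continuation flag and carries no payload; the value itself is stored in the remaining seven bits. When the MSB is 1, the next byte also belongs to the value (contributing its low seven bits); when the MSB is 0, the current byte is the last one. Compared with a fixed 4 bytes for an int and 8 bytes for a long, small values take far less space. The trade-off is that large values take more: an int that needs more than 28 bits takes 5 bytes, and a long that needs more than 63 bits takes 10 bytes.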
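To make the byte counts concrete, here is a minimal standalone sketch of the encoding. The class name VlqSketch is hypothetical, and the group order (least significant seven bits first) is chosen to match the Thrift source quoted later in this post.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of Thrift.
class VlqSketch {
    // Treats n as an unsigned 64-bit value and emits 7 bits per byte,
    // least significant group first; the MSB of every byte except the
    // last one is set to 1.
    static byte[] encode(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(127).length);         // 1
        System.out.println(encode(128).length);         // 2
        System.out.println(encode(0xFFFFFFFFL).length); // 5
    }
}
```

For example, 300 is 0b10_0101100, so it is written as the two bytes 0xAC 0x02.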
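Combining the two:
A negative number fed directly to VLQ, for example the long -1 (0xFFFFFFFFFFFFFFFF), would need 10 bytes. ZigZag first maps negative numbers to positive ones, so any value with a small absolute value, positive or negative, is compressed effectively by the combination. As noted above, values of large magnitude inevitably take more space than the plain fixed-width encoding.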
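The effect of the combination can be seen by counting bytes with and without the ZigZag step. A minimal sketch, assuming nothing beyond the two formulas above; ZigZagVarintSketch and its method names are invented for this example.

```java
// Hypothetical demo class, not part of Thrift.
class ZigZagVarintSketch {
    static long zigzag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Number of bytes a VLQ encoder would emit for n (treated as unsigned).
    static int varintLength(long n) {
        int len = 1;
        while ((n & ~0x7FL) != 0) {
            n >>>= 7;
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        long[] samples = {-1L, 63L, -64L, 64L, Long.MAX_VALUE};
        for (long n : samples) {
            System.out.println(n + ": plain=" + varintLength(n)
                    + " bytes, zigzag=" + varintLength(zigzag(n)) + " bytes");
        }
    }
}
```

-1 drops from 10 bytes to 1, while Long.MAX_VALUE grows from 9 bytes to 10: the price paid for cheap small negatives.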
Source code analysis:
First, the ZigZag encoding for int32 and int64:
private long longToZigzag(long l) {
  return (l << 1) ^ (l >> 63);
}

/**
 * Convert n into a zigzag int. This allows negative numbers to be
 * represented compactly as a varint.
 */
private int intToZigZag(int n) {
  // Non-negative n: n << 1 doubles it and n >> 31 is 0, so the XOR leaves 2 * n.
  // Negative n: n >> 31 is all ones, so the XOR inverts n << 1, giving 2 * |n| - 1.
  return (n << 1) ^ (n >> 31);
}
Next, how int32 and int64 are written as varints:
byte[] i32buf = new byte[5]; // an int32 needs at most 5 bytes
private void writeVarint32(int n) throws TException {
  int idx = 0; // next free position in i32buf
  while (true) {
    if ((n & ~0x7F) == 0) { // n fits in 7 bits (n < 2^7): last byte, MSB stays 0
      i32buf[idx++] = (byte)n;
      // writeByteDirect((byte)n);
      break;
      // return;
    } else {
      // n >= 2^7: store the low 7 bits with the MSB set to 1 (least significant group first)
      i32buf[idx++] = (byte)((n & 0x7F) | 0x80);
      // writeByteDirect((byte)((n & 0x7F) | 0x80));
      n >>>= 7; // logical right shift by 7 bits, then test again
    }
  }
  trans_.write(i32buf, 0, idx); // hand the buffer to the transport
}

/**
 * Write an i64 as a varint. Results in 1-10 bytes on the wire.
 */
byte[] varint64out = new byte[10]; // an int64 needs at most 10 bytes
private void writeVarint64(long n) throws TException {
  int idx = 0;
  while (true) {
    if ((n & ~0x7FL) == 0) { // note the long literal ~0x7FL: the mask must cover all 64 bits
      varint64out[idx++] = (byte)n;
      break;
    } else {
      varint64out[idx++] = ((byte)((n & 0x7F) | 0x80));
      n >>>= 7;
    }
  }
  trans_.write(varint64out, 0, idx);
}
The comments above describe how a varint is written: a buffer of the maximum possible size is preallocated, and the VLQ bytes are then written into it, least significant group first. Now let's see how the two encodings are combined:
public void writeI32(int i32) throws TException {
  writeVarint32(intToZigZag(i32)); // ZigZag first, then VLQ
}

/**
 * Write an i64 as a zigzag varint.
 */
public void writeI64(long i64) throws TException {
  writeVarint64(longToZigzag(i64));
}

public void writeI16(short i16) throws TException {
  // an i16 is widened to int32, ZigZag-encoded, then written as a VLQ
  writeVarint32(intToZigZag(i16));
}
Let's first go through how TCompactProtocol writes Thrift's built-in data types, and then look at how a message is written. The compact type codes are listed below; i16, i32 and i64 are already covered, so on to the others:
private static class Types {
  public static final byte BOOLEAN_TRUE  = 0x01;
  public static final byte BOOLEAN_FALSE = 0x02;
  public static final byte BYTE          = 0x03;
  public static final byte I16           = 0x04;
  public static final byte I32           = 0x05;
  public static final byte I64           = 0x06;
  public static final byte DOUBLE        = 0x07;
  public static final byte BINARY        = 0x08;
  public static final byte LIST          = 0x09;
  public static final byte SET           = 0x0A;
  public static final byte MAP           = 0x0B;
  public static final byte STRUCT        = 0x0C;
}
boolean:
public void writeBool(boolean b) throws TException {
  if (booleanField_ != null) {
    // we haven't written the field header yet
    writeFieldBeginInternal(booleanField_, b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
    booleanField_ = null;
  } else {
    // we're not part of a field, so just write the value.
    // written as the BOOLEAN_TRUE / BOOLEAN_FALSE byte value listed above
    writeByteDirect(b ? Types.BOOLEAN_TRUE : Types.BOOLEAN_FALSE);
  }
}
TCompactProtocol writes a boolean in one of two ways. 1: when the boolean is a field (TField) of a TStruct, the value is packed into the field header, so the type code itself says true or false; 2: when it is not a field, it is written directly as a single byte. See the previous post for details on TStruct and TField.
Writing a TStruct is analyzed later.
byte:
public void writeByte(byte b) throws TException {
  writeByteDirect(b); // a single byte, written as is
}

private byte[] byteDirectBuffer = new byte[1];
private void writeByteDirect(byte b) throws TException {
  byteDirectBuffer[0] = b;
  trans_.write(byteDirectBuffer);
}
double:
public void writeDouble(double dub) throws TException {
  byte[] data = new byte[]{0, 0, 0, 0, 0, 0, 0, 0}; // 8 bytes
  // take the bit pattern of the double as a long, then write it as a fixed64
  fixedLongToBytes(Double.doubleToLongBits(dub), data, 0);
  trans_.write(data);
}

private void fixedLongToBytes(long n, byte[] buf, int off) {
  buf[off+0] = (byte)( n        & 0xff);
  buf[off+1] = (byte)((n >> 8 ) & 0xff);
  buf[off+2] = (byte)((n >> 16) & 0xff);
  buf[off+3] = (byte)((n >> 24) & 0xff);
  buf[off+4] = (byte)((n >> 32) & 0xff);
  buf[off+5] = (byte)((n >> 40) & 0xff);
  buf[off+6] = (byte)((n >> 48) & 0xff);
  buf[off+7] = (byte)((n >> 56) & 0xff);
}
As shown, a double is first converted with Double.doubleToLongBits() and then written in fixed64 form (8 bytes, little-endian).
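As a quick sanity check of the byte order, the same eight bytes can be produced with a little-endian java.nio.ByteBuffer. This is a sketch only: DoubleSketch and its method names are hypothetical, and the ByteBuffer variant is a comparison of my own, not something the Thrift source does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo class, not part of Thrift.
class DoubleSketch {
    // Same shifting scheme as fixedLongToBytes above, written as a loop.
    static byte[] encode(double d) {
        long bits = Double.doubleToLongBits(d);
        byte[] buf = new byte[8];
        for (int i = 0; i < 8; i++) {
            buf[i] = (byte) ((bits >> (8 * i)) & 0xff);
        }
        return buf;
    }

    // Comparison implementation using the standard library.
    static byte[] encodeWithByteBuffer(double d) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(d).array();
    }

    public static void main(String[] args) {
        // 1.0 has the bit pattern 0x3FF0000000000000, so the last two bytes are 0xF0, 0x3F.
        System.out.println(Arrays.equals(encode(1.0), encodeWithByteBuffer(1.0)));
    }
}
```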
byte array:
public void writeBinary(ByteBuffer bin) throws TException {
  int length = bin.limit() - bin.position(); // number of bytes to write
  writeBinary(bin.array(), bin.position() + bin.arrayOffset(), length);
}

private void writeBinary(byte[] buf, int offset, int length) throws TException {
  // The length is written as a plain VLQ without ZigZag. ZigZag exists to keep
  // negative numbers from taking many VLQ bytes; a length is never negative.
  writeVarint32(length);
  trans_.write(buf, offset, length); // then the actual bytes
}
string:
public void writeString(String str) throws TException {
  try {
    byte[] bytes = str.getBytes("UTF-8"); // UTF-8 encode to get the byte array
    writeBinary(bytes, 0, bytes.length);  // delegate to writeBinary, see above
  } catch (UnsupportedEncodingException e) {
    throw new TException("UTF-8 not supported!");
  }
}
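The resulting wire format for a string is therefore "VLQ length, then UTF-8 bytes". A minimal standalone sketch of that layout; StringSketch is a hypothetical name and the code writes into a ByteArrayOutputStream rather than a Thrift transport.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Thrift.
class StringSketch {
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // length as a plain VLQ (no ZigZag, a length is never negative)
        int len = utf8.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80);
            len >>>= 7;
        }
        out.write(len);
        // followed by the raw UTF-8 bytes
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "abc" becomes 03 61 62 63
        for (byte b : encode("abc")) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Note that the prefix counts bytes, not characters: a single CJK character occupies three UTF-8 bytes and gets a length of 3.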
Container types:
set tag:
public void writeSetBegin(TSet set) throws TException {
  writeCollectionBegin(set.elemType, set.size); // element type and size of the set
}
type byte:
public final class TType {
  public static final byte STOP   = 0;
  public static final byte VOID   = 1; // Java has no such value type; presumably present for other languages
  public static final byte BOOL   = 2;
  public static final byte BYTE   = 3;
  public static final byte DOUBLE = 4;
  public static final byte I16    = 6;
  public static final byte I32    = 8;
  public static final byte I64    = 10;
  public static final byte STRING = 11;
  public static final byte STRUCT = 12;
  public static final byte MAP    = 13;
  public static final byte SET    = 14;
  public static final byte LIST   = 15;
  public static final byte ENUM   = 16; // not used in the static {} block below either, so 4 bits are enough for the compact type
}

protected void writeCollectionBegin(byte elemType, int size) throws TException {
  if (size <= 14) { // 14 = 1110
    // size <= 14: pack size << 4 | compact element type into a single byte
    writeByteDirect(size << 4 | getCompactType(elemType));
  } else {
    writeByteDirect(0xf0 | getCompactType(elemType)); // 1111 0000 | compact type, one byte
    writeVarint32(size); // then the size as a VLQ
  }
}
getCompactType(xx):
private byte getCompactType(byte ttype) {
  return ttypeToCompactType[ttype];
}

static {
  ttypeToCompactType[TType.STOP]   = TType.STOP;
  ttypeToCompactType[TType.BOOL]   = Types.BOOLEAN_TRUE;
  ttypeToCompactType[TType.BYTE]   = Types.BYTE;
  ttypeToCompactType[TType.I16]    = Types.I16;
  ttypeToCompactType[TType.I32]    = Types.I32;
  ttypeToCompactType[TType.I64]    = Types.I64;
  ttypeToCompactType[TType.DOUBLE] = Types.DOUBLE;
  ttypeToCompactType[TType.STRING] = Types.BINARY;
  ttypeToCompactType[TType.LIST]   = Types.LIST;
  ttypeToCompactType[TType.SET]    = Types.SET;
  ttypeToCompactType[TType.MAP]    = Types.MAP;
  ttypeToCompactType[TType.STRUCT] = Types.STRUCT;
}

public void writeSetEnd() throws TException {} // a no-op, present only to satisfy the interface
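The two branches of writeCollectionBegin can be illustrated with concrete bytes. The sketch below is standalone; CollectionHeaderSketch is a hypothetical name, and the compact type code is passed in directly (0x05 is Types.I32 from the listing above) instead of going through getCompactType.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of Thrift.
class CollectionHeaderSketch {
    static final int COMPACT_I32 = 0x05; // Types.I32

    static byte[] header(int compactElemType, int size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (size <= 14) {
            // short form: size in the high nibble, element type in the low nibble
            out.write(size << 4 | compactElemType);
        } else {
            // long form: high nibble 1111, then the size as a VLQ
            out.write(0xF0 | compactElemType);
            int n = size;
            while ((n & ~0x7F) != 0) {
                out.write((n & 0x7F) | 0x80);
                n >>>= 7;
            }
            out.write(n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // a list or set of 3 i32 values: 35
        // a list or set of 20 i32 values: F5 14
        for (int size : new int[] {3, 20}) {
            for (byte b : header(COMPACT_I32, size)) {
                System.out.printf("%02X ", b);
            }
            System.out.println();
        }
    }
}
```

The high nibble value 1111 is reserved as the "size follows" marker, which is why the short form stops at 14.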
list tag:
public void writeListBegin(TList list) throws TException {
  writeCollectionBegin(list.elemType, list.size);
}

public void writeListEnd() throws TException {}
Same as for sets, so it is not repeated here.
map tag:
public void writeMapBegin(TMap map) throws TException {
  if (map.size == 0) {
    writeByteDirect(0); // an empty map is just a single 0 byte
  } else {
    writeVarint32(map.size); // size as a VLQ
    // one byte holding both types: compact key type << 4 | compact value type.
    // Unlike Avro's map, whose key type can only be string, the key type is written explicitly.
    writeByteDirect(getCompactType(map.keyType) << 4 | getCompactType(map.valueType));
  }
}
writeMapEnd() is also a no-op, so it is not shown.
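The map header differs from the list and set header in that the size comes first and the two type codes share the following byte. A minimal sketch of that layout, with the hypothetical class name MapHeaderSketch and the compact type codes (0x08 for strings, 0x05 for i32) passed in directly:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of Thrift.
class MapHeaderSketch {
    static final int COMPACT_BINARY = 0x08; // Types.BINARY, used for strings
    static final int COMPACT_I32 = 0x05;    // Types.I32

    static byte[] header(int compactKeyType, int compactValueType, int size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (size == 0) {
            out.write(0); // empty map: a single 0 byte, no type byte at all
        } else {
            int n = size; // size as a VLQ
            while ((n & ~0x7F) != 0) {
                out.write((n & 0x7F) | 0x80);
                n >>>= 7;
            }
            out.write(n);
            out.write(compactKeyType << 4 | compactValueType);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // map<string, i32> with 2 entries: 02 85
        for (byte b : header(COMPACT_BINARY, COMPACT_I32, 2)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```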
With the built-in types covered, we can move on to writing a message.
public void writeMessageBegin(TMessage message) throws TException {
  writeByteDirect(PROTOCOL_ID); // one byte, protocol id 1000 0010
  // one byte: the low five bits hold the version (0000 0001 & 0001 1111),
  // the high three bits hold the message type ((type << 5) & 1110 0000)
  writeByteDirect((VERSION & VERSION_MASK) | ((message.type << TYPE_SHIFT_AMOUNT) & TYPE_MASK));
  writeVarint32(message.seqid); // the message's incrementing sequence id, as a VLQ
  writeString(message.name);    // the message name, i.e. the method name
}

private static final byte PROTOCOL_ID = (byte)0x82;   // 1000 0010
private static final byte VERSION = 1;
private static final byte VERSION_MASK = 0x1f;        // 0001 1111
private static final byte TYPE_MASK = (byte)0xE0;     // 1110 0000
private static final byte TYPE_BITS = 0x07;           // 0000 0111
private static final int  TYPE_SHIFT_AMOUNT = 5;
The version field is presumably there to allow for future protocol revisions. For the byte-sized message type (call, reply, exception, oneway), see the previous post on TBinaryProtocol. For completeness, here is the sendBase() step of TServiceClient that sends a message:
protected void sendBase(String methodName, TBase args) throws TException {
  oprot_.writeMessageBegin(new TMessage(methodName, TMessageType.CALL, ++seqid_));
  args.write(oprot_);
  oprot_.writeMessageEnd();
  oprot_.getTransport().flush();
}
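The bytes produced by writeMessageBegin for such a call can be worked out by hand. The sketch below repeats the constants from the listing above and writes into a ByteArrayOutputStream; MessageHeaderSketch is a hypothetical name, and the value 1 for CALL is the numeric value of TMessageType.CALL.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Thrift.
class MessageHeaderSketch {
    static final int PROTOCOL_ID = 0x82;
    static final int VERSION = 1;
    static final int VERSION_MASK = 0x1f;
    static final int TYPE_MASK = 0xE0;
    static final int TYPE_SHIFT_AMOUNT = 5;
    static final int CALL = 1; // TMessageType.CALL

    static void writeVarint32(ByteArrayOutputStream out, int n) {
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }

    static byte[] messageBegin(String name, int type, int seqid) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(PROTOCOL_ID);
        out.write((VERSION & VERSION_MASK) | ((type << TYPE_SHIFT_AMOUNT) & TYPE_MASK));
        writeVarint32(out, seqid);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeVarint32(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Starts with 82 21 01 0B, followed by the 11 bytes of "helloString".
        for (byte b : messageBegin("helloString", CALL, 1)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

The second byte is 0x21 because version 1 occupies the low five bits and CALL (1) shifted left by five contributes 0x20.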
Next comes TBase's write(), i.e. writing the wrapper classes for a method's arguments and return value. Again using hello.thrift as the example:
helloString_args write():
public void write(org.apache.thrift.protocol.TProtocol oprot) throws org.apache.thrift.TException {
  schemes.get(oprot.getScheme()).getScheme().write(oprot, this);
}
The scheme's write():
public void write(org.apache.thrift.protocol.TProtocol oprot, helloString_args struct) throws org.apache.thrift.TException {
  struct.validate();

  oprot.writeStructBegin(STRUCT_DESC);
  if (struct.para != null) {
    oprot.writeFieldBegin(PARA_FIELD_DESC);
    oprot.writeString(struct.para);
    oprot.writeFieldEnd();
  }
  oprot.writeFieldStop();
  oprot.writeStructEnd();
}

// TStruct descriptor of the class that wraps the method arguments
private static final org.apache.thrift.protocol.TStruct STRUCT_DESC = new org.apache.thrift.protocol.TStruct("helloString_args");
private static final org.apache.thrift.protocol.TField PARA_FIELD_DESC = new org.apache.thrift.protocol.TField("para", org.apache.thrift.protocol.TType.STRING, (short)1);
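OK, oprot here is a TCompactProtocol, so let's look at its writeStructBegin():
public void writeStructBegin(TStruct struct) throws TException {
  lastField_.push(lastFieldId_); // remember the last field id written for the enclosing struct
  lastFieldId_ = 0;              // field ids of this struct are counted from 0
}

// Stack of field ids (the 1:, 2: labels of the method parameters in the thrift IDL file),
// used to track the last field id of the current struct and of the enclosing structs.
private ShortStack lastField_ = new ShortStack(15);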
Next, writeFieldBegin():
public void writeFieldBegin(TField field) throws TException {
  if (field.type == TType.BOOL) { // the field is a boolean
    // we want to possibly include the value, so we'll wait.
    // Only remember the field here. The header is written later, together with the
    // value, by writeBool(): this is the first of the two boolean cases described earlier.
    booleanField_ = field;
  } else {
    writeFieldBeginInternal(field, (byte)-1);
  }
}

private void writeFieldBeginInternal(TField field, byte typeOverride) throws TException {
  // short lastField = lastField_.pop();

  // if there's a type override, use that.
  // -1 means "look up the compact type of the field". Any other value is the boolean
  // special case, written as is in one byte: true 0x01, false 0x02.
  byte typeToWrite = typeOverride == -1 ? getCompactType(field.type) : typeOverride;

  // check if we can use delta encoding for the field id
  // Delta encoding stores the id difference in the high 4 bits of one byte, so the id must
  // be greater than the previous one by at most 15. writeStructBegin() resets lastFieldId_
  // to 0, so ids are only compared within one struct (the arguments of one RPC call),
  // never against the ids of a previous call.
  if (field.id > lastFieldId_ && field.id - lastFieldId_ <= 15) {
    // write them together
    // (id delta << 4) | type in a single byte, to save space
    writeByteDirect((field.id - lastFieldId_) << 4 | typeToWrite);
  } else {
    // write them separate
    writeByteDirect(typeToWrite); // one byte for the type
    writeI16(field.id);           // the field id as an i16 (ZigZag + VLQ)
  }

  lastFieldId_ = field.id; // remember this id for the next field
  // lastField_.push(field.id);
}
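The two branches of writeFieldBeginInternal produce the following bytes. This is a standalone sketch under the assumption that the caller passes the previous field id and the compact type code directly; FieldHeaderSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of Thrift.
class FieldHeaderSketch {
    static byte[] header(int lastFieldId, int fieldId, int compactType) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int delta = fieldId - lastFieldId;
        if (fieldId > lastFieldId && delta <= 15) {
            // short form: id delta in the high nibble, type in the low nibble
            out.write(delta << 4 | compactType);
        } else {
            // long form: type byte, then the field id as a ZigZag VLQ
            out.write(compactType);
            int z = (fieldId << 1) ^ (fieldId >> 31);
            while ((z & ~0x7F) != 0) {
                out.write((z & 0x7F) | 0x80);
                z >>>= 7;
            }
            out.write(z);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // field 1 of type string (compact type 0x08) at the start of a struct: 18
        // field 16 of type i32 (0x05) at the start of a struct: 05 20
        // boolean field 3 with value true (0x01), written after field 2: 11
        int[][] cases = {{0, 1, 0x08}, {0, 16, 0x05}, {2, 3, 0x01}};
        for (int[] c : cases) {
            for (byte b : header(c[0], c[1], c[2])) {
                System.out.printf("%02X ", b);
            }
            System.out.println();
        }
    }
}
```

For consecutively numbered fields, which is the common case in IDL files, every field header costs a single byte, and a boolean field costs nothing beyond that byte.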
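After the field header comes the actual value of the argument, followed by a call to writeFieldEnd().
Once all fields of the struct have been written, writeFieldStop() is called:
public void writeFieldStop() throws TException {
  writeByteDirect(TType.STOP); // one byte with value 0, marking the end of the struct's fields
}
writeStructEnd():
public void writeStructEnd() throws TException {
  // restore the field id that writeStructBegin() pushed onto the stack before resetting it to 0
  lastFieldId_ = lastField_.pop();
}

public void writeMessageEnd() throws TException {}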
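Putting the pieces together, the complete message for a helloString call can be assembled by hand. The sketch below is an illustration under stated assumptions: it does not use the Thrift library, HelloMessageSketch is a hypothetical name, para is assumed to be non-null, and the sequence id is passed in by the caller.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Thrift.
class HelloMessageSketch {
    static void varint(ByteArrayOutputStream out, int n) {
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }

    static void string(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        varint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] helloStringCall(String para, int seqid) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // writeMessageBegin
        out.write(0x82);               // protocol id
        out.write(0x01 | (1 << 5));    // version 1, message type CALL (1)
        varint(out, seqid);
        string(out, "helloString");
        // helloString_args: writeStructBegin writes nothing
        out.write((1 - 0) << 4 | 0x08); // field "para": id delta 1, compact type BINARY
        string(out, para);
        out.write(0x00);               // writeFieldStop
        // writeStructEnd and writeMessageEnd write nothing
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 20 bytes in total; the struct part at the end is 18 02 68 69 00
        for (byte b : helloStringCall("hi", 1)) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```

Of the 20 bytes, 13 are the method name and the argument text; the protocol itself adds only 7.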
The read path is not analyzed here; readers can work through it along the same lines.