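First, a look at how Dubbo designs its classes in the serialize layer.
The entry point of a serialization scheme is an implementation of the Serialization interface.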
    /**
     * Serialization. (SPI, Singleton, ThreadSafe)
     * The default extension is hessian2, which is also the default
     * serialization of the dubbo protocol.
     *
     * @author ding.lid
     * @author william.liangf
     */
    @SPI("hessian2")
    public interface Serialization {

        /**
         * get content type id
         * Serialization identifier; each SPI extension specifies its own fixed value.
         *
         * @return content type id
         */
        byte getContentTypeId();

        /**
         * get content type
         *
         * @return content type
         */
        String getContentType();

        /**
         * create serializer
         * Returns a concrete serializer instance.
         *
         * @param url
         * @param output
         * @return serializer
         * @throws IOException
         */
        @Adaptive
        ObjectOutput serialize(URL url, OutputStream output) throws IOException;

        /**
         * create deserializer
         * Returns a concrete deserializer instance.
         *
         * @param url
         * @param input
         * @return deserializer
         * @throws IOException
         */
        @Adaptive
        ObjectInput deserialize(URL url, InputStream input) throws IOException;
    }
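Dubbo currently ships the following SPI implementations:
[Figure: the Serialization SPI implementations bundled with Dubbo]
Let's look at the DubboSerialization and Hessian2Serialization classes in detail: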
    public class DubboSerialization implements Serialization {

        /**
         * Fixed value 1.
         *
         * @return
         */
        public byte getContentTypeId() {
            return 1;
        }

        /**
         * Fixed value x-application/dubbo.
         *
         * @return
         */
        public String getContentType() {
            return "x-application/dubbo";
        }

        /**
         * The concrete serializer instance is a GenericObjectOutput.
         *
         * @param url
         * @param out
         * @return
         * @throws IOException
         */
        public ObjectOutput serialize(URL url, OutputStream out) throws IOException {
            return new GenericObjectOutput(out);
        }

        /**
         * The concrete deserializer instance is a GenericObjectInput.
         *
         * @param url
         * @param is
         * @return
         * @throws IOException
         */
        public ObjectInput deserialize(URL url, InputStream is) throws IOException {
            return new GenericObjectInput(is);
        }
    }

    public class Hessian2Serialization implements Serialization {

        public static final byte ID = 2;

        public byte getContentTypeId() {
            return ID;
        }

        public String getContentType() {
            return "x-application/hessian2";
        }

        public ObjectOutput serialize(URL url, OutputStream out) throws IOException {
            return new Hessian2ObjectOutput(out);
        }

        public ObjectInput deserialize(URL url, InputStream is) throws IOException {
            return new Hessian2ObjectInput(is);
        }
    }
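As you can see, the concrete serializer and deserializer instances are encapsulated inside each interface implementation.
Taking DubboSerialization as an example, let's look at the class hierarchy of those concrete instances.
The inheritance hierarchy of GenericObjectOutput:
[Figure: inheritance hierarchy of GenericObjectOutput]
The inheritance hierarchy of GenericObjectInput:
[Figure: inheritance hierarchy of GenericObjectInput]
So the class hierarchy of DubboSerialization can be drawn as a two-dimensional inheritance diagram:
[Figure: two-dimensional class hierarchy of DubboSerialization]
The other Serialization extensions have similar class diagrams. This cleanly separates the interface from its implementations and makes SPI extension easy.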
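It is worth mentioning that the hessian2 serialization is the open-source Hessian implementation bundled inside Dubbo. Because Hessian is a binary format, its serialized output is fairly small. Dubbo also wrote its own serialization scheme, however, and judging from the test classes it ships, Dubbo's own scheme produces even smaller output. That makes it worth studying here.
Its design revolves around building a Builder for each type being serialized; the Builder drives both serialization and deserialization of that type.
Builder<T> is a generic abstract class that declares three abstract methods to implement: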
    /**
     * Serializes an object; to be implemented.
     *
     * @param obj
     * @param out
     * @throws IOException
     */
    abstract public void writeTo(T obj, GenericObjectOutput out) throws IOException;

    /**
     * Deserializes an object; to be implemented.
     *
     * @param in
     * @return
     * @throws IOException
     */
    abstract public T parseFrom(GenericObjectInput in) throws IOException;

    /**
     * Returns the class of the type being serialized.
     *
     * @return
     */
    abstract public Class<T> getType();
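By design, every type to be serialized needs a corresponding Builder implementation. The Builder class itself ships built-in Builder implementations for the 8 primitive types and for 12 commonly used data and collection types such as String, HashMap, ArrayList and Date.
For custom POJOs, the Builder implementation is generated dynamically with Javassist.
The design appears to rest on one assumption: even a complex POJO is ultimately composed of fields of the common types above.
Serializing a POJO therefore comes down to serializing its fields, and each field can be handled by the built-in Builder for its type. As noted above, Dubbo's own serialization produces small output precisely because it tailors the encoding for each type, especially the common built-in ones, for example by compacting the space they take.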
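The idea that a POJO Builder simply delegates to one field Builder per field can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Dubbo's code: `MiniBuilder`, `User`, the tag values and the use of `java.io` data streams are all hypothetical stand-ins for Builder<T>, a user POJO, the real flag constants and GenericObjectOutput/GenericObjectInput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BuilderSketch {
    // Hypothetical tag values, chosen only for this sketch.
    static final byte OBJECT_NULL = 0, OBJECT_VALUE = 1;

    /** Cut-down stand-in for Builder<T>, writing to plain java.io streams. */
    abstract static class MiniBuilder<T> {
        abstract void writeTo(T obj, DataOutputStream out) throws IOException;
        abstract T parseFrom(DataInputStream in) throws IOException;
    }

    static final MiniBuilder<String> STRING = new MiniBuilder<String>() {
        @Override
        void writeTo(String obj, DataOutputStream out) throws IOException {
            if (obj == null) {
                out.writeByte(OBJECT_NULL);      // null marker only
            } else {
                out.writeByte(OBJECT_VALUE);     // value marker, then the value
                out.writeUTF(obj);
            }
        }

        @Override
        String parseFrom(DataInputStream in) throws IOException {
            if (in.readByte() == OBJECT_NULL) return null;
            return in.readUTF();
        }
    };

    static final MiniBuilder<Integer> INTEGER = new MiniBuilder<Integer>() {
        @Override
        void writeTo(Integer obj, DataOutputStream out) throws IOException {
            if (obj == null) {
                out.writeByte(OBJECT_NULL);
            } else {
                out.writeByte(OBJECT_VALUE);
                out.writeInt(obj);
            }
        }

        @Override
        Integer parseFrom(DataInputStream in) throws IOException {
            if (in.readByte() == OBJECT_NULL) return null;
            return in.readInt();
        }
    };

    /** Hypothetical POJO made only of built-in field types. */
    static final class User {
        String name;
        Integer age;
    }

    /** Hand-written version of what a generated POJO builder does: one field builder per field. */
    static final MiniBuilder<User> USER = new MiniBuilder<User>() {
        @Override
        void writeTo(User obj, DataOutputStream out) throws IOException {
            STRING.writeTo(obj.name, out);
            INTEGER.writeTo(obj.age, out);
        }

        @Override
        User parseFrom(DataInputStream in) throws IOException {
            User u = new User();
            u.name = STRING.parseFrom(in);       // same field order as writeTo
            u.age = INTEGER.parseFrom(in);
            return u;
        }
    };

    /** Serializes and immediately deserializes a User. */
    static User roundTrip(User u) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            USER.writeTo(u, new DataOutputStream(bytes));
            return USER.parseFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The POJO builder contains no encoding logic of its own; everything it writes comes from the field builders, which is why tuning the built-in builders shrinks the output of every POJO that uses them.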
For example, the Builder implementation for Integer:

    new Builder<Integer>() {
        @Override
        public Class<Integer> getType() {
            return Integer.class;
        }

        // serialization
        @Override
        public void writeTo(Integer obj, GenericObjectOutput out) throws IOException {
            if (obj == null) {
                // null: store only the marker that stands for null
                out.write0(OBJECT_NULL);
            } else {
                // otherwise first store one byte marking "has a value", then the value itself
                out.write0(OBJECT_VALUE);
                out.writeInt(obj.intValue());
            }
        }

        // deserialization
        @Override
        public Integer parseFrom(GenericObjectInput in) throws IOException {
            byte b = in.read0();
            if (b == OBJECT_NULL) // null marker: return null
                return null;
            if (b != OBJECT_VALUE)
                throw new IOException("Input format error, expect OBJECT_NULL|OBJECT_VALUE, get " + b + ".");
            // otherwise return the deserialized value
            return Integer.valueOf(in.readInt());
        }
    }
And the Builder implementation for ArrayList:

    new Builder<ArrayList>() {
        @Override
        public Class<ArrayList> getType() {
            return ArrayList.class;
        }

        @Override
        public void writeTo(ArrayList obj, GenericObjectOutput out) throws IOException {
            if (obj == null) {
                // null value
                out.write0(OBJECT_NULL);
            } else {
                out.write0(OBJECT_VALUES);   // store the "multiple values" marker
                out.writeUInt(obj.size());   // store the element count
                for (Object item : obj)
                    out.writeObject(item);   // serialize each element in turn
            }
        }

        @Override
        public ArrayList parseFrom(GenericObjectInput in) throws IOException {
            byte b = in.read0();
            if (b == OBJECT_NULL) // null value
                return null;
            if (b != OBJECT_VALUES)
                throw new IOException("Input format error, expect OBJECT_NULL|OBJECT_VALUES, get " + b + ".");
            int len = in.readUInt();             // read the element count first
            ArrayList ret = new ArrayList(len);  // size the container accordingly
            for (int i = 0; i < len; i++)        // deserialize exactly that many elements
                ret.add(in.readObject());
            return ret;
        }
    }
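As you can see, for every type the serialization and deserialization code mirror each other: the reader consumes exactly the markers and values the writer emitted, in the same order, so the data never gets scrambled.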
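The marker, count, elements framing used by the ArrayList Builder can be tried out in isolation. The following is only a sketch under assumptions: it handles a list of non-null strings, uses `java.io` data streams instead of GenericObjectOutput/GenericObjectInput, a fixed 4-byte count instead of writeUInt, and made-up tag values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ListFramingSketch {
    // Hypothetical tag values, chosen only for this sketch.
    static final byte OBJECT_NULL = 0, OBJECT_VALUES = 1;

    /** Writes: marker, then (for non-null lists) element count, then each element. */
    static byte[] write(List<String> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (list == null) {
                out.writeByte(OBJECT_NULL);
            } else {
                out.writeByte(OBJECT_VALUES);
                out.writeInt(list.size());
                for (String s : list) out.writeUTF(s); // elements assumed non-null
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back in exactly the order write() produced; any other marker is rejected. */
    static List<String> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte tag = in.readByte();
            if (tag == OBJECT_NULL) return null;
            if (tag != OBJECT_VALUES)
                throw new IOException("Input format error, expect OBJECT_NULL|OBJECT_VALUES, get " + tag);
            int len = in.readInt();
            List<String> ret = new ArrayList<>(len);
            for (int i = 0; i < len; i++) ret.add(in.readUTF());
            return ret;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the count is written before the elements, the reader never has to guess where the list ends, and a stream that starts with an unknown marker fails immediately instead of being misread.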
Now for user-defined POJO types. Dubbo handles them by using Javassist to generate the corresponding Builder implementation dynamically. Take Phone as an example:

    public class Phone implements Serializable {

        private static final long serialVersionUID = 4399060521859707703L;

        private String country;
        private String area;
        private String number;
        private String extensionNumber;

        // getters and setters omitted
    }
Decompiled, the Builder implementation generated by Javassist looks like this:

    public class Phone$bc0 extends Builder.AbstractObjectBuilder implements ClassGenerator.DC {

        public static Field[] fields;
        public static Builder[] builders;

        public Class getType() {
            return Phone.class;
        }

        // the actual serialization method
        protected void writeObject(Object paramObject, GenericObjectOutput paramGenericObjectOutput) throws IOException {
            Phone localPhone = (Phone)paramObject;
            paramGenericObjectOutput.writeInt(fields.length);
            // All 4 fields of Phone are Strings, so every element of builders is in fact
            // a Builder<String> instance and the String serialization logic is what runs.
            // Deserialization works the same way.
            builders[0].writeTo(localPhone.getArea(), paramGenericObjectOutput);
            builders[1].writeTo(localPhone.getCountry(), paramGenericObjectOutput);
            builders[2].writeTo(localPhone.getExtensionNumber(), paramGenericObjectOutput);
            builders[3].writeTo(localPhone.getNumber(), paramGenericObjectOutput);
        }

        // deserialization method
        protected void readObject(Object paramObject, GenericObjectInput paramGenericObjectInput) throws IOException {
            int i = paramGenericObjectInput.readInt();
            if (i != 4)
                throw new IllegalStateException("Deserialize Class [com.alibaba.dubbo.common.model.person.Phone], field count not matched. Expect 4 but get " + i + ".");
            Phone localPhone = (Phone)paramObject;
            if (i == 0) return;
            localPhone.setArea((String)builders[0].parseFrom(paramGenericObjectInput));
            if (i == 1) return;
            localPhone.setCountry((String)builders[1].parseFrom(paramGenericObjectInput));
            if (i == 2) return;
            localPhone.setExtensionNumber((String)builders[2].parseFrom(paramGenericObjectInput));
            if (i == 3) return;
            localPhone.setNumber((String)builders[3].parseFrom(paramGenericObjectInput));
            for (int j = 4; j < i; j++)
                paramGenericObjectInput.skipAny();
        }

        protected Object newInstance(GenericObjectInput paramGenericObjectInput) {
            return new Phone();
        }
    }
This class extends an abstract class nested inside Builder:

    public static abstract class AbstractObjectBuilder<T> extends Builder<T> {

        abstract public Class<T> getType();

        public void writeTo(T obj, GenericObjectOutput out) throws IOException {
            if (obj == null) {
                out.write0(OBJECT_NULL);
            } else {
                int ref = out.getRef(obj);
                if (ref < 0) {
                    out.addRef(obj);
                    out.write0(OBJECT);
                    writeObject(obj, out);
                } else {
                    out.write0(OBJECT_REF);
                    out.writeUInt(ref);
                }
            }
        }

        public T parseFrom(GenericObjectInput in) throws IOException {
            byte b = in.read0();
            switch (b) {
                case OBJECT: {
                    T ret = newInstance(in);
                    in.addRef(ret);
                    readObject(ret, in);
                    return ret;
                }
                case OBJECT_REF:
                    return (T) in.getRef(in.readUInt());
                case OBJECT_NULL:
                    return null;
                default:
                    throw new IOException("Input format error, expect OBJECT|OBJECT_REF|OBJECT_NULL, get " + b);
            }
        }

        abstract protected void writeObject(T obj, GenericObjectOutput out) throws IOException;

        abstract protected T newInstance(GenericObjectInput in) throws IOException;

        abstract protected void readObject(T ret, GenericObjectInput in) throws IOException;
    }
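The generated class implements its three abstract methods writeObject(), newInstance() and readObject() (along with getType()). Upper-level code can then simply call writeTo() and parseFrom() to serialize and deserialize. The dynamically generated Builder<T> classes for other cases, such as array types and enum types, follow much the same idea.
The main purpose of all this is to reduce the serialization of complex types to the serialization and deserialization of basic types.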
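The OBJECT / OBJECT_REF branch in AbstractObjectBuilder means an object that has already been written is replaced by a short back-reference. A minimal sketch of that idea follows; it assumes identity-based lookup (an `IdentityHashMap`), hypothetical tag values, a hypothetical `Node` type and `java.io` data streams, none of which are taken from Dubbo's source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class RefTrackingSketch {
    // Hypothetical tag values, chosen only for this sketch.
    static final byte OBJECT_NULL = 0, OBJECT = 1, OBJECT_REF = 2;

    /** Hypothetical type that can point back at itself. */
    static final class Node {
        String name;
        Node next;
    }

    static void write(Node n, DataOutputStream out, Map<Object, Integer> refs) throws IOException {
        if (n == null) {
            out.writeByte(OBJECT_NULL);
            return;
        }
        Integer ref = refs.get(n);
        if (ref != null) {                // seen before: emit only its index
            out.writeByte(OBJECT_REF);
            out.writeInt(ref);
            return;
        }
        refs.put(n, refs.size());         // register before descending, so cycles terminate
        out.writeByte(OBJECT);
        out.writeUTF(n.name);
        write(n.next, out, refs);
    }

    static Node read(DataInputStream in, List<Node> refs) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case OBJECT_NULL:
                return null;
            case OBJECT_REF:
                return refs.get(in.readInt());
            case OBJECT: {
                Node n = new Node();
                refs.add(n);              // register before reading fields, mirroring the writer
                n.name = in.readUTF();
                n.next = read(in, refs);
                return n;
            }
            default:
                throw new IOException("Input format error, expect OBJECT|OBJECT_REF|OBJECT_NULL, get " + tag);
        }
    }

    /** Serializes and immediately deserializes a node graph. */
    static Node roundTrip(Node n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(n, new DataOutputStream(bytes), new IdentityHashMap<>());
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), new ArrayList<>());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Registering the instance before its fields are processed, as parseFrom() does with in.addRef(ret) ahead of readObject(), is what lets a cyclic graph such as a -> b -> a be written and rebuilt without infinite recursion.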
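From the inheritance diagrams of GenericObjectInput and GenericObjectOutput, serialization of the basic types lives in the GenericDataOutput class and deserialization in the GenericDataInput class. Let's pick one piece of that code and analyze it.
Here is how an int value is serialized, in the writeVarint32(int v) method: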
    /**
     * Serialization of int data. The values -15 to 31 are treated as special cases:
     * each one is written as a single fixed marker byte, 10 through 56 in order.
     * (Why exactly these 47 values were chosen I don't quite understand.
     * Can anyone enlighten me?)
     *
     * @param v
     * @throws IOException
     */
    private void writeVarint32(int v) throws IOException {
        switch (v) {
            case -15: write0(VARINT_NF); break;
            case -14: write0(VARINT_NE); break;
            case -13: write0(VARINT_ND); break;
            case -12: write0(VARINT_NC); break;
            case -11: write0(VARINT_NB); break;
            case -10: write0(VARINT_NA); break;
            case -9: write0(VARINT_N9); break;
            case -8: write0(VARINT_N8); break;
            case -7: write0(VARINT_N7); break;
            case -6: write0(VARINT_N6); break;
            case -5: write0(VARINT_N5); break;
            case -4: write0(VARINT_N4); break;
            case -3: write0(VARINT_N3); break;
            case -2: write0(VARINT_N2); break;
            case -1: write0(VARINT_N1); break;
            case 0: write0(VARINT_0); break;
            case 1: write0(VARINT_1); break;
            case 2: write0(VARINT_2); break;
            case 3: write0(VARINT_3); break;
            case 4: write0(VARINT_4); break;
            case 5: write0(VARINT_5); break;
            case 6: write0(VARINT_6); break;
            case 7: write0(VARINT_7); break;
            case 8: write0(VARINT_8); break;
            case 9: write0(VARINT_9); break;
            case 10: write0(VARINT_A); break;
            case 11: write0(VARINT_B); break;
            case 12: write0(VARINT_C); break;
            case 13: write0(VARINT_D); break;
            case 14: write0(VARINT_E); break;
            case 15: write0(VARINT_F); break;
            case 16: write0(VARINT_10); break;
            case 17: write0(VARINT_11); break;
            case 18: write0(VARINT_12); break;
            case 19: write0(VARINT_13); break;
            case 20: write0(VARINT_14); break;
            case 21: write0(VARINT_15); break;
            case 22: write0(VARINT_16); break;
            case 23: write0(VARINT_17); break;
            case 24: write0(VARINT_18); break;
            case 25: write0(VARINT_19); break;
            case 26: write0(VARINT_1A); break;
            case 27: write0(VARINT_1B); break;
            case 28: write0(VARINT_1C); break;
            case 29: write0(VARINT_1D); break;
            case 30: write0(VARINT_1E); break;
            case 31: write0(VARINT_1F); break;
            default:
                // All other values are stored as:
                // first byte: how many significant bytes follow,
                // then the significant bytes themselves.
                int t = v, ix = 0;
                byte[] b = mTemp;
                // put the bytes, low to high, into b[1], b[2], ...
                while (true) {
                    b[++ix] = (byte) (v & 0xff);
                    if ((v >>>= 8) == 0)
                        break;
                }
                if (t > 0) { // positive number
                    // [ 0a e2 => 0a e2 00 ] [ 92 => 92 00 ]
                    if (b[ix] < 0)     // top byte has its high bit set, so as a signed byte it is negative
                        b[++ix] = 0;   // pad with a zero byte so it is not parsed back as a negative number
                } else {
                    // negative number, stored in two's complement (invert every bit of its
                    // absolute value, then add 1); redundant sign bytes are trimmed here,
                    // which is a little convoluted
                    // [ 01 ff ff ff => 01 ff ] [ e0 ff ff ff => e0 ] [ 01 e0 ff ff ff => 01 e0 ] 1110
                    while (b[ix] == (byte) 0xff && b[ix - 1] < 0)
                        ix--;
                }
                // store a marker that encodes the number of significant bytes
                // (0 means 1 byte, 1 means 2 bytes, 2 means 3 bytes)
                b[0] = (byte) (VARINT + ix - 1);
                write0(b, 0, ix + 1); // so ix + 1 bytes are written here
        }
    }
Compare this with the deserialization of an int:

    private int readVarint32() throws IOException {
        byte b = read0(); // the first byte is the marker byte
        switch (b) {
            case VARINT8: // 0 means 1 significant byte follows
                return read0();
            case VARINT16: { // 1 means 2 significant bytes follow
                byte b1 = read0(), b2 = read0();
                return (short) ((b1 & 0xff) | ((b2 & 0xff) << 8));
            }
            case VARINT24: {
                byte b1 = read0(), b2 = read0(), b3 = read0();
                int ret = (b1 & 0xff) | ((b2 & 0xff) << 8) | ((b3 & 0xff) << 16);
                if (b3 < 0)
                    return ret | 0xff000000;
                return ret;
            }
            case VARINT32: {
                byte b1 = read0(), b2 = read0(), b3 = read0(), b4 = read0();
                return ((b1 & 0xff) | ((b2 & 0xff) << 8) | ((b3 & 0xff) << 16) | ((b4 & 0xff) << 24));
            }
            // the remaining special values are hard-coded
            case VARINT_NF: return -15;
            case VARINT_NE: return -14;
            case VARINT_ND: return -13;
            case VARINT_NC: return -12;
            case VARINT_NB: return -11;
            case VARINT_NA: return -10;
            case VARINT_N9: return -9;
            case VARINT_N8: return -8;
            case VARINT_N7: return -7;
            case VARINT_N6: return -6;
            case VARINT_N5: return -5;
            case VARINT_N4: return -4;
            case VARINT_N3: return -3;
            case VARINT_N2: return -2;
            case VARINT_N1: return -1;
            case VARINT_0: return 0;
            case VARINT_1: return 1;
            case VARINT_2: return 2;
            case VARINT_3: return 3;
            case VARINT_4: return 4;
            case VARINT_5: return 5;
            case VARINT_6: return 6;
            case VARINT_7: return 7;
            case VARINT_8: return 8;
            case VARINT_9: return 9;
            case VARINT_A: return 10;
            case VARINT_B: return 11;
            case VARINT_C: return 12;
            case VARINT_D: return 13;
            case VARINT_E: return 14;
            case VARINT_F: return 15;
            case VARINT_10: return 16;
            case VARINT_11: return 17;
            case VARINT_12: return 18;
            case VARINT_13: return 19;
            case VARINT_14: return 20;
            case VARINT_15: return 21;
            case VARINT_16: return 22;
            case VARINT_17: return 23;
            case VARINT_18: return 24;
            case VARINT_19: return 25;
            case VARINT_1A: return 26;
            case VARINT_1B: return 27;
            case VARINT_1C: return 28;
            case VARINT_1D: return 29;
            case VARINT_1E: return 30;
            case VARINT_1F: return 31;
            default:
                throw new IOException("Tag error, expect VARINT, but get " + b);
        }
    }
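To make the byte layout easier to experiment with, here is a compact standalone sketch of the same encode and decode logic. It is written under assumptions: the tag numbers (0 to 3 for "1 to 4 payload bytes follow", 10 to 56 for the values -15 to 31) are read off the comments in the two listings above rather than from the real constant definitions, the two long switch statements are replaced by an offset of 25, and the class and method names are made up.

```java
import java.util.Arrays;

class VarintSketch {
    // Assumed tag layout, taken from the comments in the listings:
    // tags 0..3 mean "1..4 payload bytes follow", tags 10..56 stand for -15..31.
    static final int VARINT = 0;
    static final int SMALL_MIN = -15;
    static final int SMALL_MAX = 31;
    static final int SMALL_OFFSET = 25; // -15 + 25 = 10, 31 + 25 = 56

    static byte[] encode(int v) {
        if (v >= SMALL_MIN && v <= SMALL_MAX) {
            return new byte[] {(byte) (v + SMALL_OFFSET)}; // one marker byte, no payload
        }
        byte[] b = new byte[5]; // 1 tag byte + at most 4 payload bytes
        int t = v, ix = 0;
        while (true) {          // payload bytes, least significant first
            b[++ix] = (byte) (v & 0xff);
            if ((v >>>= 8) == 0) break;
        }
        if (t > 0) {
            if (b[ix] < 0) b[++ix] = 0; // keep a positive value from reading back as negative
        } else {
            // drop 0xff sign bytes while the byte below still carries the sign bit
            while (b[ix] == (byte) 0xff && b[ix - 1] < 0) ix--;
        }
        b[0] = (byte) (VARINT + ix - 1);
        return Arrays.copyOf(b, ix + 1);
    }

    static int decode(byte[] in) {
        int tag = in[0];
        if (tag >= SMALL_MIN + SMALL_OFFSET && tag <= SMALL_MAX + SMALL_OFFSET) {
            return tag - SMALL_OFFSET;
        }
        int n = tag - VARINT + 1; // number of payload bytes
        if (n < 1 || n > 4 || in.length != n + 1) {
            throw new IllegalArgumentException("Tag error, expect VARINT, but get " + tag);
        }
        int ret = 0;
        for (int i = 0; i < n; i++) {
            ret |= (in[i + 1] & 0xff) << (8 * i);
        }
        int shift = 32 - 8 * n;   // sign-extend from the top payload byte
        return (ret << shift) >> shift;
    }

    public static void main(String[] args) {
        for (int v : new int[] {5, 146, -129, Integer.MIN_VALUE}) {
            System.out.println(v + " -> " + Arrays.toString(encode(v)));
        }
    }
}
```

With these assumed tags, 5 becomes the single byte 30, while 146 (0x92) becomes tag 1 followed by 0x92 0x00: the extra zero byte is the padding from the `[ 92 => 92 00 ]` comment, without which the reader would sign-extend 0x92 into a negative number.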
    // Serialization of a 64-bit long shows the same space-saving approach
    private void writeVarint64(long v) throws IOException {
        int i = (int) v;
        // if the long fits in an int, store it as an int to avoid wasting space
        if (v == i) {
            writeVarint32(i);
        } else {
            long t = v;
            int ix = 0;
            byte[] b = mTemp;
            while (true) {
                b[++ix] = (byte) (v & 0xff);
                if ((v >>>= 8) == 0)
                    break;
            }
            if (t > 0) {
                // [ 0a e2 => 0a e2 00 ] [ 92 => 92 00 ]
                if (b[ix] < 0)
                    b[++ix] = 0;
            } else {
                // [ 01 ff ff ff => 01 ff ] [ e0 ff ff ff => e0 ]
                while (b[ix] == (byte) 0xff && b[ix - 1] < 0)
                    ix--;
            }
            b[0] = (byte) (VARINT + ix - 1);
            write0(b, 0, ix + 1);
        }
    }

    // Interestingly, serializing a String is really just UTF-8 encoding it
    /**
     * The string is processed in a loop, at most 256 chars per iteration.
     * Each chunk is copied into a char array with String.getChars, and each char
     * is then converted to bytes, applying UTF-8 encoding on the fly.
     * It helps to read up on the UTF-8 encoding rules first; otherwise this is hard to follow.
     * Because of the UTF-8 encoding, the size differs from the 2-byte in-memory chars:
     * ASCII chars shrink to 1 byte, but chars at or above 0x800 (most CJK text)
     * grow to 3 bytes.
     */
    public void writeUTF(String v) throws IOException {
        if (v == null) {
            write0(OBJECT_NULL);
        } else {
            int len = v.length();
            if (len == 0) {
                write0(OBJECT_DUMMY);
            } else {
                write0(OBJECT_BYTES);
                writeUInt(len);
                int off = 0, limit = mLimit - 3, size;
                char[] buf = mCharBuf; // 256
                do {
                    // at most 256
                    size = Math.min(len - off, CHAR_BUF_SIZE);
                    // copy the chars into buf
                    v.getChars(off, off + size, buf, 0);
                    for (int i = 0; i < size; i++) {
                        char c = buf[i];
                        if (mPosition > limit) { // at most 2 bytes of buffer left
                            if (c < 0x80) { // fits in one byte (below 1000 0000)
                                write0((byte) c);
                            } else if (c < 0x800) { // below 0000 1000 0000 0000
                                write0((byte) (0xC0 | ((c >> 6) & 0x1F))); // top 5 bits, prefixed with 110
                                write0((byte) (0x80 | (c & 0x3F)));        // low 6 bits, prefixed with 10
                            } else {
                                write0((byte) (0xE0 | ((c >> 12) & 0x0F))); // top 4 bits, prefixed with 1110
                                write0((byte) (0x80 | ((c >> 6) & 0x3F)));  // middle 6 bits, prefixed with 10
                                write0((byte) (0x80 | (c & 0x3F)));         // last 6 bits, prefixed with 10
                            }
                        } else { // 3 or more bytes of buffer left: write straight into the buffer
                            if (c < 0x80) {
                                mBuffer[mPosition++] = (byte) c;
                            } else if (c < 0x800) {
                                mBuffer[mPosition++] = (byte) (0xC0 | ((c >> 6) & 0x1F));
                                mBuffer[mPosition++] = (byte) (0x80 | (c & 0x3F));
                            } else {
                                mBuffer[mPosition++] = (byte) (0xE0 | ((c >> 12) & 0x0F));
                                mBuffer[mPosition++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                                mBuffer[mPosition++] = (byte) (0x80 | (c & 0x3F));
                            }
                        }
                    }
                    off += size; // move on to the next chunk
                } while (off < len);
            }
        }
    }
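The three branches in writeUTF are the standard 1, 2 and 3 byte UTF-8 forms applied to one char at a time. The sketch below isolates just that per-char encoding so it can be compared against the JDK's own encoder; the class name is made up, and the buffering, chunking and the OBJECT_BYTES / length prefix of the real method are deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    /** Encodes each UTF-16 char as 1, 2 or 3 bytes, using the same branches as writeUTF. */
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);                            // 0xxxxxxx
            } else if (c < 0x800) {
                out.write(0xC0 | ((c >> 6) & 0x1F));     // 110xxxxx
                out.write(0x80 | (c & 0x3F));            // 10xxxxxx
            } else {
                out.write(0xE0 | ((c >> 12) & 0x0F));    // 1110xxxx
                out.write(0x80 | ((c >> 6) & 0x3F));     // 10xxxxxx
                out.write(0x80 | (c & 0x3F));            // 10xxxxxx
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u4e2d"; // 'a', e with acute accent, one CJK character
        byte[] mine = encode(s);
        byte[] jdk = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(mine));
        System.out.println("matches JDK encoder: " + Arrays.equals(mine, jdk));
    }
}
```

For characters in the Basic Multilingual Plane this produces the same bytes as `String.getBytes(StandardCharsets.UTF_8)`. A character outside it is stored in a Java String as a surrogate pair, and since each surrogate is a char of at least 0x800, the per-char branches emit two 3-byte sequences for it rather than the single 4-byte form of standard UTF-8.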