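Preface
Recently a project required serializing business objects directly and storing them in a database. Considering serialization and deserialization time and the size of the serialized output, Protobuf looked like a good choice. However, Protobuf needs a .proto description file, and the objects Protobuf generates are not very friendly to use as business objects, so there is usually a conversion step between business objects and Protobuf objects. Since we only want to serialize business objects straight into the database, Protobuf turned out not to be a great fit for this case.
That is when I came across Protostuff. Protostuff does not depend on .proto files and can serialize and deserialize plain JavaBeans directly. Its performance is reported to be even better than Protobuf's, and the binary format it produces through ProtobufIOUtil is identical to Protobuf's, so it can be seen as a Protobuf-based serialization tool.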
Simple test
1. Testing Protostuff first
Start with a simple JavaBean:
    public class Person {
        private int id;
        private String name;
        private String email;
        // getters and setters omitted
    }
The test class PbStuff:
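    // Assumes protostuff 1.1+ (io.protostuff packages);
    // older releases use com.dyuproject.protostuff instead.
    import io.protostuff.LinkedBuffer;
    import io.protostuff.ProtobufIOUtil;
    import io.protostuff.Schema;
    import io.protostuff.runtime.RuntimeSchema;

    public class PbStuff {
        public static void main(String[] args) {
            Schema<Person> schema = RuntimeSchema.getSchema(Person.class);
            Person person1 = new Person();
            person1.setId(1);
            person1.setName("zhaohui");
            // Same email as in the Protobuf test, so both tests encode identical data.
            person1.setEmail("xxxxxxxx@126.com");
            LinkedBuffer buffer = LinkedBuffer.allocate(1024);
            byte[] data = ProtobufIOUtil.toByteArray(person1, schema, buffer);
            System.out.println(data.length);
        }
    }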
After serialization the binary output is 29 bytes.
2. Testing Protobuf
The .proto file:
    option java_package = "protobuf.clazz";
    option java_outer_classname = "PersonX";

    message Person {
        required int32 id = 1;
        required string name = 2;
        required string email = 3;
    }
The PBTest class:
    // PersonX is generated into the package named by java_package in the .proto file.
    import protobuf.clazz.PersonX;

    public class PBTest {
        public static void main(String[] args) {
            PersonX.Person.Builder builder = PersonX.Person.newBuilder();
            builder.setId(1);
            builder.setName("zhaohui");
            builder.setEmail("xxxxxxxx@126.com");
            PersonX.Person p = builder.build();
            byte[] result = p.toByteArray();
            System.out.println(result.length);
        }
    }
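After serialization the binary output is likewise 29 bytes.
This simple test shows that Protobuf and Protostuff produce output of the same size (29 bytes) when serializing the same data.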
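The 29 bytes can be accounted for by hand. The following is a minimal sketch, not code from either library: `WireSizeDemo` and its helper methods are hypothetical names, and the sketch only handles non-negative int32 values. It writes the three fields in the Protobuf wire format (a varint tag, then the value or a length-prefixed UTF-8 string) and counts the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class WireSizeDemo {
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Writes a non-negative int as a base-128 varint, 7 bits per byte.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value);
    }

    // int32 field: tag, then the value as a varint (non-negative values only here).
    static void writeInt32(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeVarint(out, (fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
    }

    // string field: tag, then the UTF-8 length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodePerson(int id, String name, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, 1, id);      // 1 tag byte + 1 value byte = 2
        writeString(out, 2, name);   // 1 tag + 1 length + 7 chars = 9
        writeString(out, 3, email);  // 1 tag + 1 length + 16 chars = 18
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encodePerson(1, "zhaohui", "xxxxxxxx@126.com");
        System.out.println(data.length); // 2 + 9 + 18 = 29
    }
}
```

Only 6 of the 29 bytes are metadata (three tags, two string lengths, and nothing else), which is why neither library has room to be smaller than the other for this object.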
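Protobuf's encoding compresses each field's metadata and value as much as it can, and the field metadata carries everything needed to describe the field. Since Protostuff's serialized output has the same size as Protobuf's, it is worth analyzing the Protostuff source code.
Source code analysis
1. Schema<Person> schema = RuntimeSchema.getSchema(Person.class); // get the Schema of the business object Person
RuntimeSchema is a class that holds all the information about the business object, including class information and field information.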
    /**
     * Gets the schema that was either registered or lazily initialized at runtime.
     * <p>
     * Method overload for backwards compatibility.
     */
    public static <T> Schema<T> getSchema(Class<T> typeClass) {
        return getSchema(typeClass, ID_STRATEGY);
    }

    /**
     * Gets the schema that was either registered or lazily initialized at runtime.
     */
    public static <T> Schema<T> getSchema(Class<T> typeClass, IdStrategy strategy) {
        return strategy.getSchemaWrapper(typeClass, true).getSchema();
    }
getSchema passes in ID_STRATEGY, the default strategy for obtaining a Schema. ID_STRATEGY is instantiated in the RuntimeEnv class:
    ID_STRATEGY = new DefaultIdStrategy();
Here is a rough look at the DefaultIdStrategy class:
    public final class DefaultIdStrategy extends IdStrategy {
        final ConcurrentHashMap<String, HasSchema<?>> pojoMapping = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, EnumIO<?>> enumMapping = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, CollectionSchema.MessageFactory> collectionMapping = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, MapSchema.MessageFactory> mapMapping = new ConcurrentHashMap<>();
        final ConcurrentHashMap<String, HasDelegate<?>> delegateMapping = new ConcurrentHashMap<>();
        ...
    }
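DefaultIdStrategy caches a lot of Schema information in memory. This is easy to understand: obtaining the class and field information of a business object requires reflection, which is expensive, so caching is well worth it. The next time the same class is encountered, no reflection is needed.
That is why DefaultIdStrategy contains many methods that follow this pattern: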
    public <T> HasSchema<T> getSchemaWrapper(Class<T> typeClass, boolean create) {
        HasSchema<T> hs = (HasSchema<T>) pojoMapping.get(typeClass.getName());
        if (hs == null && create) {
            hs = new Lazy<>(typeClass, this);
            final HasSchema<T> last = (HasSchema<T>) pojoMapping.putIfAbsent(
                    typeClass.getName(), hs);
            if (last != null)
                hs = last;
        }
        return hs;
    }
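It first calls get; if the result is null, it calls putIfAbsent, and if another thread has stored an entry in the meantime, that existing entry is used instead.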
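The get-then-putIfAbsent idiom is easy to try in isolation. The sketch below is a stand-alone illustration, not Protostuff code: `SchemaCacheDemo` and `FakeSchema` are hypothetical stand-ins for the strategy class and for an expensive, reflection-built schema.

```java
import java.util.concurrent.ConcurrentHashMap;

final class SchemaCacheDemo {
    // Hypothetical stand-in for a schema that is expensive to build.
    static final class FakeSchema {
        final String typeName;
        FakeSchema(String typeName) { this.typeName = typeName; }
    }

    static final ConcurrentHashMap<String, FakeSchema> CACHE = new ConcurrentHashMap<>();

    static FakeSchema getSchema(Class<?> type) {
        FakeSchema schema = CACHE.get(type.getName());
        if (schema == null) {
            schema = new FakeSchema(type.getName());
            // putIfAbsent returns the previous value if another thread got there first.
            FakeSchema last = CACHE.putIfAbsent(type.getName(), schema);
            if (last != null)
                schema = last; // discard ours so every caller shares one instance
        }
        return schema;
    }

    public static void main(String[] args) {
        FakeSchema a = getSchema(String.class);
        FakeSchema b = getSchema(String.class);
        System.out.println(a == b); // true: the second call is served from the cache
    }
}
```

Two racing threads may each build a candidate, but only one is ever published, so no locking is needed. Note that in the Protostuff method above the cached value is a cheap Lazy wrapper, so the expensive reflection is deferred rather than duplicated.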
When the business object's Schema has not been cached yet, it has to be created. RuntimeSchema provides the createFrom method for this:
    public static <T> RuntimeSchema<T> createFrom(Class<T> typeClass,
            Set<String> exclusions, IdStrategy strategy) {
        final Map<String, java.lang.reflect.Field> fieldMap = findInstanceFields(typeClass);
        ... // omitted
                final Field<T> field = RuntimeFieldFactory.getFieldFactory(
                        f.getType(), strategy).create(fieldMapping, name, f, strategy);
                fields.add(field);
            }
        }
        return new RuntimeSchema<>(typeClass, fields, RuntimeEnv.newInstantiator(typeClass));
    }
This mainly reflects over typeClass and wraps the result: each field type is mapped to a RuntimeFieldFactory, whose create method then wraps the field into a Field instance. RuntimeFieldFactory lists all the supported types:
    static final RuntimeFieldFactory<BigDecimal> BIGDECIMAL;
    static final RuntimeFieldFactory<BigInteger> BIGINTEGER;
    static final RuntimeFieldFactory<Boolean> BOOL;
    static final RuntimeFieldFactory<Byte> BYTE;
    static final RuntimeFieldFactory<ByteString> BYTES;
    static final RuntimeFieldFactory<byte[]> BYTE_ARRAY;
    static final RuntimeFieldFactory<Character> CHAR;
    static final RuntimeFieldFactory<Date> DATE;
    static final RuntimeFieldFactory<Double> DOUBLE;
    static final RuntimeFieldFactory<Float> FLOAT;
    static final RuntimeFieldFactory<Integer> INT32;
    static final RuntimeFieldFactory<Long> INT64;
    static final RuntimeFieldFactory<Short> SHORT;
    static final RuntimeFieldFactory<String> STRING;
    static final RuntimeFieldFactory<Integer> ENUM;
    static final RuntimeFieldFactory<Object> OBJECT;
    static final RuntimeFieldFactory<Object> POJO;
    static final RuntimeFieldFactory<Object> POLYMORPHIC_POJO;
    static final RuntimeFieldFactory<Collection<?>> COLLECTION = new RuntimeFieldFactory<Collection<?>>(ID_COLLECTION)
The commonly used Map type is supported as well; it is defined in RuntimeMapFieldFactory.
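The reflection step that builds a schema can be pictured with plain java.lang.reflect. This is a simplified sketch under stated assumptions, not the body of createFrom: `FieldNumberingDemo` and `numberFields` are hypothetical names, superclass fields are ignored, and numbering simply follows the order in which `getDeclaredFields()` returns the fields (the Javadoc does not guarantee that order, although common JVMs return declaration order).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldNumberingDemo {
    static class Person {
        private int id;
        private String name;
        private String email;
    }

    // Maps each serializable instance field to a field number starting at 1.
    static Map<String, Integer> numberFields(Class<?> type) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        int next = 1;
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            // Skip fields that are not part of the object's persistent state.
            if (Modifier.isStatic(mod) || Modifier.isTransient(mod) || f.isSynthetic())
                continue;
            numbers.put(f.getName(), next++);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Typically prints {id=1, name=2, email=3} when declaration order is preserved.
        System.out.println(numberFields(Person.class));
    }
}
```

Doing this once per class and caching the result is what makes the strategy's maps worthwhile.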
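2. LinkedBuffer buffer = LinkedBuffer.allocate(1024);
This allocates a 1024-byte buffer to hold the serialized business object. You may worry about what happens if that is not enough; as the code further down shows, the buffer is extended automatically when space runs out. So pick a sensible size: too large wastes memory, too small costs time on extensions.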
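The idea of extending by chaining can be sketched without the library. The class below is a hypothetical, much simplified illustration (`ChainedBufferDemo` is not a Protostuff class): when the current chunk is full, a new chunk is linked behind it instead of copying the old bytes into a larger array.

```java
final class ChainedBufferDemo {
    private static final class Node {
        final byte[] bytes;
        int offset;   // number of bytes used in this chunk
        Node next;
        Node(int capacity) { bytes = new byte[capacity]; }
    }

    private final Node head;
    private Node tail;
    private final int nextBufferSize;
    private int size; // total bytes written across all chunks

    ChainedBufferDemo(int initialSize, int nextBufferSize) {
        this.head = new Node(initialSize);
        this.tail = head;
        this.nextBufferSize = nextBufferSize;
    }

    void write(int b) {
        if (tail.offset == tail.bytes.length) {
            // Current chunk is full: link a fresh chunk rather than copying.
            Node fresh = new Node(nextBufferSize);
            tail.next = fresh;
            tail = fresh;
        }
        tail.bytes[tail.offset++] = (byte) b;
        size++;
    }

    int nodeCount() {
        int count = 0;
        for (Node n = head; n != null; n = n.next)
            count++;
        return count;
    }

    // Copies the used part of every chunk into one contiguous array.
    byte[] toByteArray() {
        byte[] out = new byte[size];
        int pos = 0;
        for (Node n = head; n != null; n = n.next) {
            System.arraycopy(n.bytes, 0, out, pos, n.offset);
            pos += n.offset;
        }
        return out;
    }

    public static void main(String[] args) {
        ChainedBufferDemo buffer = new ChainedBufferDemo(4, 4);
        for (int i = 0; i < 10; i++)
            buffer.write(i);
        System.out.println(buffer.nodeCount());          // 3 chunks: 4 + 4 + 2 bytes
        System.out.println(buffer.toByteArray().length); // 10
    }
}
```

Each extension costs an allocation and the final toByteArray has to stitch the chunks together, which is the time cost of an undersized initial buffer.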
3. byte[] data = ProtobufIOUtil.toByteArray(person1, schema, buffer);
ProtobufIOUtil serializes the business object using the Protobuf encoding:
    public static <T> byte[] toByteArray(T message, Schema<T> schema, LinkedBuffer buffer) {
        if (buffer.start != buffer.offset)
            throw new IllegalArgumentException("Buffer previously used and had not been reset.");
        final ProtobufOutput output = new ProtobufOutput(buffer);
        try {
            schema.writeTo(output, message);
        } catch (IOException e) {
        }
        return output.toByteArray();
    }
toByteArray calls the schema's writeTo method, which writes the contents of message into the ProtobufOutput:
    public final void writeTo(Output output, T message) throws IOException {
        for (Field<T> f : getFields())
            f.writeTo(output, message);
    }
In step 1 the field information of the business object was wrapped into Field objects. Here are the methods the Field class provides:
    /**
     * Writes the value of a field to the {@code output}.
     */
    protected abstract void writeTo(Output output, T message) throws IOException;

    /**
     * Reads the field value into the {@code message}.
     */
    protected abstract void mergeFrom(Input input, T message) throws IOException;

    /**
     * Transfer the input field to the output field.
     */
    protected abstract void transfer(Pipe pipe, Input input, Output output,
            boolean repeated) throws IOException;
It declares three abstract methods, for writing data, reading data, and transferring data.
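To make the per-field design concrete, here is a small sketch of the same idea using ordinary reflection. Everything in it is hypothetical (`FieldAccessorDemo`, `IntField`, the `List<int[]>` used as a fake output); Protostuff's real Field implementations write to an Output and, as shown next, use Unsafe offsets rather than `java.lang.reflect.Field`.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class FieldAccessorDemo {
    static class Person {
        int id;
        int age;
    }

    // One accessor per field: it knows its number and how to read or write the value.
    static final class IntField {
        final int number;
        final Field field;

        IntField(int number, Field field) {
            this.number = number;
            this.field = field;
            field.setAccessible(true);
        }

        // "Serialize": append a (fieldNumber, value) pair to the fake output.
        void writeTo(List<int[]> output, Object message) {
            try {
                output.add(new int[] { number, field.getInt(message) });
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }

        // "Deserialize": store a decoded value back into the message.
        void mergeFrom(int value, Object message) {
            try {
                field.setInt(message, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Builds the accessor list once; field numbers follow the order of the names given.
    static List<IntField> schemaOf(Class<?> type, String... fieldNames) {
        List<IntField> fields = new ArrayList<>();
        int number = 1;
        try {
            for (String name : fieldNames)
                fields.add(new IntField(number++, type.getDeclaredField(name)));
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
        return fields;
    }

    // Mirrors the loop in the schema's writeTo: every field writes itself.
    static List<int[]> writeTo(List<IntField> schema, Object message) {
        List<int[]> output = new ArrayList<>();
        for (IntField f : schema)
            f.writeTo(output, message);
        return output;
    }

    public static void main(String[] args) {
        Person p = new Person();
        p.id = 7;
        p.age = 30;
        for (int[] pair : writeTo(schemaOf(Person.class, "id", "age"), p))
            System.out.println(pair[0] + " -> " + pair[1]);
    }
}
```

The serializer itself stays a plain loop; all type-specific knowledge lives in the accessors, which are created once and reused for every message.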
Taking the int type as an example, here is the implementation:
    public static final RuntimeFieldFactory<Integer> INT32 = new RuntimeFieldFactory<Integer>(
            ID_INT32) {
        @Override
        public <T> Field<T> create(int number, java.lang.String name,
                final java.lang.reflect.Field f, IdStrategy strategy) {
            final boolean primitive = f.getType().isPrimitive();
            final long offset = us.objectFieldOffset(f);
            return new Field<T>(FieldType.INT32, number, name,
                    f.getAnnotation(Tag.class)) {
                @Override
                public void mergeFrom(Input input, T message) throws IOException {
                    if (primitive)
                        us.putInt(message, offset, input.readInt32());
                    else
                        us.putObject(message, offset, Integer.valueOf(input.readInt32()));
                }

                @Override
                public void writeTo(Output output, T message) throws IOException {
                    if (primitive)
                        output.writeInt32(number, us.getInt(message, offset), false);
                    else {
                        Integer value = (Integer) us.getObject(message, offset);
                        if (value != null)
                            output.writeInt32(number, value.intValue(), false);
                    }
                }
                ...
            };
        }
This code can be found in RuntimeUnsafeFieldFactory, which contains all the basic data types; collections and maps are handled in RuntimeRepeatedFieldFactory and RuntimeMapFieldFactory respectively. The writeTo method calls writeInt32 in ProtobufOutput:
    public void writeInt32(int fieldNumber, int value, boolean repeated) throws IOException {
        ...
        tail = writeTagAndRawVarInt32(
                makeTag(fieldNumber, WIRETYPE_VARINT), value, this, tail);
        ...
    }
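This writes the field's tag followed by its value, which is the same layout Protobuf uses.
[Figure: tag and value layout of an encoded field]
The write itself is done by writeTagAndRawVarInt32: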
    public static LinkedBuffer writeTagAndRawVarInt32(int tag, int value,
            final WriteSession session, LinkedBuffer lb) {
        final int tagSize = computeRawVarint32Size(tag);
        final int size = computeRawVarint32Size(value);
        final int totalSize = tagSize + size;

        if (lb.offset + totalSize > lb.buffer.length)
            lb = new LinkedBuffer(session.nextBufferSize, lb);

        final byte[] buffer = lb.buffer;
        int offset = lb.offset;
        lb.offset += totalSize;
        session.size += totalSize;

        if (tagSize == 1)
            buffer[offset++] = (byte) tag;
        else {
            for (int i = 0, last = tagSize - 1; i < last; i++, tag >>>= 7)
                buffer[offset++] = (byte) ((tag & 0x7F) | 0x80);
            buffer[offset++] = (byte) tag;
        }

        if (size == 1)
            buffer[offset] = (byte) value;
        else {
            for (int i = 0, last = size - 1; i < last; i++, value >>>= 7)
                buffer[offset++] = (byte) ((value & 0x7F) | 0x80);
            buffer[offset] = (byte) value;
        }
        return lb;
    }
The tag is created by the makeTag method:
    public static int makeTag(final int fieldNumber, final int wireType) {
        return (fieldNumber << TAG_TYPE_BITS) | wireType;
    }
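fieldNumber is the number of each field, and wireType identifies how the field's value is encoded on the wire. The runtime schema assigns field numbers from the order of the fields in the class unless they are pinned with the @Tag annotation, so reordering the fields of the business class, or changing a field's type, can make previously serialized data fail to deserialize or end up in the wrong field.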
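A short sketch makes the tag arithmetic visible. `TagDemo` is a hypothetical class; the constant values (3 type bits, wire type 0 for varints, 2 for length-delimited data) come from the Protobuf wire format.

```java
final class TagDemo {
    static final int TAG_TYPE_BITS = 3;
    static final int TAG_TYPE_MASK = (1 << TAG_TYPE_BITS) - 1;
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Field number in the upper bits, wire type in the lowest three bits.
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << TAG_TYPE_BITS) | wireType;
    }

    static int getTagFieldNumber(int tag) {
        return tag >>> TAG_TYPE_BITS;
    }

    static int getTagWireType(int tag) {
        return tag & TAG_TYPE_MASK;
    }

    public static void main(String[] args) {
        System.out.println(makeTag(1, WIRETYPE_VARINT));           // 8:  int id = 1
        System.out.println(makeTag(2, WIRETYPE_LENGTH_DELIMITED)); // 18: String name = 2
        System.out.println(makeTag(3, WIRETYPE_LENGTH_DELIMITED)); // 26: String email = 3
    }
}
```

If `id` and `name` swapped places in the class, a reader would expect field 1 to be a length-delimited string but old data would carry tag 8 (field 1, varint), which is exactly the mismatch described above.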
The data compression mentioned earlier shows up in the computeRawVarint32Size method:
    public static int computeRawVarint32Size(final int value) {
        if ((value & (0xffffffff << 7)) == 0)
            return 1;
        if ((value & (0xffffffff << 14)) == 0)
            return 2;
        if ((value & (0xffffffff << 21)) == 0)
            return 3;
        if ((value & (0xffffffff << 28)) == 0)
            return 4;
        return 5;
    }
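It returns a different byte count depending on the range of value. The code in writeTagAndRawVarInt32 then checks whether the LinkedBuffer has enough room and extends it if not, and finally stores the tag and the value in the buffer in this compact varint form.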
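The varint scheme is small enough to run on its own. The sketch below reuses the size computation quoted above and adds an encoder and a decoder; `VarintDemo`, `encodeVarint32` and `decodeVarint32` are hypothetical names written for illustration, not library methods.

```java
final class VarintDemo {
    // Same logic as the method quoted above: 7 payload bits per byte.
    static int computeRawVarint32Size(int value) {
        if ((value & (0xffffffff << 7)) == 0) return 1;
        if ((value & (0xffffffff << 14)) == 0) return 2;
        if ((value & (0xffffffff << 21)) == 0) return 3;
        if ((value & (0xffffffff << 28)) == 0) return 4;
        return 5;
    }

    // Emits the low 7 bits first; the high bit of a byte means "more bytes follow".
    static byte[] encodeVarint32(int value) {
        int size = computeRawVarint32Size(value);
        byte[] out = new byte[size];
        for (int i = 0; i < size - 1; i++, value >>>= 7)
            out[i] = (byte) ((value & 0x7F) | 0x80);
        out[size - 1] = (byte) value;
        return out;
    }

    // Reassembles the 7-bit groups until a byte without the continuation bit.
    static int decodeVarint32(byte[] in) {
        int result = 0;
        for (int i = 0, shift = 0; i < in.length; i++, shift += 7) {
            result |= (in[i] & 0x7F) << shift;
            if ((in[i] & 0x80) == 0)
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(computeRawVarint32Size(1));   // 1
        System.out.println(computeRawVarint32Size(300)); // 2
        byte[] encoded = encodeVarint32(300);            // bytes 0xAC 0x02
        System.out.println(decodeVarint32(encoded));     // 300
    }
}
```

Small values such as the id 1 in the test object therefore cost a single byte instead of the four a fixed-width int would take.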
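Summary
This gives a rough picture of how Protostuff serializes a business object. Both the simple test and the source code show that Protostuff's serialization through ProtobufIOUtil follows Protobuf's encoding throughout.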