An Analysis of Protostuff Serialization

Date: 2019-12-04

Tags: protostuff, serialization, analysis

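Preface
Recently a project of mine needed to serialize business objects directly and store them in a database. Considering serialization and deserialization time and the size of the generated output, Protobuf looked like a good choice. The trouble with Protobuf is that it requires a .proto description file, and the objects Protobuf generates are not particularly friendly to use as business objects, so there is usually a conversion step between business objects and Protobuf objects. Since all we wanted was to serialize business objects straight into the database, Protobuf turned out not to be a great fit for this case.
Then I came across Protostuff. Protostuff does not depend on a .proto file and can serialize and deserialize ordinary JavaBeans directly. It is claimed to be even faster than Protobuf, and the binary format it produces through ProtobufIOUtil is the same as Protobuf's, so it can be described as a serialization tool built on Protobuf.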

A simple test
1. Testing Protostuff first
Start with a simple JavaBean:

public class Person {

    private int id;
    private String name;
    private String email;

    // getters and setters omitted
}

The test class PbStuff:

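// Package names assume protostuff 1.3 or later (io.protostuff);
// older releases use com.dyuproject.protostuff instead.
import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

public class PbStuff {

    public static void main(String[] args) {
        Schema<Person> schema = RuntimeSchema.getSchema(Person.class);
        Person person1 = new Person();
        person1.setId(1);
        person1.setName("zhaohui");
        // Same email as in the Protobuf test, so both tests serialize identical data;
        // without it the output would be shorter than 29 bytes
        person1.setEmail("xxxxxxxx@126.com");
        LinkedBuffer buffer = LinkedBuffer.allocate(1024);
        byte[] data = ProtobufIOUtil.toByteArray(person1, schema, buffer);
        System.out.println(data.length);
    }
}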

The serialized output is 29 bytes.

2. Testing Protobuf
The .proto file:

option java_package = "protobuf.clazz"; 
option java_outer_classname = "PersonX";

message Person {
  required int32 id = 1;
  required string name = 2;
  required string email = 3;
}

The PBTest class:

public class PBTest {

    public static void main(String[] args) {
        PersonX.Person.Builder builder = PersonX.Person.newBuilder();
        builder.setId(1);
        builder.setName("zhaohui");
        builder.setEmail("xxxxxxxx@126.com");

        PersonX.Person p = builder.build();
        byte[] result = p.toByteArray();
        System.out.println(result.length);

    }
}

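The serialized output here is also 29 bytes.

This simple test shows that Protobuf and Protostuff produce output of the same size for the same data.
Protobuf's encoding stores each field's metadata and value as compactly as it can, and the metadata (the tag) carries what is needed to identify the field: its number and its wire type. Since Protostuff's output is the same size as Protobuf's, it is worth looking at the Protostuff source.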
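To see where the 29 bytes come from, the encoding can be reproduced by hand. The sketch below is neither Protostuff nor Protobuf code: the class PersonWireSize and its helpers are made-up names, it only handles non-negative int values, and it assumes the standard Protobuf layout of a varint tag followed by the value (for strings, a varint length and then the UTF-8 bytes).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-rolled sketch of the Protobuf wire format for the Person message.
// All names are illustrative; this is not library code.
class PersonWireSize {

    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Writes a varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Tag (field number + wire type) followed by the value; non-negative values only.
    static void writeInt32(ByteArrayOutputStream out, int fieldNumber, int value) {
        writeVarint(out, (fieldNumber << 3) | WIRETYPE_VARINT);
        writeVarint(out, value);
    }

    // Tag, then the byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static byte[] encodePerson(int id, String name, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeInt32(out, 1, id);      // 1 tag byte + 1 value byte
        writeString(out, 2, name);   // 1 tag byte + 1 length byte + 7 bytes
        writeString(out, 3, email);  // 1 tag byte + 1 length byte + 16 bytes
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = encodePerson(1, "zhaohui", "xxxxxxxx@126.com");
        System.out.println(data.length); // 2 + 9 + 18 = 29
    }
}
```

The total only reaches 29 when all three fields are set, which is why both tests need to serialize the same id, name and email.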

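Source code analysis
1. Schema<Person> schema = RuntimeSchema.getSchema(Person.class); // obtain the Schema for the business object Person
RuntimeSchema is a class that holds all the information about a business object, including class information and field information.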

/**
     * Gets the schema that was either registered or lazily initialized at runtime.
     * <p>
     * Method overload for backwards compatibility.
     */
    public static <T> Schema<T> getSchema(Class<T> typeClass)
    {
        return getSchema(typeClass, ID_STRATEGY);
    }

    /**
     * Gets the schema that was either registered or lazily initialized at runtime.
     */
    public static <T> Schema<T> getSchema(Class<T> typeClass,
            IdStrategy strategy)
    {
        return strategy.getSchemaWrapper(typeClass, true).getSchema();
    }

getSchema uses the default strategy ID_STRATEGY to obtain the Schema. ID_STRATEGY is instantiated in the RuntimeEnv class:

ID_STRATEGY = new DefaultIdStrategy();

Here is a rough look at the DefaultIdStrategy class:

public final class DefaultIdStrategy extends IdStrategy
{

    final ConcurrentHashMap<String, HasSchema<?>> pojoMapping = new ConcurrentHashMap<>();

    final ConcurrentHashMap<String, EnumIO<?>> enumMapping = new ConcurrentHashMap<>();

    final ConcurrentHashMap<String, CollectionSchema.MessageFactory> collectionMapping = new ConcurrentHashMap<>();

    final ConcurrentHashMap<String, MapSchema.MessageFactory> mapMapping = new ConcurrentHashMap<>();

    final ConcurrentHashMap<String, HasDelegate<?>> delegateMapping = new ConcurrentHashMap<>();
    ...
}

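DefaultIdStrategy evidently caches a lot of Schema information in memory. That makes sense: obtaining the class and field information of a business object inevitably relies on reflection, which is expensive, so caching is well worth it. The next time the same class comes up, no reflection is needed.

That is why DefaultIdStrategy contains many methods that follow this pattern: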

public <T> HasSchema<T> getSchemaWrapper(Class<T> typeClass, boolean create)
    {
        HasSchema<T> hs = (HasSchema<T>) pojoMapping.get(typeClass.getName());
        if (hs == null && create)
        {
            hs = new Lazy<>(typeClass, this);
            final HasSchema<T> last = (HasSchema<T>) pojoMapping.putIfAbsent(
                    typeClass.getName(), hs);
            if (last != null)
                hs = last;
        }

        return hs;
    }

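Call get first; if the result is null, call putIfAbsent. If another thread has registered a value in the meantime, putIfAbsent returns that existing value and it is used instead of the newly created one.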
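The get-then-putIfAbsent pattern can be tried in isolation with plain JDK classes. In this sketch SchemaCache and FakeSchema are hypothetical stand-ins for DefaultIdStrategy and HasSchema; only ConcurrentHashMap is real.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative cache keyed by class name; SchemaCache and FakeSchema are made-up names.
class SchemaCache {

    static final class FakeSchema {
        final String typeName;
        FakeSchema(String typeName) { this.typeName = typeName; }
    }

    private final ConcurrentHashMap<String, FakeSchema> mapping = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger(); // counts how many schemas were built

    FakeSchema get(Class<?> typeClass) {
        FakeSchema schema = mapping.get(typeClass.getName());
        if (schema == null) {
            // Build a candidate, then publish it only if nobody else got there first.
            FakeSchema candidate = new FakeSchema(typeClass.getName());
            created.incrementAndGet();
            FakeSchema last = mapping.putIfAbsent(typeClass.getName(), candidate);
            schema = (last != null) ? last : candidate;
        }
        return schema;
    }

    public static void main(String[] args) {
        SchemaCache cache = new SchemaCache();
        FakeSchema first = cache.get(String.class);
        FakeSchema second = cache.get(String.class);
        System.out.println(first == second);     // true: the second call hits the cache
        System.out.println(cache.created.get()); // 1: only one schema was built
    }
}
```

Under contention two threads may both build a candidate, but only one is ever published, so callers always share the same instance. That is presumably why the library stores a cheap Lazy wrapper here and defers the expensive reflection.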

When the Schema of a business object has not been cached yet, it is created. RuntimeSchema provides the createFrom method for this:

public static <T> RuntimeSchema<T> createFrom(Class<T> typeClass,
            Set<String> exclusions, IdStrategy strategy)
    {
        final Map<String, java.lang.reflect.Field> fieldMap = findInstanceFields(typeClass);
        // ... omitted
        final Field<T> field = RuntimeFieldFactory.getFieldFactory(
                        f.getType(), strategy).create(fieldMapping, name, f,
                        strategy);
                fields.add(field);
            }
        }

        return new RuntimeSchema<>(typeClass, fields, RuntimeEnv.newInstantiator(typeClass));
     }

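This mainly reflects over typeClass and wraps the result. A RuntimeFieldFactory is looked up for each field's type, and its create method then wraps the field in a Field object. RuntimeFieldFactory enumerates all the supported types: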

static final RuntimeFieldFactory<BigDecimal> BIGDECIMAL;
    static final RuntimeFieldFactory<BigInteger> BIGINTEGER;
    static final RuntimeFieldFactory<Boolean> BOOL;
    static final RuntimeFieldFactory<Byte> BYTE;
    static final RuntimeFieldFactory<ByteString> BYTES;
    static final RuntimeFieldFactory<byte[]> BYTE_ARRAY;
    static final RuntimeFieldFactory<Character> CHAR;
    static final RuntimeFieldFactory<Date> DATE;
    static final RuntimeFieldFactory<Double> DOUBLE;
    static final RuntimeFieldFactory<Float> FLOAT;
    static final RuntimeFieldFactory<Integer> INT32;
    static final RuntimeFieldFactory<Long> INT64;
    static final RuntimeFieldFactory<Short> SHORT;
    static final RuntimeFieldFactory<String> STRING;

    static final RuntimeFieldFactory<Integer> ENUM;
    static final RuntimeFieldFactory<Object> OBJECT;
    static final RuntimeFieldFactory<Object> POJO;
    static final RuntimeFieldFactory<Object> POLYMORPHIC_POJO;

    static final RuntimeFieldFactory<Collection<?>> COLLECTION = 
            new RuntimeFieldFactory<Collection<?>>(ID_COLLECTION)

The commonly used Map type is supported as well; it is defined in RuntimeMapFieldFactory.
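The reflection step inside createFrom can be approximated with the JDK alone. The sketch below rests on several assumptions: static and transient fields are skipped, superclass fields are ignored, and field numbers are handed out sequentially from 1 in the order getDeclaredFields returns them. FieldNumbering and its nested Person are made-up names, and the JDK does not formally guarantee the order of getDeclaredFields, even though common JVMs return declaration order.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: discovers instance fields and numbers them sequentially.
class FieldNumbering {

    static class Person {
        private int id;
        private String name;
        private String email;
        static int counter;       // static: skipped
        transient String scratch; // transient: skipped
    }

    static Map<String, Integer> numberFields(Class<?> type) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        int next = 1;
        for (Field f : type.getDeclaredFields()) {
            int mod = f.getModifiers();
            if (Modifier.isStatic(mod) || Modifier.isTransient(mod) || f.isSynthetic())
                continue; // not part of the serialized state in this sketch
            numbers.put(f.getName(), next++);
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Prints each serializable field name with the number assigned to it.
        System.out.println(numberFields(Person.class));
    }
}
```

Because the numbers come from position rather than from the field names, inserting or reordering fields shifts the numbering, which matters once serialized data has been stored.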

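2. LinkedBuffer buffer = LinkedBuffer.allocate(1024);
This allocates a 1024-byte buffer that holds the serialized form of the business object. You may wonder what happens if that is not enough. As the code further down shows, the buffer is extended automatically when space runs out. So pick a sensible size: too large wastes memory, and too small wastes time on automatic extension.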
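The automatic extension works by chaining a new buffer onto the current one rather than by copying into a bigger array. Here is a minimal sketch of that idea, assuming fixed-size chunks; ChainedBuffer and Chunk are hypothetical names and the real LinkedBuffer differs in its details.

```java
// Illustrative chained buffer: when the tail chunk is full, a new chunk is linked on.
class ChainedBuffer {

    static final class Chunk {
        final byte[] bytes;
        int used;
        Chunk next;
        Chunk(int size) { bytes = new byte[size]; }
    }

    private final Chunk head;
    private Chunk tail;
    private final int nextChunkSize;
    int chunkCount = 1;
    int size;

    ChainedBuffer(int initialSize, int nextChunkSize) {
        if (initialSize <= 0 || nextChunkSize <= 0)
            throw new IllegalArgumentException("chunk sizes must be positive");
        this.head = new Chunk(initialSize);
        this.tail = head;
        this.nextChunkSize = nextChunkSize;
    }

    void write(byte b) {
        if (tail.used == tail.bytes.length) {
            // Out of room: link a fresh chunk instead of copying the old data.
            Chunk fresh = new Chunk(nextChunkSize);
            tail.next = fresh;
            tail = fresh;
            chunkCount++;
        }
        tail.bytes[tail.used++] = b;
        size++;
    }

    // Copies all chunks into one contiguous array, once, at the end.
    byte[] toByteArray() {
        byte[] out = new byte[size];
        int pos = 0;
        for (Chunk c = head; c != null; c = c.next) {
            System.arraycopy(c.bytes, 0, out, pos, c.used);
            pos += c.used;
        }
        return out;
    }

    public static void main(String[] args) {
        ChainedBuffer buf = new ChainedBuffer(4, 4);
        for (int i = 0; i < 10; i++)
            buf.write((byte) i);
        System.out.println(buf.chunkCount);           // 3 chunks: 4 + 4 + 2 bytes
        System.out.println(buf.toByteArray().length); // 10
    }
}
```

Each extra chunk costs an allocation and one more segment to copy at the end, which is the time penalty of starting with a buffer that is too small.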

3. byte[] data = ProtobufIOUtil.toByteArray(person1, schema, buffer);
ProtobufIOUtil serializes business objects using the Protobuf encoding:

public static <T> byte[] toByteArray(T message, Schema<T> schema, LinkedBuffer buffer)
    {
        if (buffer.start != buffer.offset)
            throw new IllegalArgumentException("Buffer previously used and had not been reset.");

        final ProtobufOutput output = new ProtobufOutput(buffer);
        try
        {
            schema.writeTo(output, message);
        }
        catch (IOException e)
        {
            // handler body not shown in this excerpt
        }

        return output.toByteArray();
    }

The schema's writeTo method is called to write the contents of message into the ProtobufOutput:

public final void writeTo(Output output, T message) throws IOException
    {
        for (Field<T> f : getFields())
            f.writeTo(output, message);
    }

Step 1 wrapped the business object's field information in Field objects. Here are a few methods of the Field class:

/**
     * Writes the value of a field to the {@code output}.
     */
    protected abstract void writeTo(Output output, T message)
            throws IOException;

    /**
     * Reads the field value into the {@code message}.
     */
    protected abstract void mergeFrom(Input input, T message)
            throws IOException;

    /**
     * Transfer the input field to the output field.
     */
    protected abstract void transfer(Pipe pipe, Input input, Output output,
            boolean repeated) throws IOException;

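It declares three abstract methods: one writes data, one reads data, and one transfers data.
Taking the int type as an example, here is an implementation: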

public static final RuntimeFieldFactory<Integer> INT32 = new RuntimeFieldFactory<Integer>(
            ID_INT32)
    {
        @Override
        public <T> Field<T> create(int number, java.lang.String name,
                final java.lang.reflect.Field f, IdStrategy strategy)
        {
            final boolean primitive = f.getType().isPrimitive();
            final long offset = us.objectFieldOffset(f);
            return new Field<T>(FieldType.INT32, number, name,
                    f.getAnnotation(Tag.class))
            {
                @Override
                public void mergeFrom(Input input, T message)
                        throws IOException
                {
                    if (primitive)
                        us.putInt(message, offset, input.readInt32());
                    else
                        us.putObject(message, offset,
                                Integer.valueOf(input.readInt32()));
                }

                @Override
                public void writeTo(Output output, T message)
                        throws IOException
                {
                    if (primitive)
                        output.writeInt32(number, us.getInt(message, offset),
                                false);
                    else
                    {
                        Integer value = (Integer) us.getObject(message, offset);
                        if (value != null)
                            output.writeInt32(number, value.intValue(), false);
                    }
                }
                ...
            };
        }

This code can be found in RuntimeUnsafeFieldFactory, which covers the basic data types; collections and maps are handled in RuntimeRepeatedFieldFactory and RuntimeMapFieldFactory respectively. The writeTo method calls writeInt32 in ProtobufOutput:

public void writeInt32(int fieldNumber, int value, boolean repeated) throws IOException
    {
         ...
         tail = writeTagAndRawVarInt32(
                  makeTag(fieldNumber, WIRETYPE_VARINT),
                  value,
                  this,
                  tail);
          ...
    }

This writes the field's tag followed by its value, which is the same layout Protobuf uses, as the following method shows:

public static LinkedBuffer writeTagAndRawVarInt32(int tag, int value,
            final WriteSession session, LinkedBuffer lb)
    {
        final int tagSize = computeRawVarint32Size(tag);
        final int size = computeRawVarint32Size(value);
        final int totalSize = tagSize + size;

        if (lb.offset + totalSize > lb.buffer.length)
            lb = new LinkedBuffer(session.nextBufferSize, lb);

        final byte[] buffer = lb.buffer;
        int offset = lb.offset;
        lb.offset += totalSize;
        session.size += totalSize;

        if (tagSize == 1)
            buffer[offset++] = (byte) tag;
        else
        {
            for (int i = 0, last = tagSize - 1; i < last; i++, tag >>>= 7)
                buffer[offset++] = (byte) ((tag & 0x7F) | 0x80);

            buffer[offset++] = (byte) tag;
        }

        if (size == 1)
            buffer[offset] = (byte) value;
        else
        {
            for (int i = 0, last = size - 1; i < last; i++, value >>>= 7)
                buffer[offset++] = (byte) ((value & 0x7F) | 0x80);

            buffer[offset] = (byte) value;
        }

        return lb;
    }

The tag is created by the makeTag method:

public static int makeTag(final int fieldNumber, final int wireType)
    {
        return (fieldNumber << TAG_TYPE_BITS) | wireType;
    }

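fieldNumber is the number of each field, and wireType identifies how the field's value is encoded on the wire (for example as a varint or as length-delimited bytes). So if we change the order of the fields in the business class (the runtime schema appears to number fields in declaration order unless they are tagged explicitly) or change a field's type, deserializing previously stored data can fail or yield wrong values.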
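A few concrete tags make this clearer. The sketch below copies makeTag from the excerpt and adds two inverse helpers for illustration. TagDemo is a made-up class name, and the constants assume the standard Protobuf values: 3 type bits, wire type 0 for varint and 2 for length-delimited.

```java
// Illustrative tag arithmetic for the three Person fields.
class TagDemo {

    static final int TAG_TYPE_BITS = 3;
    static final int TAG_TYPE_MASK = (1 << TAG_TYPE_BITS) - 1;
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Field number in the upper bits, wire type in the lowest three bits.
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << TAG_TYPE_BITS) | wireType;
    }

    static int getTagFieldNumber(int tag) {
        return tag >>> TAG_TYPE_BITS;
    }

    static int getTagWireType(int tag) {
        return tag & TAG_TYPE_MASK;
    }

    public static void main(String[] args) {
        System.out.println(makeTag(1, WIRETYPE_VARINT));           // id:    8
        System.out.println(makeTag(2, WIRETYPE_LENGTH_DELIMITED)); // name:  18
        System.out.println(makeTag(3, WIRETYPE_LENGTH_DELIMITED)); // email: 26
        System.out.println(getTagFieldNumber(26) + " " + getTagWireType(26)); // 3 2
    }
}
```

A reader that sees tag 18 expects a length-delimited value for field 2. If the class later declares an int in that position, the stored bytes no longer match what the schema expects.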
The data compression mentioned earlier shows up in the computeRawVarint32Size method:

public static int computeRawVarint32Size(final int value)
    {
        if ((value & (0xffffffff << 7)) == 0)
            return 1;
        if ((value & (0xffffffff << 14)) == 0)
            return 2;
        if ((value & (0xffffffff << 21)) == 0)
            return 3;
        if ((value & (0xffffffff << 28)) == 0)
            return 4;
        return 5;
    }

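Depending on the range of value, a different byte count is returned. The rest of writeTagAndRawVarInt32, shown earlier, checks whether the LinkedBuffer has enough room, chains a new buffer if it does not, and then stores the tag and the value in the buffer using this compact encoding.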
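To check that the size calculation and the byte-writing loop agree, here is a small round trip. computeRawVarint32Size is copied from the excerpt above. writeRawVarint32 and readRawVarint32 are simplified helpers written for this sketch, not the library's methods, and VarintRoundTrip is a made-up class name.

```java
// Illustrative varint round trip: size, encode, decode.
class VarintRoundTrip {

    static int computeRawVarint32Size(final int value) {
        if ((value & (0xffffffff << 7)) == 0) return 1;
        if ((value & (0xffffffff << 14)) == 0) return 2;
        if ((value & (0xffffffff << 21)) == 0) return 3;
        if ((value & (0xffffffff << 28)) == 0) return 4;
        return 5;
    }

    // Writes value at offset, 7 bits per byte, and returns the offset after the last byte.
    static int writeRawVarint32(int value, byte[] buffer, int offset) {
        while ((value & ~0x7F) != 0) {
            buffer[offset++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buffer[offset++] = (byte) value;
        return offset;
    }

    // Reads a varint of at most 5 bytes starting at offset.
    static int readRawVarint32(byte[] buffer, int offset) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            byte b = buffer[offset++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[5];
        int end = writeRawVarint32(300, buffer, 0);
        System.out.println(end);                                 // 2 bytes: 0xAC 0x02
        System.out.println(computeRawVarint32Size(300));         // 2
        System.out.println(readRawVarint32(buffer, 0));          // 300
        System.out.println(computeRawVarint32Size(-1));          // 5: negative ints use all 5 bytes
    }
}
```

Small non-negative values such as ids and string lengths fit in a single byte, which is where most of the space saving comes from. Negative values, by contrast, always take the maximum width in this scheme.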

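Summary
This gives a rough picture of how Protostuff serializes business objects. Both the simple test and a look at the source show that Protostuff's serialization follows Protobuf's encoding closely.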
