Notes on Dynamic Parsing with Protobuf

Date: 2019-11-12


Background and Requirements

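After receiving protobuf data, how can we automatically create the concrete Protobuf Message object and then deserialize into it? "Automatically" here means two things: (1) when a new protobuf Message type is added to the program, this code needs no changes, no manual registration of the message type, and no process restart; only the .proto file has to be provided; (2) when a protobuf Message is modified, this code likewise needs no changes, no manual registration, and no restart; only the updated .proto file has to be provided.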

Technical Overview

For an introduction to Protobuf, see the online documentation for Google Protocol Buffers or the IBM developerWorks article 《Google Protocol Buffer 的使用和原理》.

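The official Google Protocol Buffers site says little about dynamic parsing. Reference material found through Google shows, however, that Protobuf itself has strong reflection support: it can create a Message object of a concrete type from its type name. Using that directly should be enough to satisfy the requirements above.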

For the implementation, see the Taobao article 《玩轉Protocol Buffers》, which explains the principles behind dynamic parsing in detail. Here I will go over the Protobuf class diagram.

Most people care about and use the left half of the diagram: MessageLite, Message, the generated message types (Person, AddressBook), and so on. The right half gets far less attention: Descriptor, DescriptorPool, MessageFactory.

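In the diagram, the key class is Descriptor: every concrete Message type has a corresponding Descriptor object. Although we normally do not call its functions directly, Descriptor plays an important role as the bridge in "creating a Message object of a concrete type from its type name". The red arrows in the diagram trace that process.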

Implementation

Here is the code first; it comes from 《玩轉Protocol Buffers》:

#include <iostream>

#include <google/protobuf/descriptor.h>
#include <google/protobuf/descriptor.pb.h>
#include <google/protobuf/dynamic_message.h>
#include <google/protobuf/compiler/importer.h>

using namespace google::protobuf;
using namespace google::protobuf::compiler;

int main(int argc, const char *argv[])
{
    DiskSourceTree sourceTree;
    // look up .proto file in current directory
    sourceTree.MapPath("", "./");
    Importer importer(&sourceTree, NULL);
    // runtime compile foo.proto; Import() returns NULL on failure
    if (importer.Import("foo.proto") == NULL) {
        std::cerr << "failed to import foo.proto" << std::endl;
        return 1;
    }
    const Descriptor *descriptor = importer.pool()->FindMessageTypeByName("Pair");
    if (descriptor == NULL) {
        std::cerr << "message type Pair not found" << std::endl;
        return 1;
    }
    std::cout << descriptor->DebugString();
    // build a dynamic message by "Pair" proto
    DynamicMessageFactory factory;
    const Message *message = factory.GetPrototype(descriptor);
    // create a real instance of "Pair"
    Message *pair = message->New();
    // write the "Pair" instance by reflection
    const Reflection *reflection = pair->GetReflection();
    const FieldDescriptor *field = NULL;
    field = descriptor->FindFieldByName("key");
    reflection->SetString(pair, field, "my key");
    field = descriptor->FindFieldByName("value");
    reflection->SetUInt32(pair, field, 1111);
    std::cout << pair->DebugString();
    // delete the instance before the factory that created it goes out of scope
    delete pair;
    return 0;
}

Let's walk through the code above.

1) Map a local directory to a virtual path

DiskSourceTree sourceTree;
// look up .proto file in current directory
sourceTree.MapPath("", "./");

2) Build the DescriptorPool

Importer importer(&sourceTree, NULL);
// runtime compile foo.proto
importer.Import("foo.proto");

3) Get the Descriptor

const Descriptor *descriptor = importer.pool()->FindMessageTypeByName("Pair");

4) Get the prototype Message from the Descriptor

DynamicMessageFactory factory;
const Message *message = factory.GetPrototype(descriptor);

5) Use the DynamicMessage prototype to create an empty object of that type

Message *pair = message->New();

6) Manipulate the message's fields through the Message's reflection interface

const Reflection *reflection = pair->GetReflection();
const FieldDescriptor *field = NULL;
field = descriptor->FindFieldByName("key");
reflection->SetString(pair, field, "my key");
field = descriptor->FindFieldByName("value");
reflection->SetUInt32(pair, field, 1111);

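Copying the code above looks like it satisfies our requirements. The one drawback is that the .proto file is loaded again for every incoming packet. I initially expected performance to be roughly on par with reading from disk, but testing showed it to be extremely poor: a single process could handle only a little over 1,000 packets per second. Analysis showed that the bottleneck was not the disk but the frequent calls to malloc and free.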

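So the implementation had to be rethought. The initial idea: reload only when the protobuf description file is updated, and otherwise parse incoming packets with the definitions already loaded. Testing confirmed that this performs well, at roughly 30,000 packets per second, but the implementation ran into a difficulty. Updating the existing Message means replacing the Importer and the factory, and replacing those means releasing resources. It turns out that the order in which these resources are released matters a great deal; the release strategy for the protobuf objects involved is described below.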
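The reload-on-change idea can be sketched independently of protobuf. The helper below is hypothetical: `ReloadOnChange`, `Schema`, and `Stamp` are names invented for illustration. `Schema` stands for whatever object bundles the Importer and DynamicMessageFactory, and `Stamp` for any comparable version marker, such as the .proto file's modification time. It is a minimal sketch under those assumptions, not the code that was benchmarked above.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical helper (not part of protobuf). It caches one Schema and
// rebuilds it only when the caller-supplied stamp changes.
template <typename Schema, typename Stamp>
class ReloadOnChange {
 public:
  using Loader = std::function<std::shared_ptr<Schema>()>;

  explicit ReloadOnChange(Loader loader) : loader_(std::move(loader)) {
    assert(loader_ && "loader must be callable");
  }

  // Returns the cached schema. The loader runs only when nothing is cached
  // yet or when `stamp` differs from the stamp of the last successful load.
  std::shared_ptr<Schema> Get(const Stamp& stamp) {
    if (!current_ || !stamp_ || !(*stamp_ == stamp)) {
      if (auto fresh = loader_()) {  // on failure, keep the previous schema
        current_ = std::move(fresh);
        stamp_ = stamp;
      }
    }
    return current_;
  }

 private:
  Loader loader_;
  std::shared_ptr<Schema> current_;
  std::optional<Stamp> stamp_;
};
```

A packet loop could call `Get` with the file's current modification time before parsing each packet. Returning a `shared_ptr` is one possible design choice: code that still holds messages built from the old schema can keep that schema alive until it is finished with them.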

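Dynamic Messages are constructed through a DynamicMessageFactory, so a Message must be destroyed while that same DynamicMessageFactory still exists. When the .proto file is updated dynamically, we destroy the old DynamicMessageFactory and switch to a new one, and before destroying a DynamicMessageFactory we must first delete every Message that was constructed through it.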

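Rationale: the DynamicMessageFactory holds information shared by its DynamicMessages, and that information is needed when a DynamicMessage is destructed. Lifetimes must therefore satisfy Descriptor > DynamicMessageFactory > DynamicMessage.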

The release order must be: release all DynamicMessages, then the DynamicMessageFactory, then the Importer.
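One way to get this order without writing explicit teardown code is to rely on the C++ rule that non-static members are destroyed in reverse order of declaration. The sketch below uses a stand-in type, `Tracked`, instead of the real protobuf classes so that the order can be observed; `Tracked`, `SchemaBundle`, and `TeardownOrder` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a protobuf object: appends its name to a log when destroyed.
struct Tracked {
  Tracked(std::string n, std::vector<std::string>* l)
      : name(std::move(n)), log(l) {
    assert(log != nullptr);
  }
  ~Tracked() { log->push_back(name); }
  Tracked(const Tracked&) = delete;
  Tracked& operator=(const Tracked&) = delete;

  std::string name;
  std::vector<std::string>* log;
};

// Hypothetical bundle. Members are declared importer -> factory -> messages,
// so they are destroyed messages -> factory -> importer.
struct SchemaBundle {
  explicit SchemaBundle(std::vector<std::string>* log)
      : importer("Importer", log),
        factory("DynamicMessageFactory", log),
        messages("DynamicMessage", log) {}

  Tracked importer;  // stands for the Importer and its pool; destroyed last
  Tracked factory;   // must outlive every message it created
  Tracked messages;  // stands for all live messages; destroyed first
};

// Builds and destroys one bundle and returns the observed teardown order.
std::vector<std::string> TeardownOrder() {
  std::vector<std::string> log;
  { SchemaBundle bundle(&log); }
  return log;
}
```

With the real classes, the same declaration order in an owning struct would give the release order described above, assuming every message is owned by that struct and none has escaped elsewhere.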