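This is the second article in a two-part series that briefly examines how Go implements protobuf encoding and decoding.
The previous article, "How Go implements protobuf encoding and decoding (1): principles", pointed out that converting between Go data and protobuf data is handled by the package github.com/golang/protobuf/proto. This article analyzes how the proto package implements that encoding and decoding.
Every encoding package supports a fixed set of types, which we will call the underlying types for now. In essence, encoding converts a language's data types into these underlying types, and decoding converts them back.
We look at encoding first, then decoding.
Convention: in all code snippets below, code from request.pb.go or main.go is marked with the file name on its first line; everything else is source code from the proto package.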
// main.go
package main

import (
	"fmt"

	"./types"
	"github.com/golang/protobuf/proto"
)

func main() {
	req := &types.Request{Data: "Hello Dabin"}

	// Marshal
	encoded, err := proto.Marshal(req)
	if err != nil {
		fmt.Printf("Encode to protobuf data error: %v", err)
	}
	...
}
Encoding calls the proto.Marshal function, which serializes Go data into protobuf data and returns the serialized result or an error.
The Go structs compiled from a .proto file all satisfy the Message interface. From Marshal we can see that a Go struct can be serialized in 3 ways:
1. If pb (a Message) satisfies the newMarshaler interface, XXX_Marshal() is called to serialize it.
2. If pb satisfies the Marshaler interface, Marshal() is called to serialize it. This way suits a type that defines its own serialization rules.
3. Otherwise the default way is used, and pb is serialized through an InternalMessageInfo. As shown later, way 1 actually ends up using way 3.

// Marshal takes a protocol buffer message
// and encodes it into the wire format, returning the data.
// This is the main entry point.
func Marshal(pb Message) ([]byte, error) {
	if m, ok := pb.(newMarshaler); ok {
		siz := m.XXX_Size()
		b := make([]byte, 0, siz)
		return m.XXX_Marshal(b, false)
	}
	if m, ok := pb.(Marshaler); ok {
		// If the message can marshal itself, let it do it, for compatibility.
		// NOTE: This is not efficient.
		return m.Marshal()
	}
	// in case somehow we didn't generate the wrapper
	if pb == nil {
		return nil, ErrNil
	}
	var info InternalMessageInfo
	siz := info.Size(pb)
	b := make([]byte, 0, siz)
	return info.Marshal(b, pb, false)
}
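The order of the type assertions in Marshal decides which way is taken. The following is a minimal standalone sketch of that dispatch, not the proto package's code: the interface names fastMarshaler and selfMarshaler and the three message types are hypothetical stand-ins for newMarshaler, Marshaler, and real messages.

```go
package main

import "fmt"

// fastMarshaler has the same shape as newMarshaler (the name is hypothetical).
type fastMarshaler interface {
	XXX_Size() int
	XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
}

// selfMarshaler has the same shape as Marshaler (the name is hypothetical).
type selfMarshaler interface {
	Marshal() ([]byte, error)
}

// generatedMsg imitates a protoc-generated message.
type generatedMsg struct{}

func (generatedMsg) XXX_Size() int                                { return 0 }
func (generatedMsg) XXX_Marshal(b []byte, _ bool) ([]byte, error) { return b, nil }

// customMsg imitates a type with its own serialization rules.
type customMsg struct{}

func (customMsg) Marshal() ([]byte, error) { return nil, nil }

// plainMsg implements neither interface.
type plainMsg struct{}

// pickPath reports which of the three ways would handle m.
// Case order matters: the first matching interface wins.
func pickPath(m interface{}) string {
	switch m.(type) {
	case fastMarshaler:
		return "way 1: XXX_Marshal"
	case selfMarshaler:
		return "way 2: Marshal"
	default:
		return "way 3: InternalMessageInfo"
	}
}

func main() {
	fmt.Println(pickPath(generatedMsg{}))
	fmt.Println(pickPath(customMsg{}))
	fmt.Println(pickPath(plainMsg{}))
}
```

A type that satisfied both interfaces would take way 1, because that case is tested first.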
The newMarshaler and Marshaler interfaces are defined as follows:

// newMarshaler is the interface representing objects that can marshal themselves.
//
// This exists to support protoc-gen-go generated messages.
// The proto package will stop type-asserting to this interface in the future.
//
// DO NOT DEPEND ON THIS.
type newMarshaler interface {
	XXX_Size() int
	XXX_Marshal(b []byte, deterministic bool) ([]byte, error)
}

// Marshaler is the interface representing objects that can marshal themselves.
type Marshaler interface {
	Marshal() ([]byte, error)
}
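Request implements the newMarshaler interface. Its XXX_Marshal is shown below; it simply calls xxx_messageInfo_Request.Marshal. xxx_messageInfo_Request is a global variable defined in request.pb.go whose type is InternalMessageInfo, and it is the wrapper mentioned in the comment inside Marshal.

// request.pb.go
func (m *Request) XXX_Marshal(b []byte, deterministic bool) ([]byte, error) {
	// The next two lines are not produced by protoc-gen-go; they were added
	// by hand to print a stack trace, and make the return unreachable.
	print("Called xxx marshal\n")
	panic("I want see stack trace")
	return xxx_messageInfo_Request.Marshal(b, m, deterministic)
}

var xxx_messageInfo_Request proto.InternalMessageInfo

In essence, XXX_Marshal is also just a wrapper. The functions that do the real serialization work come later and live in the proto package.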
InternalMessageInfo mainly caches the information needed for serialization and deserialization.

// InternalMessageInfo is a type used internally by generated .pb.go files.
// This type is not intended to be used by non-generated code.
// This type is not subject to any compatibility guarantee.
type InternalMessageInfo struct {
	marshal   *marshalInfo   // marshal info
	unmarshal *unmarshalInfo // unmarshal info
	merge     *mergeInfo
	discard   *discardInfo
}
InternalMessageInfo.Marshal first obtains the marshal info u (a *marshalInfo) for the type being serialized, then calls u.marshal to serialize it.

// Marshal is the entry point from generated code,
// and should be ONLY called by generated code.
// It marshals msg to the end of b.
// a is a pointer to a place to store cached marshal info.
func (a *InternalMessageInfo) Marshal(b []byte, msg Message, deterministic bool) ([]byte, error) {
	// Get the marshalInfo for this message type. It is cached,
	// so it is not rebuilt under heavy concurrency.
	u := getMessageMarshalInfo(msg, a)
	// Validate the argument.
	ptr := toPointer(&msg)
	if ptr.isNil() {
		// We get here if msg is a typed nil ((*SomeMessage)(nil)),
		// so it satisfies the interface, and msg == nil wouldn't
		// catch it. We don't want crash in this case.
		return b, ErrNil
	}
	// Marshal the data according to the marshalInfo.
	return u.marshal(b, ptr, deterministic)
}
Because the marshal info is the same for every value of a given type, getMessageMarshalInfo caches it in a.marshal. If a holds no marshal info yet, one is created (but not initialized) and then stored in a.

func getMessageMarshalInfo(msg interface{}, a *InternalMessageInfo) *marshalInfo {
	// u := a.marshal, but atomically.
	// We use an atomic here to ensure memory consistency.
	// Read from the InternalMessageInfo.
	u := atomicLoadMarshalInfo(&a.marshal)
	// nil means nothing has been stored yet.
	if u == nil {
		// Get marshal information from type of message.
		t := reflect.ValueOf(msg).Type()
		if t.Kind() != reflect.Ptr {
			panic(fmt.Sprintf("cannot handle non-pointer message type %v", t))
		}
		u = getMarshalInfo(t.Elem())
		// Store it in the cache for later users.
		// a.marshal = u, but atomically.
		atomicStoreMarshalInfo(&a.marshal, u)
	}
	return u
}
getMarshalInfo only creates a marshalInfo object and fills in its typ field; the remaining fields are left unset.

// getMarshalInfo returns the information to marshal a given type of message.
// The info it returns may not necessarily initialized.
// t is the type of the message (NOT the pointer to it).
// It returns the marshalInfo for t, creating one from t if none exists yet.
func getMarshalInfo(t reflect.Type) *marshalInfo {
	marshalInfoLock.Lock()
	u, ok := marshalInfoMap[t]
	if !ok {
		u = &marshalInfo{typ: t}
		marshalInfoMap[t] = u
	}
	marshalInfoLock.Unlock()
	return u
}

// marshalInfo is the information used for marshaling a message.
type marshalInfo struct {
	typ          reflect.Type
	fields       []*marshalFieldInfo
	unrecognized field // offset of XXX_unrecognized
	extensions   field // offset of XXX_InternalExtensions
	v1extensions field // offset of XXX_extensions
	sizecache    field // offset of XXX_sizecache
	initialized  int32 // 0 -- only typ is set, 1 -- fully initialized
	messageset   bool  // uses message set wire format
	hasmarshaler bool  // has custom marshaler
	sync.RWMutex       // protect extElems map, also for initialization
	extElems     map[int32]*marshalElemInfo // info of extension elements
}
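The two cache levels described above, a per-type slot read atomically plus a global map guarded by a lock, can be sketched with the standard library alone. This is an illustration under assumptions, not the proto package's code: typeInfo, holder, lookupInfo, and request are hypothetical names, and atomic.Value stands in for the package's own atomic pointer helpers.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
	"sync/atomic"
)

// typeInfo plays the role of marshalInfo: only typ is set on creation.
type typeInfo struct {
	typ reflect.Type
}

var (
	infoLock sync.Mutex
	infoMap  = map[reflect.Type]*typeInfo{}
)

// lookupInfo returns the shared typeInfo for t, creating it on first use.
func lookupInfo(t reflect.Type) *typeInfo {
	infoLock.Lock()
	defer infoLock.Unlock()
	u, ok := infoMap[t]
	if !ok {
		u = &typeInfo{typ: t}
		infoMap[t] = u
	}
	return u
}

// holder plays the role of InternalMessageInfo: a per-type cache slot.
type holder struct {
	info atomic.Value // stores *typeInfo
}

// get returns the cached info, filling the slot from the global map if empty.
func (h *holder) get(msg interface{}) *typeInfo {
	if u, ok := h.info.Load().(*typeInfo); ok {
		return u // fast path: no lock taken
	}
	t := reflect.TypeOf(msg)
	if t.Kind() != reflect.Ptr {
		panic(fmt.Sprintf("cannot handle non-pointer message type %v", t))
	}
	u := lookupInfo(t.Elem())
	h.info.Store(u)
	return u
}

// request is a hypothetical message type used only for this demo.
type request struct{ Data string }

func main() {
	var h holder
	first := h.get(&request{})
	second := h.get(&request{})
	fmt.Println(first == second, first.typ.Name())
}
```

Two goroutines may both miss the fast path at the same time, but they receive the same pointer from the global map, so storing it twice is harmless.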
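marshalInfo.marshal is the real body of Marshal. It checks whether u has been initialized and, if not, calls computeMarshalInfo to compute the information Marshal needs, which in practice means filling in the fields of marshalInfo.
u.hasmarshaler indicates whether the current type implements the Marshaler interface; if it does, the type's Marshal method is called directly. This is the counterpart of way 2 in proto.Marshal: a type that implements Marshaler serializes itself, whether it is passed to proto.Marshal directly or reached through marshalInfo.marshal.
The body of the function is a for loop that walks every field of the type, checks the required constraint, and then calls f.marshaler to serialize the field according to its type. Where does f.marshaler come from?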
// marshal is the main function to marshal a message. It takes a byte slice and appends
// the encoded data to the end of the slice, returns the slice and error (if any).
// ptr is the pointer to the message.
// If deterministic is true, map is marshaled in deterministic order.
func (u *marshalInfo) marshal(b []byte, ptr pointer, deterministic bool) ([]byte, error) {
	// Initialize the basic marshalInfo data, mainly by filling in
	// the struct's fields from what is already known.
	if atomic.LoadInt32(&u.initialized) == 0 {
		u.computeMarshalInfo()
	}

	// If the message can marshal itself, let it do it, for compatibility.
	// NOTE: This is not efficient.
	// The result of the type's own Marshal is appended to b.
	if u.hasmarshaler {
		m := ptr.asPointerTo(u.typ).Interface().(Marshaler)
		b1, err := m.Marshal()
		b = append(b, b1...)
		return b, err
	}

	var err, errLater error
	// The old marshaler encodes extensions at beginning.
	// Check the extension fields and append the message's extensions to b.
	if u.extensions.IsValid() {
		// offset returns the given field of the message by pointer offset.
		e := ptr.offset(u.extensions).toExtensions()
		if u.messageset {
			b, err = u.appendMessageSet(b, e, deterministic)
		} else {
			b, err = u.appendExtensions(b, e, deterministic)
		}
		if err != nil {
			return b, err
		}
	}
	if u.v1extensions.IsValid() {
		m := *ptr.offset(u.v1extensions).toOldExtensions()
		b, err = u.appendV1Extensions(b, m, deterministic)
		if err != nil {
			return b, err
		}
	}

	// Walk every field of the message, check it, encode it, and append it to b.
	for _, f := range u.fields {
		if f.required {
			// If a required field is not set, record the error and
			// deal with it after all marshaling work is done.
			if ptr.offset(f.field).getPointer().isNil() {
				// Required field is not set.
				// We record the error but keep going, to give a complete marshaling.
				if errLater == nil {
					errLater = &RequiredNotSetError{f.name}
				}
				continue
			}
		}
		// A pointer field that is nil is unset and needs no encoding.
		if f.isPointer && ptr.offset(f.field).getPointer().isNil() {
			// nil pointer always marshals to nothing
			continue
		}
		// Encode with this field's marshaler.
		b, err = f.marshaler(b, ptr.offset(f.field), f.wiretag, deterministic)
		if err != nil {
			if err1, ok := err.(*RequiredNotSetError); ok {
				// Required field in submessage is not set.
				// We record the error but keep going, to give a complete marshaling.
				if errLater == nil {
					errLater = &RequiredNotSetError{f.name + "." + err1.field}
				}
				continue
			}
			// A repeated field contains a nil element.
			if err == errRepeatedHasNil {
				err = errors.New("proto: repeated field " + f.name + " has nil element")
			}
			if err == errInvalidUTF8 {
				if errLater == nil {
					fullName := revProtoTypes[reflect.PtrTo(u.typ)] + "." + f.name
					errLater = &invalidUTF8Error{fullName}
				}
				continue
			}
			return b, err
		}
	}
	// Unrecognized fields are appended to b as raw bytes;
	// computeMarshalInfo has already located them.
	if u.unrecognized.IsValid() {
		s := *ptr.offset(u.unrecognized).toBytes()
		b = append(b, s...)
	}
	return b, errLater
}
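The errLater pattern in the loop above (remember the first recoverable error, keep encoding, report it at the end) is easy to miss in the full listing. Here is a reduced sketch of just that pattern; the field struct and encodeAll are hypothetical, and fields are plain strings rather than protobuf values.

```go
package main

import "fmt"

// field is a simplified stand-in for marshalFieldInfo plus its value.
type field struct {
	name     string
	required bool
	value    *string // nil means "not set"
}

// encodeAll appends every set field to b. A missing required field does not
// stop the loop; only the first such error is kept and returned at the end.
func encodeAll(b []byte, fields []field) ([]byte, error) {
	var errLater error
	for _, f := range fields {
		if f.value == nil {
			if f.required && errLater == nil {
				errLater = fmt.Errorf("required field %q not set", f.name)
			}
			continue
		}
		b = append(b, *f.value...)
	}
	return b, errLater
}

func main() {
	x := "x"
	b, err := encodeAll(nil, []field{
		{name: "a", required: true},
		{name: "b", value: &x},
		{name: "c", required: true},
	})
	fmt.Printf("%q %v\n", b, err)
}
```

The caller still receives the bytes of every field that could be encoded, together with the first error that was recorded.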
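computeMarshalInfo performs a full inspection of the type to be serialized and prepares the data that serialization will use, including the per-field serialization function f.marshaler. We focus on that part: every field of the struct is assigned a marshalFieldInfo, which holds the information needed to serialize that field, and computeMarshalFieldInfo is called to fill it in.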
// computeMarshalInfo initializes the marshal info.
func (u *marshalInfo) computeMarshalInfo() {
	// Take the lock: marshal info must not be computed concurrently.
	u.Lock()
	defer u.Unlock()
	// Computing it once is enough.
	if u.initialized != 0 { // non-atomic read is ok as it is protected by the lock
		return
	}

	// The message type to marshal.
	t := u.typ
	u.unrecognized = invalidField
	u.extensions = invalidField
	u.v1extensions = invalidField
	u.sizecache = invalidField

	// If the message can marshal itself, let it do it, for compatibility.
	// Check whether the type implements Marshaler; if so, mark it as having its own marshaler.
	// A type assertion cannot be used because t is a reflect.Type, not a value stored in an interface.
	// NOTE: This is not efficient.
	if reflect.PtrTo(t).Implements(marshalerType) {
		u.hasmarshaler = true
		atomic.StoreInt32(&u.initialized, 1)
		// Return right away; the type's own marshaler does the encoding later.
		return
	}

	// get oneof implementers
	// Check which of these interfaces *t implements (the oneof feature).
	var oneofImplementers []interface{}
	switch m := reflect.Zero(reflect.PtrTo(t)).Interface().(type) {
	case oneofFuncsIface:
		_, _, _, oneofImplementers = m.XXX_OneofFuncs()
	case oneofWrappersIface:
		oneofImplementers = m.XXX_OneofWrappers()
	}

	n := t.NumField()

	// deal with XXX fields first
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Skip fields that do not start with XXX_.
		if !strings.HasPrefix(f.Name, "XXX_") {
			continue
		}
		// Handle the fields that protobuf adds itself.
		switch f.Name {
		case "XXX_sizecache":
			u.sizecache = toField(&f)
		case "XXX_unrecognized":
			u.unrecognized = toField(&f)
		case "XXX_InternalExtensions":
			u.extensions = toField(&f)
			u.messageset = f.Tag.Get("protobuf_messageset") == "1"
		case "XXX_extensions":
			u.v1extensions = toField(&f)
		case "XXX_NoUnkeyedLiteral":
			// nothing to do
		default:
			panic("unknown XXX field: " + f.Name)
		}
		n--
	}

	// normal fields
	fields := make([]marshalFieldInfo, n) // batch allocation
	u.fields = make([]*marshalFieldInfo, 0, n)
	for i, j := 0, 0; i < t.NumField(); i++ {
		f := t.Field(i)

		// Skip XXX fields.
		if strings.HasPrefix(f.Name, "XXX_") {
			continue
		}
		// Take the next free entry of fields, as a pointer.
		// j counts the entries in use; n is the number of non-XXX fields.
		field := &fields[j]
		j++
		field.name = f.Name
		// Add it to u.fields.
		u.fields = append(u.fields, field)
		// A field tagged "protobuf_oneof" gets special handling.
		if f.Tag.Get("protobuf_oneof") != "" {
			field.computeOneofFieldInfo(&f, oneofImplementers)
			continue
		}
		// No "protobuf" tag means the field was not generated by protoc.
		if f.Tag.Get("protobuf") == "" {
			// field has no tag (not in generated message), ignore it
			// Remove the field info that was just saved.
			u.fields = u.fields[:len(u.fields)-1]
			j--
			continue
		}
		// Fill in the field's marshal info.
		field.computeMarshalFieldInfo(&f)
	}

	// fields are marshaled in tag order on the wire.
	sort.Sort(byTag(u.fields))

	// Initialization is done.
	atomic.StoreInt32(&u.initialized, 1)
}
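The field-collection part of computeMarshalInfo (skip XXX_ fields, ignore untagged fields, sort by field number) can be tried out with reflection alone. The sketch below is a simplified illustration: the sample struct, fieldInfo, collectFields, and fieldOrder are hypothetical, and oneof handling is left out.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strconv"
	"strings"
)

// sample imitates a generated struct; its fields are made up for this sketch.
type sample struct {
	Name          string `protobuf:"bytes,2,opt,name=name,proto3"`
	ID            int32  `protobuf:"varint,1,opt,name=id,proto3"`
	Note          string // no protobuf tag: ignored
	XXX_sizecache int32  `json:"-"`
}

// fieldInfo keeps only what this sketch needs from marshalFieldInfo.
type fieldInfo struct {
	name string
	tag  int
}

// collectFields returns the tagged, non-XXX fields of t ordered by field number.
func collectFields(t reflect.Type) []fieldInfo {
	var fields []fieldInfo
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if strings.HasPrefix(f.Name, "XXX_") {
			continue // bookkeeping field added by the generator
		}
		parts := strings.Split(f.Tag.Get("protobuf"), ",")
		if len(parts) < 2 {
			continue // untagged field, not part of the message
		}
		n, err := strconv.Atoi(parts[1])
		if err != nil {
			continue // malformed tag, skipped in this sketch
		}
		fields = append(fields, fieldInfo{f.Name, n})
	}
	sort.Slice(fields, func(i, j int) bool { return fields[i].tag < fields[j].tag })
	return fields
}

// fieldOrder returns the field names of v in wire order, comma-separated.
func fieldOrder(v interface{}) string {
	var names []string
	for _, f := range collectFields(reflect.TypeOf(v)) {
		names = append(names, f.name)
	}
	return strings.Join(names, ",")
}

func main() {
	fmt.Println(fieldOrder(sample{}))
}
```

ID comes before Name because fields are ordered by field number, not by their position in the struct.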
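Recall the definition of Request. It has one field, Data, followed by protobuf:"..." which describes the information protobuf needs. The "bytes,..." part is called the tags. After splitting it on commas:
- tags[0] is bytes, the wire type the field is converted to.
- tags[1] is 1, the field number.
- tags[2] is opt, meaning the field is optional (req would mean required).
- name=data is the field name in the .proto file, and proto3 marks the proto3 syntax.

// request.pb.go
type Request struct {
	Data string `protobuf:"bytes,1,opt,name=data,proto3" json:"data,omitempty"`
	...
}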
computeMarshalFieldInfo first extracts the field number and the wire type to convert to and stores them in the marshalFieldInfo. It then calls setMarshaler, which uses the field f and the tags to obtain the serialization function for the field's type.

// computeMarshalFieldInfo fills up the information to marshal a field.
func (fi *marshalFieldInfo) computeMarshalFieldInfo(f *reflect.StructField) {
	// parse protobuf tag of the field.
	// tag has format of "bytes,49,opt,name=foo,def=hello!"
	// Read the full "protobuf" tag and split it on commas to get the parts above.
	tags := strings.Split(f.Tag.Get("protobuf"), ",")
	if tags[0] == "" {
		return
	}
	// The field number: for a declaration such as `string name = x`
	// in the message, x is this field's tag id.
	tag, err := strconv.Atoi(tags[1])
	if err != nil {
		panic("tag is not an integer")
	}
	// The wire type to convert to: bytes, varint, and so on.
	wt := wiretype(tags[0])
	// Record whether the field is required or optional.
	if tags[2] == "req" {
		fi.required = true
	}
	// Store the field and tag information in the marshalFieldInfo.
	fi.setTag(f, tag, wt)
	// Choose the marshaler function from the tag information (type and so on).
	fi.setMarshaler(f, tags)
}
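To make the tag parsing concrete, the following sketch reads the protobuf struct tag of Request.Data with reflection and combines the field number and wire type into the wiretag that precedes the field on the wire, (fieldNumber << 3) | wireType. The lowercase request type and the wiretagOf helper are hypothetical, and only the varint and bytes wire types are handled.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// Wire type numbers defined by the protobuf encoding.
const (
	wireVarint = 0
	wireBytes  = 2
)

// request copies the struct tag of the generated Request type.
type request struct {
	Data string `protobuf:"bytes,1,opt,name=data,proto3" json:"data,omitempty"`
}

// wiretagOf parses the "protobuf" tag of the named field of v and
// returns (fieldNumber << 3) | wireType.
func wiretagOf(v interface{}, field string) (uint64, error) {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return 0, fmt.Errorf("no field %q", field)
	}
	tags := strings.Split(f.Tag.Get("protobuf"), ",")
	if len(tags) < 3 {
		return 0, fmt.Errorf("field %q has no usable protobuf tag", field)
	}
	num, err := strconv.Atoi(tags[1])
	if err != nil {
		return 0, fmt.Errorf("tag is not an integer: %v", err)
	}
	var wt uint64
	switch tags[0] {
	case "varint":
		wt = wireVarint
	case "bytes":
		wt = wireBytes
	default:
		return 0, fmt.Errorf("wire type %q not covered by this sketch", tags[0])
	}
	return uint64(num)<<3 | wt, nil
}

func main() {
	tag, err := wiretagOf(request{}, "Data")
	fmt.Println(tag, err)
}
```

For Data the result is (1 << 3) | 2 = 10, which is the 0x0a byte at the start of an encoded Request.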
The key part of setMarshaler is typeMarshaler. typeMarshaler is a very long function, but all it does is return the serialization functions that correspond to a type, such as Bool, Int32, Uint32 and so on. For composite types such as structs and slices, the selection becomes recursive.

// setMarshaler fills up the sizer and marshaler in the info of a field.
func (fi *marshalFieldInfo) setMarshaler(f *reflect.StructField, tags []string) {
	// Map fields are handled separately.
	switch f.Type.Kind() {
	case reflect.Map:
		// map field
		fi.isPointer = true
		fi.sizer, fi.marshaler = makeMapMarshaler(f)
		return
	case reflect.Ptr, reflect.Slice:
		// Pointer and slice fields are marked as pointer types.
		fi.isPointer = true
	}
	// Choose the marshaler from the field type and the tags.
	fi.sizer, fi.marshaler = typeMarshaler(f.Type, tags, true, false)
}

// typeMarshaler returns the sizer and marshaler of a given field.
// t is the type of the field.
// tags is the generated "protobuf" tag of the field.
// If nozero is true, zero value is not marshaled to the wire.
// If oneof is true, it is a oneof field.
// The function is very long, so most of it is omitted here.
func typeMarshaler(t reflect.Type, tags []string, nozero, oneof bool) (sizer, marshaler) {
	...
	switch t.Kind() {
	case reflect.Bool:
		if pointer {
			return sizeBoolPtr, appendBoolPtr
		}
		if slice {
			if packed {
				return sizeBoolPackedSlice, appendBoolPackedSlice
			}
			return sizeBoolSlice, appendBoolSlice
		}
		if nozero {
			return sizeBoolValueNoZero, appendBoolValueNoZero
		}
		return sizeBoolValue, appendBoolValue
	case reflect.Uint32:
		...
	case reflect.Int32:
		...
	case reflect.Struct:
		...
	}
}
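The idea behind typeMarshaler, choosing an encoder function once per field type and storing it for later calls, can be sketched as follows. This is a simplification under stated assumptions: the real package works on its own unsafe pointer type and also returns a sizer, whereas this sketch uses reflect.Value, and the names appender, pickAppender, and encode are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"reflect"
)

// appender encodes one value after its wiretag and appends the result to b.
type appender func(b []byte, v reflect.Value, wiretag uint64) []byte

// appendVarint appends x in protobuf varint form.
func appendVarint(b []byte, x uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], x)
	return append(b, buf[:n]...)
}

func appendBool(b []byte, v reflect.Value, wiretag uint64) []byte {
	b = appendVarint(b, wiretag)
	if v.Bool() {
		return append(b, 1)
	}
	return append(b, 0)
}

func appendString(b []byte, v reflect.Value, wiretag uint64) []byte {
	b = appendVarint(b, wiretag)
	b = appendVarint(b, uint64(v.Len()))
	return append(b, v.String()...)
}

// pickAppender chooses the encoder for a field type, as typeMarshaler does
// for the full set of kinds.
func pickAppender(t reflect.Type) (appender, error) {
	switch t.Kind() {
	case reflect.Bool:
		return appendBool, nil
	case reflect.String:
		return appendString, nil
	}
	return nil, fmt.Errorf("kind %v not covered by this sketch", t.Kind())
}

// encode picks the encoder for v and runs it with the given wiretag.
func encode(v interface{}, wiretag uint64) ([]byte, error) {
	fn, err := pickAppender(reflect.TypeOf(v))
	if err != nil {
		return nil, err
	}
	return fn(nil, reflect.ValueOf(v), wiretag), nil
}

func main() {
	b, err := encode("hi", 10)
	fmt.Printf("% x %v\n", b, err)
}
```

Selecting the function once and caching it means the per-message encoding loop never has to switch on the field kind again.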
The following are the serialization functions for the Bool and String types:

func appendBoolValue(b []byte, ptr pointer, wiretag uint64, _ bool) ([]byte, error) {
	v := *ptr.toBool()
	b = appendVarint(b, wiretag)
	if v {
		b = append(b, 1)
	} else {
		b = append(b, 0)
	}
	return b, nil
}

func appendStringValue(b []byte, ptr pointer, wiretag uint64, _ bool) ([]byte, error) {
	v := *ptr.toString()
	b = appendVarint(b, wiretag)
	b = appendVarint(b, uint64(len(v)))
	b = append(b, v...)
	return b, nil
}
So the serialized []byte should follow this pattern:
| wiretag | data | wiretag | data | ... | data |
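As a worked instance of this pattern, the sketch below encodes a string field 1 followed by a bool field 2 by hand and prints the bytes as hex. For a string, the data part consists of a length followed by the raw bytes. The bool field 2 is hypothetical (the Request shown in this article only has Data), and the helper names are made up for the example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// appendVarint appends x in protobuf varint form.
func appendVarint(b []byte, x uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], x)
	return append(b, buf[:n]...)
}

// appendStringField appends the wiretag, the length, then the raw bytes.
func appendStringField(b []byte, fieldNum uint64, s string) []byte {
	b = appendVarint(b, fieldNum<<3|2) // wire type 2: length-delimited
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// appendBoolField appends the wiretag, then a single 0 or 1 byte.
func appendBoolField(b []byte, fieldNum uint64, v bool) []byte {
	b = appendVarint(b, fieldNum<<3) // wire type 0: varint
	if v {
		return append(b, 1)
	}
	return append(b, 0)
}

// encodeDemo encodes field 1 (string) followed by field 2 (bool) as hex.
func encodeDemo(data string, ok bool) string {
	b := appendStringField(nil, 1, data)
	b = appendBoolField(b, 2, ok)
	return hex.EncodeToString(b)
}

func main() {
	// prints 0a0b48656c6c6f20446162696e1001
	fmt.Println(encodeDemo("Hello Dabin", true))
}
```

Reading the output left to right: 0a is the wiretag of field 1, 0b is the length 11, the next 11 bytes are "Hello Dabin", 10 is the wiretag of field 2, and 01 is true.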
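OK, that is the main flow of encoding. A quick recap:
1. proto.Marshal calls the wrapper function auto-generated in *.pb.go, and the wrapper calls InternalMessageInfo to serialize; only then does the real serialization begin.
2. The marshal info u of the type is obtained. If u has not been initialized, it is initialized, which sets the serialization function of every struct field along with other information.
3. Each field is encoded in turn and appended to []byte. Once all fields are encoded, the resulting []byte or an error is returned.
Decoding is very similar to encoding and follows the same 3 steps recapped above. The main difference is in step 2: what it obtains is the unmarshal info u of the type. If u has not been initialized, it is initialized, which sets the deserialization function of every struct field along with other information.
For that reason the walk through the decoding functions is brief and less detailed than the one for encoding.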
Below are the deserialization interfaces and function definitions in the proto package:

// Unmarshaler is the interface representing objects that can
// unmarshal themselves. The argument points to data that may be
// overwritten, so implementations should not keep references to the
// buffer.
// Unmarshal implementations should not clear the receiver.
// Any unmarshaled data should be merged into the receiver.
// Callers of Unmarshal that do not want to retain existing data
// should Reset the receiver before calling Unmarshal.
type Unmarshaler interface {
	Unmarshal([]byte) error
}

// newUnmarshaler is the interface representing objects that can
// unmarshal themselves. The semantics are identical to Unmarshaler.
//
// This exists to support protoc-gen-go generated messages.
// The proto package will stop type-asserting to this interface in the future.
//
// DO NOT DEPEND ON THIS.
type newUnmarshaler interface {
	// Implemented by the generated XXX_Unmarshal method.
	XXX_Unmarshal([]byte) error
}

// Unmarshal parses the protocol buffer representation in buf and places the
// decoded result in pb. If the struct underlying pb does not match
// the data in buf, the results can be unpredictable.
//
// Unmarshal resets pb before starting to unmarshal, so any
// existing data in pb is always removed. Use UnmarshalMerge
// to preserve and append to existing data.
func Unmarshal(buf []byte, pb Message) error {
	pb.Reset()
	// pb has its own unmarshal function: it implements newUnmarshaler.
	if u, ok := pb.(newUnmarshaler); ok {
		return u.XXX_Unmarshal(buf)
	}
	// pb has its own unmarshal function: it implements Unmarshaler.
	if u, ok := pb.(Unmarshaler); ok {
		return u.Unmarshal(buf)
	}
	// Fall back to the default Unmarshal.
	return NewBuffer(buf).Unmarshal(pb)
}
Request implements the newUnmarshaler interface:

// request.pb.go
func (m *Request) XXX_Unmarshal(b []byte) error {
	return xxx_messageInfo_Request.Unmarshal(m, b)
}
Deserialization also goes through InternalMessageInfo.

// Unmarshal is the entry point from the generated .pb.go files.
// This function is not intended to be used by non-generated code.
// This function is not subject to any compatibility guarantee.
// msg contains a pointer to a protocol buffer struct.
// b is the data to be unmarshaled into the protocol buffer.
// a is a pointer to a place to store cached unmarshal information.
func (a *InternalMessageInfo) Unmarshal(msg Message, b []byte) error {
	// Load the unmarshal information for this message type.
	// The atomic load ensures memory consistency.
	// This reads the unmarshal info stored in a.
	u := atomicLoadUnmarshalInfo(&a.unmarshal)
	if u == nil {
		// Slow path: find unmarshal info for msg, update a with it.
		u = getUnmarshalInfo(reflect.TypeOf(msg).Elem())
		atomicStoreUnmarshalInfo(&a.unmarshal, u)
	}
	// Then do the unmarshaling.
	err := u.unmarshal(toPointer(&msg), b)
	return err
}
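The following is the main deserialization function. When u has not been initialized, it calls computeUnmarshalInfo to set up the information deserialization needs.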
// unmarshal does the main work of unmarshaling a message.
// u provides type information used to unmarshal the message.
// m is a pointer to a protocol buffer message.
// b is a byte stream to unmarshal into m.
// This is top routine used when recursively unmarshaling submessages.
func (u *unmarshalInfo) unmarshal(m pointer, b []byte) error {
	if atomic.LoadInt32(&u.initialized) == 0 {
		// Fill u with unmarshal info and set the unmarshaler function of every field type.
		u.computeUnmarshalInfo()
	}
	if u.isMessageSet {
		return unmarshalMessageSet(b, m.offset(u.extensions).toExtensions())
	}
	var reqMask uint64 // bitmask of required fields we've seen.
	var errLater error
	for len(b) > 0 {
		// Read tag and wire type.
		// Special case 1 and 2 byte varints.
		var x uint64
		if b[0] < 128 {
			x = uint64(b[0])
			b = b[1:]
		} else if len(b) >= 2 && b[1] < 128 {
			x = uint64(b[0]&0x7f) + uint64(b[1])<<7
			b = b[2:]
		} else {
			var n int
			x, n = decodeVarint(b)
			if n == 0 {
				return io.ErrUnexpectedEOF
			}
			b = b[n:]
		}
		// Split x into the tag and the wire type.
		tag := x >> 3
		wire := int(x) & 7

		// Dispatch on the tag to one of the unmarshal* functions below.
		// Use the tag to select the unmarshalFieldInfo f of this field.
		var f unmarshalFieldInfo
		if tag < uint64(len(u.dense)) {
			f = u.dense[tag]
		} else {
			f = u.sparse[tag]
		}
		// If the field has an unmarshaler function, decode and handle errors.
		if fn := f.unmarshal; fn != nil {
			var err error
			// Parse from b and store the result in the field described by f.
			b, err = fn(b, m.offset(f.field), wire)
			if err == nil {
				reqMask |= f.reqMask
				continue
			}
			if r, ok := err.(*RequiredNotSetError); ok {
				// Remember this error, but keep parsing. We need to produce
				// a full parse even if a required field is missing.
				if errLater == nil {
					errLater = r
				}
				reqMask |= f.reqMask
				continue
			}
			if err != errInternalBadWireType {
				if err == errInvalidUTF8 {
					if errLater == nil {
						fullName := revProtoTypes[reflect.PtrTo(u.typ)] + "." + f.name
						errLater = &invalidUTF8Error{fullName}
					}
					continue
				}
				return err
			}
			// Fragments with bad wire type are treated as unknown fields.
		}

		// Unknown tag.
		// Skip the unknown tag. The message definition in the .proto may have been
		// upgraded with new fields that this older version does not recognize.
		if !u.unrecognized.IsValid() {
			// Don't keep unrecognized data; just skip it.
			var err error
			b, err = skipField(b, wire)
			if err != nil {
				return err
			}
			continue
		}
		// Check whether the unrecognized field is an extension.
		// Keep unrecognized data around.
		// maybe in extensions, maybe in the unrecognized field.
		z := m.offset(u.unrecognized).toBytes()
		var emap map[int32]Extension
		var e Extension
		for _, r := range u.extensionRanges {
			if uint64(r.Start) <= tag && tag <= uint64(r.End) {
				if u.extensions.IsValid() {
					mp := m.offset(u.extensions).toExtensions()
					emap = mp.extensionsWrite()
					e = emap[int32(tag)]
					z = &e.enc
					break
				}
				if u.oldExtensions.IsValid() {
					p := m.offset(u.oldExtensions).toOldExtensions()
					emap = *p
					if emap == nil {
						emap = map[int32]Extension{}
						*p = emap
					}
					e = emap[int32(tag)]
					z = &e.enc
					break
				}
				panic("no extensions field available")
			}
		}
		// Use wire type to skip data.
		var err error
		b0 := b
		b, err = skipField(b, wire)
		if err != nil {
			return err
		}
		*z = encodeVarint(*z, tag<<3|uint64(wire))
		*z = append(*z, b0[:len(b0)-len(b)]...)
		if emap != nil {
			emap[int32(tag)] = e
		}
	}
	// Check the required fields that were parsed; if they do not match
	// what u records, report an error.
	if reqMask != u.reqMask && errLater == nil {
		// A required field of this message is missing.
		for _, n := range u.reqFields {
			if reqMask&1 == 0 {
				errLater = &RequiredNotSetError{n}
			}
			reqMask >>= 1
		}
	}
	return errLater
}
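The core of the loop above (read a varint, split it into tag and wire type, dispatch on the tag, skip what is unknown) fits in a short standalone sketch. This is an illustration under assumptions: the decoded struct and its field layout are hypothetical, only the varint and length-delimited wire types are handled, and unknown fields are counted instead of being kept.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated input")

// decoded holds the fields this sketch understands; the layout is hypothetical.
type decoded struct {
	Data    string // field 1, wire type bytes
	OK      bool   // field 2, wire type varint
	Skipped int    // number of unknown fields that were skipped
}

// decode walks b field by field, the way unmarshalInfo.unmarshal does.
func decode(b []byte) (decoded, error) {
	var out decoded
	for len(b) > 0 {
		// Read the wiretag and split it into tag and wire type.
		x, n := binary.Uvarint(b)
		if n <= 0 {
			return out, errTruncated
		}
		b = b[n:]
		tag, wire := x>>3, int(x&7)

		switch wire {
		case 0: // varint
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return out, errTruncated
			}
			b = b[n:]
			if tag == 2 {
				out.OK = v != 0
			} else {
				out.Skipped++ // unknown field: consume and ignore
			}
		case 2: // length-delimited
			l, n := binary.Uvarint(b)
			if n <= 0 || l > uint64(len(b)-n) {
				return out, errTruncated
			}
			payload := b[n : n+int(l)]
			b = b[n+int(l):]
			if tag == 1 {
				out.Data = string(payload)
			} else {
				out.Skipped++
			}
		default:
			return out, fmt.Errorf("wire type %d not covered by this sketch", wire)
		}
	}
	return out, nil
}

func main() {
	// Field 1 = "Hello Dabin", field 2 = true, field 3 = 5 (unknown here).
	msg := []byte("\x0a\x0bHello Dabin\x10\x01\x18\x05")
	fmt.Println(decode(msg))
}
```

Because the wire type alone says how many bytes a field occupies, a decoder can step over fields it does not know, which is what lets older code read messages produced from a newer .proto definition.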
We skip the process of setting up the field deserialization functions and only look at how a function is chosen. typeUnmarshaler selects the deserialization function for a field type, and the choices correspond one to one with the serialization functions.

// typeUnmarshaler returns an unmarshaler for the given field type / field tag pair.
func typeUnmarshaler(t reflect.Type, tags string) unmarshaler {
	...
	// Figure out packaging (pointer, slice, or both)
	slice := false
	pointer := false
	if t.Kind() == reflect.Slice && t.Elem().Kind() != reflect.Uint8 {
		slice = true
		t = t.Elem()
	}
	if t.Kind() == reflect.Ptr {
		pointer = true
		t = t.Elem()
	}
	...
	switch t.Kind() {
	case reflect.Bool:
		if pointer {
			return unmarshalBoolPtr
		}
		if slice {
			return unmarshalBoolSlice
		}
		return unmarshalBoolValue
	}
}
unmarshalBoolValue is the default deserialization function for the Bool type. It decodes the protobuf data b, converts the result to a bool v, and finally assigns it to the field f.

func unmarshalBoolValue(b []byte, f pointer, w int) ([]byte, error) {
	if w != WireVarint {
		return b, errInternalBadWireType
	}
	// Note: any length varint is allowed, even though any sane
	// encoder will use one byte.
	// See https://github.com/golang/protobuf/issues/76
	x, n := decodeVarint(b)
	if n == 0 {
		return nil, io.ErrUnexpectedEOF
	}
	// TODO: check if x>1? Tests seem to indicate no.
	// toBool returns a pointer to a bool;
	// assigning through it sets the field f.
	v := x != 0
	*f.toBool() = v
	return b[n:], nil
}
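The comment about varints of any length can be checked with a small experiment. The sketch below uses binary.Uvarint from the standard library in place of the package's decodeVarint, which is an assumption of this example, and decodeBool is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeBool reads one varint from b and treats any non-zero value as true.
// It returns the value and the bytes that follow the varint.
func decodeBool(b []byte) (bool, []byte, error) {
	x, n := binary.Uvarint(b)
	if n <= 0 {
		return false, nil, errors.New("truncated or overlong varint")
	}
	return x != 0, b[n:], nil
}

func main() {
	// 0x01 is the usual one-byte encoding of true.
	fmt.Println(decodeBool([]byte{0x01}))
	// 0x81 0x00 is a two-byte varint that also decodes to 1.
	fmt.Println(decodeBool([]byte{0x81, 0x00}))
}
```

Both inputs decode to true; the second simply spends an extra byte on a continuation bit.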
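This article analyzed how Go serializes and deserializes protobuf data. In brief:
1. proto.Marshal and proto.Unmarshal call the wrapper functions auto-generated in *.pb.go, and the wrappers call InternalMessageInfo to do the (de)serialization; only then does the real (de)serialization begin.
2. The (un)marshal info u of the type is obtained. If u has not been initialized, it is initialized, which sets the (de)serialization function of every struct field along with other information.
3. Each field is (de)serialized in turn using those functions, and the result or an error is returned.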