Do you really understand conversion between string and []byte?

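The string and []byte types are among the data structures we use most often when programming in Go. This article examines the ways to convert between them and uses their internal relationship to clear up the confusion.

Two ways to convert

Standard conversion

Ask any gopher how to convert between string and []byte in Go and the following will come to mind at once. We will call it the standard conversion.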

// string to []byte
s1 := "hello"
b := []byte(s1)

// []byte to string
s2 := string(b)
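A minimal sketch of what the standard conversion guarantees: the slice it returns is a copy, so writing to it leaves the original string untouched. The helper name convertAndModify is made up for this illustration.

```go
package main

import "fmt"

// convertAndModify converts s with the standard conversion, overwrites the
// first byte of the resulting slice, and returns both values.
func convertAndModify(s string) (string, []byte) {
	b := []byte(s) // copies the bytes of s into a separate backing array
	if len(b) > 0 {
		b[0] = 'H' // changes only the copy
	}
	return s, b
}

func main() {
	s, b := convertAndModify("hello")
	fmt.Println(s)         // hello
	fmt.Println(string(b)) // Hello
}
```

The price of this safety is the copy itself, which is what the benchmarks later in the article measure.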

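Forced conversion

With the unsafe and reflect packages, a second kind of conversion is possible. We will call it forced conversion (it is also often referred to as black magic).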

import (
    "reflect"
    "unsafe"
)

// String2Bytes reinterprets s as a []byte without copying its data.
func String2Bytes(s string) []byte {
    sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
    bh := reflect.SliceHeader{
        Data: sh.Data,
        Len:  sh.Len,
        Cap:  sh.Len,
    }
    return *(*[]byte)(unsafe.Pointer(&bh))
}

// Bytes2String reinterprets b as a string without copying its data.
func Bytes2String(b []byte) string {
    return *(*string)(unsafe.Pointer(&b))
}
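As a side note, Go 1.20 added unsafe.String, unsafe.StringData, and unsafe.SliceData (unsafe.Slice has existed since Go 1.17), which express the same zero-copy idea without building header structs by hand. The sketch below assumes Go 1.20 or later; the function names stringToBytes and bytesToString are illustrative and do not come from the article.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringToBytes returns a []byte that shares memory with s (Go 1.20+).
// The result must never be written to.
func stringToBytes(s string) []byte {
	if s == "" {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString returns a string that shares memory with b (Go 1.20+).
// b must not be modified while the string is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	fmt.Println(string(stringToBytes("hello"))) // hello
	fmt.Println(bytesToString([]byte("gopher"))) // gopher
}
```

The same hazards apply as with the header-based version: nothing stops a caller from writing through the returned slice.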

Performance comparison

Since there are two ways to convert, it is worth comparing their performance.

import (
    "bytes"
    "testing"
)

// Verify that the forced []byte-to-string conversion is correct.
func TestBytes2String(t *testing.T) {
    x := []byte("Hello Gopher!")
    y := Bytes2String(x)
    z := string(x)

    if y != z {
        t.Fail()
    }
}

// Verify that the forced string-to-[]byte conversion is correct.
func TestString2Bytes(t *testing.T) {
    x := "Hello Gopher!"
    y := String2Bytes(x)
    z := []byte(x)

    if !bytes.Equal(y, z) {
        t.Fail()
    }
}

// Benchmark the standard conversion string().
func Benchmark_NormalBytes2String(b *testing.B) {
    x := []byte("Hello Gopher! Hello Gopher! Hello Gopher!")
    for i := 0; i < b.N; i++ {
        _ = string(x)
    }
}

// Benchmark the forced conversion from []byte to string.
func Benchmark_Byte2String(b *testing.B) {
    x := []byte("Hello Gopher! Hello Gopher! Hello Gopher!")
    for i := 0; i < b.N; i++ {
        _ = Bytes2String(x)
    }
}

// Benchmark the standard conversion []byte().
func Benchmark_NormalString2Bytes(b *testing.B) {
    x := "Hello Gopher! Hello Gopher! Hello Gopher!"
    for i := 0; i < b.N; i++ {
        _ = []byte(x)
    }
}

// Benchmark the forced conversion from string to []byte.
func Benchmark_String2Bytes(b *testing.B) {
    x := "Hello Gopher! Hello Gopher! Hello Gopher!"
    for i := 0; i < b.N; i++ {
        _ = String2Bytes(x)
    }
}

The results are as follows:

$ go test -bench="." -benchmem
goos: darwin
goarch: amd64
pkg: workspace/example/stringBytes
Benchmark_NormalBytes2String-8          38363413                27.9 ns/op            48 B/op          1 allocs/op
Benchmark_Byte2String-8                 1000000000               0.265 ns/op           0 B/op          0 allocs/op
Benchmark_NormalString2Bytes-8          32577080                34.8 ns/op            48 B/op          1 allocs/op
Benchmark_String2Bytes-8                1000000000               0.532 ns/op           0 B/op          0 allocs/op
PASS
ok      workspace/example/stringBytes   3.170s

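Note that -benchmem reports the number of memory allocations per operation and the number of bytes allocated per operation.

When x is shortened to "Hello Gopher!" in every benchmark, the results are as follows: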

$ go test -bench="." -benchmem
goos: darwin
goarch: amd64
pkg: workspace/example/stringBytes
Benchmark_NormalBytes2String-8          245907674                4.86 ns/op            0 B/op          0 allocs/op
Benchmark_Byte2String-8                 1000000000               0.266 ns/op           0 B/op          0 allocs/op
Benchmark_NormalString2Bytes-8          202329386                5.92 ns/op            0 B/op          0 allocs/op
Benchmark_String2Bytes-8                1000000000               0.532 ns/op           0 B/op          0 allocs/op
PASS
ok      workspace/example/stringBytes   4.383s

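Forced conversion clearly outperforms standard conversion.

Readers may want to think about the following questions:

1. Why does forced conversion perform better than standard conversion?

2. Why, in the tests above, does standard conversion allocate memory once when x is larger, which makes it slower still, while forced conversion is unaffected?

3. If forced conversion performs so well, why is standard conversion the one the Go language gives us?

How it works

To answer these three questions, we first need to understand what string and []byte actually are in Go.

[]byte

In Go, byte is an alias for uint8. The builtin package of the standard library documents it as follows: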

// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

In the Go source file src/runtime/slice.go, slice is defined as follows:

type slice struct {
    array unsafe.Pointer
    len   int
    cap   int
}

array is a pointer to the underlying array, len is the length, and cap is the capacity. For a []byte, array points to an array of bytes.
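The three fields can be observed from ordinary Go code. In this small sketch (the helper name subHeader is invented for the example), re-slicing creates a new header with its own len while the array pointer still refers to the same underlying array.

```go
package main

import "fmt"

// subHeader re-slices b to its first n bytes and reports the length and
// capacity stored in the new slice header.
func subHeader(b []byte, n int) (sub []byte, length, capacity int) {
	sub = b[:n] // new header, same underlying array
	return sub, len(sub), cap(sub)
}

func main() {
	b := make([]byte, 13) // len 13, cap 13
	copy(b, "Hello Gopher!")

	sub, l, c := subHeader(b, 5)
	fmt.Println(l, c) // 5 13

	sub[0] = 'J'           // write through the shorter slice
	fmt.Println(string(b)) // Jello Gopher!
}
```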

string

The builtin package of the Go standard library documents the string type as follows:

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.
type string string

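In other words, a string is a set of 8-bit bytes that conventionally, but not necessarily, holds UTF-8 encoded text. A string may be empty, but it cannot be nil, and its value cannot be changed.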
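A short illustration of "bytes, not necessarily UTF-8": len counts bytes rather than characters, and a string can hold bytes that are not valid UTF-8 at all. The helper byteAndRuneCount is a name chosen for this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount returns the number of bytes and the number of UTF-8
// encoded characters (runes) in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(byteAndRuneCount("Go"))   // 2 2
	fmt.Println(byteAndRuneCount("你好")) // 6 2

	// A string may hold bytes that are not valid UTF-8.
	raw := string([]byte{0xff, 0xfe})
	fmt.Println(len(raw), utf8.ValidString(raw)) // 2 false
}
```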

In the Go source file src/runtime/string.go, string is defined as follows:

type stringStruct struct {
    str unsafe.Pointer
    len int
}

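stringStruct represents a string value: the str pointer holds the address of the first element of some array, and len is the length of that array. So what is this array? The answer can be found where a stringStruct is instantiated.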

//go:nosplit
func gostringnocopy(str *byte) string {
    ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)}
    s := *(*string)(unsafe.Pointer(&ss))
    return s
}

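As we can see, the parameter str is a pointer to byte, so we can conclude that the underlying data structure of a string is an array of bytes.

In summary, string and []byte are very similar in their underlying structure: the latter's representation has only one extra field, cap, so their memory layouts line up. This is also why the built-in function copy has the special case copy(dst []byte, src string) int.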

// The copy built-in function copies elements from a source slice into a
// destination slice. (As a special case, it also will copy bytes from a
// string to a slice of bytes.) The source and destination may overlap. Copy
// returns the number of elements copied, which will be the minimum of
// len(src) and len(dst).
func copy(dst, src []Type) int
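The special case is easy to try out. In this sketch (prefixBytes is a hypothetical helper, not part of any library), copy takes a string as its source and returns the smaller of the two lengths.

```go
package main

import "fmt"

// prefixBytes copies at most n bytes of s into a new slice and returns the
// slice together with the number of bytes copied.
func prefixBytes(s string, n int) ([]byte, int) {
	dst := make([]byte, n)
	copied := copy(dst, s) // special case: the source is a string
	return dst[:copied], copied
}

func main() {
	b, n := prefixBytes("Hello Gopher!", 5)
	fmt.Println(string(b), n) // Hello 5

	b, n = prefixBytes("Go", 5)
	fmt.Println(string(b), n) // Go 2
}
```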

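Differences

The biggest difference between []byte and string is that the value of a string cannot be changed. What does that mean? Two examples illustrate it.

For a []byte, the following operation is valid: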

b := []byte("Hello Gopher!")
b[1] = 'T'

For a string, modification is forbidden:

s := "Hello Gopher!"
s[1] = 'T' // compile error: strings are immutable

A string does, however, support this operation:

s := "Hello Gopher!"
s = "Tello Gopher!"

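The value of a string cannot be modified, but it can be replaced. Underneath, every string is a stringStruct{str: str_point, len: str_len}. For a string literal, the str pointer holds the address of constant data whose contents cannot be changed because the memory is read-only; the pointer itself, however, can be made to point to a different address.

The following operations therefore mean different things: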

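s := "S1" // memory holds "S1"; the str pointer of s points to it
s = "S2"  // memory holds "S2"; the str pointer of s now points there instead

b := []byte{1} // an array holding the byte value 1 is allocated; the array pointer of b points to it
b[0] = 2       // the contents of that same array are changed to 2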

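[Figure: diagram of the operations above]

Because the contents a string points to cannot be changed, every change to a string needs a freshly allocated block of memory, and the previously allocated block has to be reclaimed by the GC. This is the root cause of string operations being less efficient than []byte operations.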

Implementation details of standard conversion

The implementation of []byte(string) (the source is in src/runtime/string.go):

// The constant is known to the compiler.
// There is no fundamental theory behind this number.
const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte

func stringtoslicebyte(buf *tmpBuf, s string) []byte {
    var b []byte
    if buf != nil && len(s) <= len(buf) {
        *buf = tmpBuf{}
        b = buf[:len(s)]
    } else {
        b = rawbyteslice(len(s))
    }
    copy(b, s)
    return b
}

// rawbyteslice allocates a new byte slice. The byte slice is not zeroed.
func rawbyteslice(size int) (b []byte) {
    cap := roundupsize(uintptr(size))
    p := mallocgc(cap, nil, false)
    if cap != uintptr(size) {
        memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size))
    }

    *(*slice)(unsafe.Pointer(&b)) = slice{p, size, int(cap)}
    return
}

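There are two cases, depending on whether s fits in the 32-byte temporary buffer. That buffer is only available (buf is not nil) when the result does not escape; if s is longer than 32 bytes, or no buffer is available, Go has to call mallocgc to allocate a new block of memory whose size is determined by s. This answers question 2 above: when x is larger, standard conversion performs one memory allocation.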
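The allocation can also be observed outside a benchmark with testing.AllocsPerRun. This is only a sketch: the package-level variable sink and the helper allocsForConversion are assumptions introduced here to force the result to escape, so that the runtime cannot fall back on the temporary buffer.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; assigning to it makes the converted
// slice escape to the heap.
var sink []byte

// allocsForConversion reports the average number of heap allocations made
// by one standard string-to-[]byte conversion whose result escapes.
func allocsForConversion(s string) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = []byte(s)
	})
}

func main() {
	long := "Hello Gopher! Hello Gopher! Hello Gopher!" // 41 bytes, above the 32-byte buffer
	fmt.Println(allocsForConversion(long))
}
```

The exact figures for short, non-escaping conversions depend on the compiler's escape analysis, so they are best measured on the Go version in use.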

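Finally, the copy function copies the data from the string to the []byte. In the Go source examined here, this is implemented by the slicestringcopy function in src/runtime/slice.go.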

func slicestringcopy(to []byte, fm string) int {
    if len(fm) == 0 || len(to) == 0 {
        return 0
    }

    // The number of bytes copied is the smaller of the string and []byte lengths.
    n := len(fm)
    if len(to) < n {
        n = len(to)
    }

    // If the race detector is enabled (-race).
    if raceenabled {
        callerpc := getcallerpc()
        pc := funcPC(slicestringcopy)
        racewriterangepc(unsafe.Pointer(&to[0]), uintptr(n), callerpc, pc)
    }
    // If the memory sanitizer is enabled (-msan).
    if msanenabled {
        msanwrite(unsafe.Pointer(&to[0]), uintptr(n))
    }

    // Copy the first n bytes of the string's underlying array into the
    // underlying array of the []byte. This is the core of copy; it is
    // implemented in assembly, in the memmove_*.s source files.
    memmove(unsafe.Pointer(&to[0]), stringStructOf(&fm).str, uintptr(n))
    return n
}

[Figure: how copy is carried out]

The implementation of string([]byte) (the source is also in src/runtime/string.go):

// Buf is a fixed-size buffer for the result,
// it is not nil if the result does not escape.
func slicebytetostring(buf *tmpBuf, b []byte) (str string) {
    l := len(b)
    if l == 0 {
        // Turns out to be a relatively common case.
        // Consider that you want to parse out data between parens in "foo()bar",
        // you find the indices and convert the subslice to string.
        return ""
    }
    // If the race detector is enabled (-race).
    if raceenabled {
        racereadrangepc(unsafe.Pointer(&b[0]),
            uintptr(l),
            getcallerpc(),
            funcPC(slicebytetostring))
    }
    // If the memory sanitizer is enabled (-msan).
    if msanenabled {
        msanread(unsafe.Pointer(&b[0]), uintptr(l))
    }
    if l == 1 {
        stringStructOf(&str).str = unsafe.Pointer(&staticbytes[b[0]])
        stringStructOf(&str).len = 1
        return
    }

    var p unsafe.Pointer
    if buf != nil && len(b) <= len(buf) {
        p = unsafe.Pointer(buf)
    } else {
        p = mallocgc(uintptr(len(b)), nil, false)
    }
    stringStructOf(&str).str = p
    stringStructOf(&str).len = len(b)
    // Copy the byte array into the string.
    memmove(p, (*(*slice)(unsafe.Pointer(&b))).array, uintptr(len(b)))
    return
}

// stringStructOf views a string through a *stringStruct.
func stringStructOf(sp *string) *stringStruct {
    return (*stringStruct)(unsafe.Pointer(sp))
}

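As we can see, when the slice is longer than 32 bytes, mallocgc is again called to allocate a new block of memory. The copy is then completed by memmove.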
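The visible effect of that memmove is that the resulting string no longer depends on the slice it came from. A small sketch, using a helper named snapshot that exists only for this illustration:

```go
package main

import "fmt"

// snapshot converts b with the standard conversion. The returned string
// owns its own copy of the bytes.
func snapshot(b []byte) string {
	return string(b)
}

func main() {
	b := []byte("hello")
	s := snapshot(b)
	b[0] = 'H' // modify the slice after the conversion

	fmt.Println(s)         // hello
	fmt.Println(string(b)) // Hello
}
```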

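Implementation details of forced conversion

1. The all-purpose unsafe.Pointer

In Go, a pointer *T of any type can be converted to an unsafe.Pointer, which can hold the address of any variable. An unsafe.Pointer can also be converted back to an ordinary pointer, and that pointer does not have to be of the original type *T. In addition, an unsafe.Pointer can be converted to a uintptr, which holds the numeric value of the address and so lets us do arithmetic on addresses. These rules are the basis of forced conversion.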
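Both rules can be shown on simpler types than string. The sketch below is an illustration with made-up helper names: floatBits converts *float64 to *uint64 through unsafe.Pointer, and second uses uintptr arithmetic to step from one array element to the next.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// floatBits reinterprets the memory of f as a uint64 by converting
// *float64 -> unsafe.Pointer -> *uint64.
func floatBits(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

// second returns a[1] by adding the element size to the address of a[0].
// The conversion to uintptr and back happens in a single expression.
func second(a *[3]int32) int32 {
	p := unsafe.Pointer(uintptr(unsafe.Pointer(&a[0])) + unsafe.Sizeof(a[0]))
	return *(*int32)(p)
}

func main() {
	f := 1.5
	fmt.Println(floatBits(f) == math.Float64bits(f)) // true

	a := [3]int32{10, 20, 30}
	fmt.Println(second(&a)) // 20
}
```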

In the reflect package, string and slice correspond to the structs reflect.StringHeader and reflect.SliceHeader, which are the runtime representations of string and slice.

type StringHeader struct {
    Data uintptr
    Len  int
}

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

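2. Memory layout

The runtime representations of string and slice show that, apart from the extra Cap field of type int in SliceHeader, the Data and Len fields are identical. Their memory layouts therefore line up, which means we can convert between them directly through unsafe.Pointer.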
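The alignment claim can be checked with unsafe.Offsetof and unsafe.Sizeof. This is a sketch; sameLeadingLayout is a name introduced here, and the sizes printed depend on the platform's word size.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// sameLeadingLayout reports whether Data and Len sit at the same offsets
// in reflect.StringHeader and reflect.SliceHeader.
func sameLeadingLayout() bool {
	var sh reflect.StringHeader
	var bh reflect.SliceHeader
	return unsafe.Offsetof(sh.Data) == unsafe.Offsetof(bh.Data) &&
		unsafe.Offsetof(sh.Len) == unsafe.Offsetof(bh.Len)
}

func main() {
	fmt.Println(sameLeadingLayout()) // true

	// Header sizes in bytes; the slice header is one word larger (Cap).
	fmt.Println(unsafe.Sizeof(reflect.StringHeader{}), unsafe.Sizeof(reflect.SliceHeader{}))
}
```

Because Bytes2String reads only the first two words of the slice header, the extra Cap field is simply ignored.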

[Figure: converting []byte to string]

[Figure: converting string to []byte]

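Q1. Why does forced conversion perform better than standard conversion?

With standard conversion, both []byte to string and string to []byte copy the underlying array. Forced conversion only redirects a pointer, so the string and the []byte end up pointing at the same underlying array. Naturally the latter is faster.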
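The shared array is easy to observe. The sketch below reuses the article's Bytes2String; aliasDemo is a helper added for the illustration. After the slice is modified, the forced string changes with it, while the string from the standard conversion does not.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Bytes2String is the forced conversion from the article.
func Bytes2String(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// aliasDemo returns the forced and the standard conversion of the same
// slice, taken before the slice is modified.
func aliasDemo() (forced, copied string) {
	b := []byte("hello")
	forced = Bytes2String(b) // shares the array of b
	copied = string(b)       // owns a separate copy
	b[0] = 'H'
	return forced, copied
}

func main() {
	forced, copied := aliasDemo()
	fmt.Println(forced) // Hello
	fmt.Println(copied) // hello
}
```

A string whose contents change after creation breaks the immutability that the rest of a Go program relies on, which is the other side of the performance gain.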

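Q2. Why, in the tests above, does standard conversion allocate memory once when x is larger, which makes it slower still, while forced conversion is unaffected?

With standard conversion, data longer than 32 bytes requires new memory from mallocgc before the data can be copied. Forced conversion merely changes where a pointer points. The larger the data being converted, therefore, the wider the performance gap between the two becomes.

Q3. If forced conversion performs so well, why is standard conversion the one the Go language gives us?

First, we need to know that Go is a type-safe language, and safety is paid for with performance. But performance is relative, and on today's machines this cost is negligible. Forced conversion, moreover, introduces a serious safety hazard into our programs.

Consider the following example: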

a := "hello"
b := String2Bytes(a)
b[0] = 'H'

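a is a string, whose value, as explained earlier, cannot be modified. Forced conversion hands the underlying array of a to b, and b is a []byte, whose value can be modified. Modifying the underlying array at this point causes a fatal error, one that even defer plus recover cannot catch.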

unexpected fault address 0x10b6139
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x10b6139 pc=0x1088f2c]

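Q4. Why is string designed to be immutable?

I think this question is worth pondering. An immutable string is read-only, and the benefit is that in concurrent code the same string can be used many times without locking: it is shared efficiently with no safety concerns.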
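A minimal sketch of that benefit: several goroutines read the same string at once with no mutex. The function countConcurrently and its parameters are invented for this example.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countConcurrently lets n goroutines read the same string without a lock
// and returns how many occurrences of sub each of them counted.
func countConcurrently(s, sub string, n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = strings.Count(s, sub) // read-only access to s
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(countConcurrently("Hello Gopher! Hello Gopher!", "Gopher", 4)) // [2 2 2 2]
}
```

Each goroutine writes only to its own element of results, so the string is the only shared data, and it is never written.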

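Trade-offs

When you are unsure about the safety risks, use standard conversion wherever possible.
When a program has demanding performance requirements, only ever reads the data, and converts frequently (message forwarding, for example), forced conversion can be used.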
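As one possible read-only use, the sketch below parses a number straight out of a byte buffer. It reuses the article's Bytes2String; parseID is a hypothetical helper, and the approach assumes the buffer is not modified while the call is in progress.

```go
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

// Bytes2String is the forced conversion from the article.
func Bytes2String(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// parseID parses a decimal number held in a byte buffer without copying
// the buffer into a new string. The buffer is only read, never modified.
func parseID(buf []byte) (int64, error) {
	return strconv.ParseInt(Bytes2String(buf), 10, 64)
}

func main() {
	id, err := parseID([]byte("1024"))
	fmt.Println(id, err) // 1024 <nil>
}
```

The temporary string never outlives the call, which keeps the window for misuse small.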


This article was shared from the WeChat official account Golang技術分享 (gh_1ac13c0742b7).