Counting the characters in a string in Go

Date: 2019-11-08

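In Go, len cannot be used directly to count the characters in a string. The doc comment on the built-in string type in the source shows that a string is a sequence of bytes that conventionally holds UTF-8 encoded text, so len returns the number of bytes, not the number of characters: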

// string is the set of all strings of 8-bit bytes, conventionally but not
// necessarily representing UTF-8-encoded text. A string may be empty, but
// not nil. Values of string type are immutable.

For example, take "Hello, 世界" (the Chinese characters are there for contrast):

s := "Hello, 世界"
fmt.Println(len(s)) // 13
fmt.Println([]byte(s)) // [72 101 108 108 111 44 32 228 184 150 231 149 140]
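To see where the 13 bytes come from, here is a minimal sketch that walks the same string with a range loop, which decodes one UTF-8 code point (rune) per step. The helper name runeOffsets is hypothetical and only serves the illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets (hypothetical helper) returns the starting byte offset
// of every rune in s.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s { // ranging over a string decodes UTF-8 rune by rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Hello, 世界"
	for i, r := range s {
		// Each ASCII rune takes 1 byte; 世 and 界 take 3 bytes each.
		fmt.Printf("byte offset %2d: %q (%d bytes)\n", i, r, utf8.RuneLen(r))
	}
	fmt.Println(len(s), len(runeOffsets(s))) // byte count, then rune count
}
```

The seven ASCII characters occupy one byte each and the two Chinese characters three bytes each, which gives 7 + 3 + 3 = 13 bytes for 9 runes.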

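Since a string is stored as bytes, the natural place to look is the functions that count over those bytes. There are four candidates: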

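- bytes.Count() with a nil separator, which returns the number of code points plus one
- strings.Count() with an empty substring, which also returns the number of code points plus one
- Convert the string to []rune and call len on the result
- utf8.RuneCountInString()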

package main

import (
    "bytes"
    "fmt"
    "strings"
    "testing"
    "unicode/utf8"
)

/*
len cannot be used directly to count the characters in a string:
it returns the number of bytes, and Go strings conventionally hold
UTF-8 encoded text.
*/

func main() {

    s := "Hello, 世界"
    fmt.Println(len(s))    // 13
    fmt.Println([]byte(s)) // [72 101 108 108 111 44 32 228 184 150 231 149 140]

    fmt.Println(f1(s)) // 9
}

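// f1 uses bytes.Count: with a nil separator it returns 1 + the number of code points.
func f1(s string) int {
    return bytes.Count([]byte(s), nil) - 1
}

// f2 uses strings.Count: with an empty substring it returns 1 + the number of code points.
func f2(s string) int {
    return strings.Count(s, "") - 1
}

// f3 converts the string to a rune slice and takes its length.
func f3(s string) int {
    return len([]rune(s))
}

// f4 counts the runes directly without converting the string.
func f4(s string) int {
    return utf8.RuneCountInString(s)
}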

var s = "Hello, 世界"

func Benchmark1(b *testing.B) {
    for i := 0; i < b.N; i++ {
        f1(s)
    }
}

func Benchmark2(b *testing.B) {
    for i := 0; i < b.N; i++ {
        f2(s)
    }
}

func Benchmark3(b *testing.B) {
    for i := 0; i < b.N; i++ {
        f3(s)
    }
}

func Benchmark4(b *testing.B) {
    for i := 0; i < b.N; i++ {
        f4(s)
    }
}

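I could not find a benchmark run configuration in the Go setup for IDEA (it kept reporting that the package was wrong), so I ran the benchmarks from the command line: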

go test stringCount_test.go -bench ".*"

This produced the following results:

Benchmark1-12           100000000               17.7 ns/op
Benchmark2-12           100000000               14.0 ns/op
Benchmark3-12           100000000               14.5 ns/op
Benchmark4-12           100000000               13.1 ns/op

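The fastest is utf8.RuneCountInString() at 13.1 ns/op. strings.Count() (14.0 ns/op) and len([]rune(s)) (14.5 ns/op) are close behind, and bytes.Count() is the slowest at 17.7 ns/op.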
