Go's important data structures: slice and map

First of all, Go passes everything by value by default, i.e. arguments are copied.

So some people wonder: why can a slice or map received as a local function parameter still change the variable outside the function?

In fact, they are still passed by value when used as function arguments; it is just that slice, map, and channel are pointer-like types:

copying such a pointer each time costs very little (only a small header is copied, not the whole block of elements as when an array is copied by value), and the copied pointer still points to the same underlying data.

In other words, what is passed is a copy of the pointer to that data; on a 64-bit platform a pointer occupies 8 bytes.
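A minimal sketch of this behavior (the function and variable names here are made up for illustration): modifying elements through the copied header is visible to the caller, while rebinding the local header is not.

package main

import "fmt"

// modify receives a copy of the slice header; that copy still points at the
// caller's underlying array, so writes through it are visible outside.
func modify(s []int) {
    s[0] = 100
}

// reassign only rebinds the local copy of the header; the caller's slice is
// left untouched.
func reassign(s []int) {
    s = []int{7, 8, 9}
    _ = s
}

func main() {
    nums := []int{1, 2, 3}
    modify(nums)
    fmt.Println(nums) // [100 2 3]
    reassign(nums)
    fmt.Println(nums) // still [100 2 3]
}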

The slice data structure:

type slice struct {
    array unsafe.Pointer // pointer to the underlying array
    len   int
    cap   int
}

From this we can see that a slice header occupies 24 bytes (8 + 8 + 8) on a 64-bit platform.

Now look at the appendInt operation:

func appendInt(x []int, y int) []int {
    var z []int
    zlen := len(x) + 1
    if zlen <= cap(x) {
        // There is room to grow.  Extend the slice.
        z = x[:zlen]
    } else {
        // There is insufficient space.  Allocate a new array.
        // Grow by doubling, for amortized linear complexity.
        zcap := zlen
        if zcap < 2*len(x) {
            zcap = 2 * len(x)
        }
        z = make([]int, zlen, zcap)
        copy(z, x) // a built-in function; see text
    }
    z[len(x)] = y
    return z
}

Here len is the number of elements the slice currently holds, and cap is the size of the underlying array.

當作切片操做時,引用的是源slice的底層數組,只有在作append操做時若是超過了cap,就會自動分配一個新的底層數組,cap大小是原cap的兩倍(反正就是2的n次方)。ui

    sli := make([]int, 0, 1)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))
    sli = append(sli, 1)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))
    sli = append(sli, 2)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))
    sli = append(sli, 3)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))
    sli = append(sli, 4)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))
    sli = append(sli, 5)
    fmt.Printf("%p, len = %v, cap = %v, byte = %v\n", sli, len(sli), cap(sli), unsafe.Sizeof(sli))

Look carefully at how it grows: the address changes, but the size never changes, staying at 24 bytes:

 

Only slice, map, and chan values can be handed to %p directly to print the address they hold (for an array you print &arr instead):

    sli1 := []int{4, 5, 6, 7, 7, 7, 7, 77, 7, 77, 7, 7, 7, 7, 7, 7, 7, 7, 7}
    fmt.Printf("slice : %p, len = %v, cap = %v, byte = %v\n\n", sli1, len(sli1), cap(sli1), unsafe.Sizeof(sli1))    
    arr := [...]int{4, 5, 6, 7, 7, 7, 7, 77, 7, 77, 7, 7, 7, 7, 7, 7, 7, 7, 7}
    fmt.Printf("arr : %p, len = %v, cap = %v, byte = %v\n\n", &arr, len(arr), cap(arr), unsafe.Sizeof(arr))

The slice header stays at 24 bytes while this array is 152 bytes (19 elements × 8 bytes), so when the data is large, passing a slice (copying just its small header) is far more efficient than copying the whole array!
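A minimal sketch of the two parameter styles (function names are illustrative): the array version copies all 152 bytes on every call, while the slice version copies only the 24-byte header.

package main

import (
    "fmt"
    "unsafe"
)

// sumArray receives a full copy of the 19-element array (152 bytes).
func sumArray(a [19]int) int {
    s := 0
    for _, v := range a {
        s += v
    }
    return s
}

// sumSlice receives only a copy of the 24-byte slice header.
func sumSlice(a []int) int {
    s := 0
    for _, v := range a {
        s += v
    }
    return s
}

func main() {
    arr := [19]int{4, 5, 6, 7}
    fmt.Println(sumArray(arr), unsafe.Sizeof(arr))       // 22 152
    fmt.Println(sumSlice(arr[:]), unsafe.Sizeof(arr[:])) // 22 24
}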

 

----------------------------------------------------------------- The following covers map -----------------------------------------------------------------

A map must be initialized with make (or a map literal) before you write to it; assigning to a nil map panics!
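A minimal sketch of that failure mode (variable names are illustrative):

package main

import "fmt"

func main() {
    var m1 map[string]int // nil map: reads are fine, writes panic
    fmt.Println(m1["a"])  // 0: reading a missing key just returns the zero value
    // m1["a"] = 1        // would panic: assignment to entry in nil map

    m2 := make(map[string]int) // properly initialized with make
    m2["a"] = 1
    fmt.Println(m2["a"]) // 1
}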

Go's map is implemented as a hash table; the underlying data structure of a hashmap is: an array + linked lists.

The hashmap is built around a bucket array. A hash function first converts the key into an array index, so every element is hashed into one of the buckets in that array. When a bucket fills up, an overflow pointer is used to chain an extra bucket onto it, forming a linked list; that is how collisions are resolved.

When two key/value pairs hash to the same value, a hash collision occurs; the colliding pair is then stored in the next node of the linked list hanging off that array slot.

Even so, HashMap operations are quite efficient: lookup is O(1) when there is no collision and O(N) along the chain when there is. So purely in terms of performance, the fewer chained nodes a HashMap has, the better;

of course, when a very large number of key/value pairs are stored, the chains do take some of the storage pressure off the array.

 

// Number of buckets (the array) in the HashMap
const BucketCount = 16

// Data held by each list node: a key/value pair
type KV struct {
    Key   string
    Value string
}

// Linked-list node
type LinkNode struct {
    Data     KV
    NextNode *LinkNode
}

// The hash table itself
type HashMap struct {
    Buckets [BucketCount]*LinkNode // bucket array: each bucket is a linked list holding key/value pairs
}

// CreateLink creates a list containing only a head node.
func CreateLink() *LinkNode {
    // The head node's data is empty to mark that this list holds no key/value pair yet.
    linkNode := &LinkNode{KV{"", ""}, nil}
    return linkNode
}

// CreateHashMap creates a HashMap.
func CreateHashMap() *HashMap {
    myMap := &HashMap{}
    // Attach a linked list to every bucket.
    for i := 0; i < BucketCount; i++ {
        myMap.Buckets[i] = CreateLink()
    }
    return myMap
}

 

// HashCode is a simple hash function: it scatters keys of any length into an integer in [0, BucketCount).
func HashCode(key string) int {
    sum := 0
    for i := 0; i < len(key); i++ {
        sum += int(key[i])
    }
    return sum % BucketCount
}

 

// AddKeyValue inserts a key/value pair.
func (myMap *HashMap) AddKeyValue(key string, value string) {
    // 1. Hash the key into [0, BucketCount) and use it as the index into the bucket array.
    mapIndex := HashCode(key)

    // 2. Get the head node of that bucket's list.
    link := myMap.Buckets[mapIndex]

    // 3. Add the pair to this list.
    if link.Data.Key == "" && link.NextNode == nil {
        // The list still only holds its empty head node, so nothing was inserted here before
        // (no hash collision): store the pair in the head node itself.
        link.Data.Key = key
        link.Data.Value = value
    } else {
        // Hash collision: append a new node to the list.
        link.AddNode(KV{key, value})
    }
}
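The code above calls link.AddNode, but the original listing never defines that method; a minimal sketch of what it might look like (simply appending the pair to the tail of the bucket's list, without de-duplicating keys) is:

// AddNode appends a key/value pair to the end of the list.
func (node *LinkNode) AddNode(kv KV) {
    cur := node
    for cur.NextNode != nil {
        cur = cur.NextNode
    }
    cur.NextNode = &LinkNode{Data: kv, NextNode: nil}
}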

 

// GetValueForKey looks up the value stored under a key.
func (myMap *HashMap) GetValueForKey(key string) string {
    // 1. Hash the key into [0, BucketCount) and use it as the index into the bucket array.
    mapIndex := HashCode(key)
    // 2. Get the head node of that bucket's list.
    link := myMap.Buckets[mapIndex]
    var value string
    // Walk the list to find the node holding this key (there may have been collisions).
    head := link
    for head != nil {
        if head.Data.Key == key {
            value = head.Data.Value
            break
        }
        head = head.NextNode
    }
    // If the key is absent, the zero value "" is returned.
    return value
}
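A short usage sketch of this hand-rolled map (assuming the definitions above live in one package with fmt imported; the keys are chosen so that "name" and "amen" collide, because HashCode just sums the bytes):

func main() {
    m := CreateHashMap()
    m.AddKeyValue("name", "go")
    m.AddKeyValue("amen", "collision") // same byte sum as "name", so same bucket
    fmt.Println(m.GetValueForKey("name"))    // go
    fmt.Println(m.GetValueForKey("amen"))    // collision
    fmt.Println(m.GetValueForKey("missing")) // empty string: key not found
}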

 

 總的來講:就是數組(散列表) + 鏈表(哈希衝突),  散列表經過散列函數(哈希函數)來確認下標,由於是連續內存,能夠根據數組下標偏移量來肯定位置,查詢複雜度爲O(1),

又由於事先已經初始化了數組,只不過存的值爲空而已,因此插入的時候只管往數組裏面篡改數值就好了,操做複雜度爲O(1)。

哈希表的特色是會有一個哈希函數,對你傳來的key進行哈希運算,獲得惟一的值,通常狀況下都是一個數值。Golang的map中也有這麼一個哈希函數,也會算出惟一的值,對於這個值的使用,Golang也是頗有意思。

Go splits the computed hash value in two according to how the parts are used: the high-order bits and the low-order bits.

Its tophash stores the high eight bits of the hash value computed by the hash function, and it exists to speed up lookups: because those eight bits are kept around, non-matching keys can be filtered out without comparing the full key. Only when a key's high eight bits match a stored tophash entry is the complete key compared, and the value fetched. (In Java's HashMap, by contrast, key comparison goes straight to equals, which is slower when keys are long strings.)
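A rough sketch of the tophash idea (this only mirrors the shape of the runtime's helper; the real runtime works on uintptr and reserves the lowest few values for internal bookkeeping):

package main

import "fmt"

// tophash keeps only the high 8 bits of a 64-bit hash; comparing these single
// bytes filters out most non-matching keys before the full key comparison.
func tophash(hash uint64) uint8 {
    return uint8(hash >> 56)
}

func main() {
    h := uint64(0xA1B2C3D4E5F60708)
    fmt.Printf("%#x\n", tophash(h)) // 0xa1
}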

References:

https://segmentfault.com/a/1190000018380327?utm_source=tag-newest

https://studygolang.com/articles/14583

 

Original comments by the author of Go's hashmap (from the runtime's map source):

This file contains the implementation of Go's map type.

A map is just a hash table. The data is arranged
into an array of buckets. Each bucket contains up to
8 key/value pairs. The low-order bits of the hash are
used to select a bucket. Each bucket contains a few
high-order bits of each hash to distinguish the entries
within a single bucket.

If more than 8 keys hash to a bucket, we chain on
extra buckets.

When the hashtable grows, we allocate a new array
of buckets twice as big. Buckets are incrementally
copied from the old bucket array to the new bucket array.

This means: when the bucket array needs to grow, the runtime allocates a new array twice as large, and the old buckets are copied over incrementally, i.e. an old bucket is only copied into the new array when it is actually touched.

Map iterators walk through the array of buckets and
return the keys in walk order (bucket #, then overflow
chain order, then bucket index). To maintain iteration
semantics, we never move keys within their bucket (if
we did, keys might be returned 0 or 2 times). When
growing the table, iterators remain iterating through the
old table and must check the new table if the bucket
they are iterating through has been moved ("evacuated")
to the new table.

Picking loadFactor: too large and we have lots of overflow buckets, too small and we waste a lot of space. I wrote a simple program to check some stats for different loads:
