A Brief Analysis of the lodash Source: How Deep Copy Is Implemented

Date: 2020-05-12

Tags: lodash, source code, analysis, implementation, copy

😄 This article was first published at: lodash-source-learning/

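1. Overview

The utility library lodash wraps a rich set of convenient JavaScript functions that cover many common tasks in day-to-day development. After using it for a while, it is natural to become curious about how it works internally.

This article reads through and analyzes the implementation of lodash's deep copy function, _.cloneDeep().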

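2. The difference between deep copy and shallow copy

Shallow copy: for reference-type data, assignment only copies the reference. Both references still point to the same address, so a change made through one is visible through the other. The same is true of the nested objects inside a one-level copy.

Deep copy: recursively copies the child objects of an object. Once the copy is complete, the two objects no longer affect each other.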
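
The difference can be seen in a few lines. The following is a minimal sketch, not lodash code: the data and the helper naiveDeepCopy are hypothetical, and the helper only handles plain objects and arrays.

```javascript
// Illustrative data; the names are hypothetical.
const original = { name: 'a', nested: { count: 1 } }

// Assignment: no copy at all, both names refer to the same object.
const alias = original

// Shallow copy: a new top-level object whose nested objects are still shared.
const shallow = { ...original }

// Deep copy with a hand-written helper (plain objects and arrays only).
function naiveDeepCopy(value) {
  if (value === null || typeof value !== 'object') return value
  const out = Array.isArray(value) ? [] : {}
  for (const key of Object.keys(value)) {
    out[key] = naiveDeepCopy(value[key])
  }
  return out
}
const deep = naiveDeepCopy(original)

original.nested.count = 2
console.log(alias.nested.count)   // 2
console.log(shallow.nested.count) // 2
console.log(deep.nested.count)    // 1
```

Only the deep copy is isolated from the later change to the nested object.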

3. What kinds of data deep copy applies to

Including but not limited to:

Date
Object
Array
TypedArray
Map
Set
ArrayBuffer
RegExp

4. How lodash implements deep copy

1) Initialization

const CLONE_DEEP_FLAG = 1
const CLONE_SYMBOLS_FLAG = 4

function cloneDeep(value) {
  return baseClone(value, CLONE_DEEP_FLAG | CLONE_SYMBOLS_FLAG)
}

baseClone, the function that does the real work for cloneDeep:

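function baseClone(value, bitmask, customizer, key, object, stack) {
  let result
  // Each option is read out of the bitmask with a bitwise AND.
  // CLONE_FLAT_FLAG (value 2) is defined in the lodash source next to the two flags above.
  const isDeep = bitmask & CLONE_DEEP_FLAG
  const isFlat = bitmask & CLONE_FLAT_FLAG
  const isFull = bitmask & CLONE_SYMBOLS_FLAG
  // ... rest of the function omitted
}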

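The entry code above looks very concise: it defines two bitmask constants and combines them with bitwise operations, so that a single numeric argument can switch several options on or off:

(1 | 4) & 1 => 1
(1 | 4) & 2 => 0
(1 | 4) & 4 => 4

From the bitwise operations above, in the current deep copy mode isDeep and isFull are truthy (1 and 4), while isFlat is 0. These variables drive many of the branches in the code that follows.

For bitwise operations in JavaScript, see MDN: Bitwise_Operators.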
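
As a small runnable sketch of the flag handling: the constant names follow the excerpt above, and the value 2 for CLONE_FLAT_FLAG is an assumption taken from the "& 2" line in the arithmetic. The last two lines show why the parentheses matter, since & binds more tightly than | in JavaScript.

```javascript
const CLONE_DEEP_FLAG = 1    // 0b001
const CLONE_FLAT_FLAG = 2    // 0b010 (assumed value, see lead-in)
const CLONE_SYMBOLS_FLAG = 4 // 0b100

// cloneDeep passes both flags at once in a single number.
const bitmask = CLONE_DEEP_FLAG | CLONE_SYMBOLS_FLAG // 0b101 === 5

// baseClone reads each option back out with a bitwise AND.
const isDeep = bitmask & CLONE_DEEP_FLAG
const isFlat = bitmask & CLONE_FLAT_FLAG
const isFull = bitmask & CLONE_SYMBOLS_FLAG
console.log(isDeep, isFlat, isFull) // 1 0 4

// Operator precedence: & is evaluated before |.
console.log(1 | 4 & 2)   // 1
console.log((1 | 4) & 2) // 0
```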

2) Tagging the value's type

const tag = getTag(value)

const toString = Object.prototype.toString

function getTag(value) {
  if (value == null) {
    return value === undefined ? '[object Undefined]' : '[object Null]'
  }
  return toString.call(value)
}

The implementation above calls toString() from Object.prototype to tell apart the concrete type of each value:

var toString = Object.prototype.toString;
toString.call(new Date); // [object Date]
toString.call(new String); // [object String]
toString.call(Math); // [object Math]
// Since JavaScript 1.8.5
toString.call(undefined); // [object Undefined]
toString.call(null); // [object Null]
// Inside a non-arrow function, where `arguments` is defined
toString.call(arguments); // [object Arguments]

3) Copying arrays

if (isArr) {
    // Initialize the array deep copy: returns the skeleton of a new array
    result = initCloneArray(value)
}

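const hasOwnProperty = Object.prototype.hasOwnProperty

function initCloneArray(array) {
  const { length } = array
  // Same constructor and length as the source; the elements are filled in later
  const result = new array.constructor(length)

  // Carry over the extra properties of a RegExp#exec result
  if (length && typeof array[0] === 'string' && hasOwnProperty.call(array, 'index')) {
    result.index = array.index
    result.input = array.input
  }
  return result
}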

export default initCloneArray

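At this point a question arises: why does copying an array need to check typeof array[0] === 'string' && hasOwnProperty.call(array, 'index')? What are index and input about?

Anyone familiar with regular expression matching in JavaScript will recognize the special kind of array being handled here: the result of regexObj.exec(str). When the match succeeds, exec() returns an array that carries the extra properties index and input.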

const matches = /(hello \S+)/.exec('hello world, javascript');
console.log(matches);
Output =>
[
    0: "hello world,"
    1: "hello world,"
    index: 0
    input: "hello world, javascript"
    groups: undefined
    length: 2
]
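
To see the special case in action, the sketch below repeats initCloneArray from the excerpt (with hasOwnProperty defined locally so it runs on its own) and feeds it the exec() result. The variable names are illustrative.

```javascript
const hasOwnProperty = Object.prototype.hasOwnProperty

function initCloneArray(array) {
  const { length } = array
  const result = new array.constructor(length)
  if (length && typeof array[0] === 'string' && hasOwnProperty.call(array, 'index')) {
    result.index = array.index
    result.input = array.input
  }
  return result
}

const execResult = /(hello \S+)/.exec('hello world, javascript')
const skeleton = initCloneArray(execResult)

console.log(skeleton.length) // 2
console.log(skeleton.index)  // 0
console.log(skeleton.input)  // 'hello world, javascript'
console.log(0 in skeleton)   // false: elements are copied in a later step
```

A plain array such as [1, 2] fails the typeof array[0] === 'string' check, so no index or input property is added to its skeleton.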

4) Copying Buffers

if (isBuffer(value)) {
  return cloneBuffer(value, isDeep)
}

const Buffer = moduleExports ? root.Buffer : undefined, allocUnsafe = Buffer ? Buffer.allocUnsafe : undefined

function cloneBuffer(buffer, isDeep) {
  if (isDeep) {
    return buffer.slice()
  }
  const length = buffer.length
  const result = allocUnsafe ? allocUnsafe(length) : new buffer.constructor(length)

  buffer.copy(result)
  return result
}

The code above handles Buffer objects. In Node.js, Buffer.allocUnsafe() returns a new, uninitialized Buffer instance of the specified size.

For details, see: Buffer.allocUnsafe.
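
The allocate-then-copy path can be tried directly in Node.js. This is a small sketch with illustrative variable names, not lodash code; it assumes a Node.js runtime where Buffer is a global.

```javascript
const source = Buffer.from('lodash')

// allocUnsafe skips zero-filling, so the contents are arbitrary until overwritten.
const duplicate = Buffer.allocUnsafe(source.length)

// copy() writes the bytes of source into duplicate.
source.copy(duplicate)

// The two buffers now own separate memory.
duplicate[0] = 0x4c // 'L'
console.log(source.toString())    // 'lodash'
console.log(duplicate.toString()) // 'Lodash'
```

Because every byte is overwritten by copy() right away, the uninitialized memory is never observed.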

5) Copying Objects

Copying an Object starts by using Object.create() to construct an empty object that inherits the prototype of the original object.

const objectProto = Object.prototype

// Checks whether value is a prototype object
function isPrototype(value) {
  const Ctor = value && value.constructor
  const proto = (typeof Ctor === 'function' && Ctor.prototype) || objectProto

  return value === proto
}

function initCloneObject(object) {
  return (typeof object.constructor === 'function' && !isPrototype(object))
    ? Object.create(Object.getPrototypeOf(object))
    : {}
}
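
A quick way to check what initCloneObject returns is to run it on a class instance. The sketch repeats the two functions from the excerpt so that it runs on its own; the Point class is a hypothetical example.

```javascript
const objectProto = Object.prototype

function isPrototype(value) {
  const Ctor = value && value.constructor
  const proto = (typeof Ctor === 'function' && Ctor.prototype) || objectProto
  return value === proto
}

function initCloneObject(object) {
  return (typeof object.constructor === 'function' && !isPrototype(object))
    ? Object.create(Object.getPrototypeOf(object))
    : {}
}

// Hypothetical class used only for illustration.
class Point {
  constructor(x, y) { this.x = x; this.y = y }
  norm() { return Math.hypot(this.x, this.y) }
}

const emptyPoint = initCloneObject(new Point(3, 4))
console.log(Object.keys(emptyPoint).length) // 0: no own properties yet
console.log(emptyPoint instanceof Point)    // true: prototype chain preserved
console.log(typeof emptyPoint.norm)         // 'function'

// A prototype object itself falls back to a plain {}.
console.log(initCloneObject(Point.prototype) instanceof Point) // false
```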

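4.1 Using a data cache to track copied objects

stack || (stack = new Stack)
const stacked = stack.get(value)
if (stacked) {
  return stacked
}
// result here is the initialized object produced by the code above; for now, think of it as an empty object that carries the prototype chain
stack.set(value, result)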
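The code above creates a Stack, a data-management interface that caches each source object and its copy as a one-to-one key-value pair. Internally, the caching behavior is split across HashCache, MapCache and ListCache. Why use three caching strategies?

HashCache is essentially backed by a plain object, and that comes with a restriction: a JavaScript object is a collection of key-value pairs (a hash structure) whose keys can only be strings or Symbols, which greatly limits how it can be used. Map provides a more complete hash structure whose keys can be of any type, so when the key is an Object, Array or similar, lodash uses MapCache internally.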
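
The key restriction is easy to reproduce. In this sketch (illustrative names only), two distinct objects used as keys of a plain object collapse into one string key, while a Map keeps them apart.

```javascript
const keyA = { id: 1 }
const keyB = { id: 2 }

// Plain object: both keys are converted to the string '[object Object]'.
const plain = {}
plain[keyA] = 'first'
plain[keyB] = 'second'
console.log(Object.keys(plain)) // [ '[object Object]' ]
console.log(plain[keyA])        // 'second': the first entry was overwritten

// Map: keys are compared by identity, so both entries survive.
const byIdentity = new Map()
byIdentity.set(keyA, 'first')
byIdentity.set(keyB, 'second')
console.log(byIdentity.size)      // 2
console.log(byIdentity.get(keyA)) // 'first'
```

A clone cache is keyed by the source objects themselves, which is why an identity-based structure is needed there.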

4.1.1 Stack

const LARGE_ARRAY_SIZE = 200

class Stack {
    // ...
    // The set method of Stack
    set(key, value) {
        let data = this.__data__
        if (data instanceof ListCache) {
          const pairs = data.__data__
          // Short lists stay in the ListCache
          if (pairs.length < LARGE_ARRAY_SIZE - 1) {
            pairs.push([key, value])
            this.size = ++data.size
            return this
          }
          // The list has grown too long: move the pairs into a MapCache
          data = this.__data__ = new MapCache(pairs)
        }
        data.set(key, value)
        this.size = data.size
        return this
    }
    // ...
}

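As the set logic of Stack shows, entries are kept in a ListCache while the list is short. Once its __data__ already holds LARGE_ARRAY_SIZE - 1 pairs, a MapCache is constructed from those pairs and every later set goes through MapCache's own set method.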
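
The switch-over can be sketched with a much smaller threshold. TinyStack below is a hypothetical simplification, not lodash's Stack: it uses a native Map in place of MapCache, compares keys with === only, and sets LIMIT to 4 so the upgrade is easy to observe.

```javascript
const LIMIT = 4 // hypothetical small threshold; the excerpt above uses 200

class TinyStack {
  constructor() {
    this.store = [] // starts as an array of [key, value] pairs
    this.size = 0
  }

  set(key, value) {
    if (Array.isArray(this.store)) {
      const pair = this.store.find(([k]) => k === key)
      if (pair) {
        pair[1] = value
        return this
      }
      if (this.store.length < LIMIT - 1) {
        this.store.push([key, value])
        this.size = this.store.length
        return this
      }
      // Too many pairs for a linear scan: upgrade to a Map.
      this.store = new Map(this.store)
    }
    this.store.set(key, value)
    this.size = this.store.size
    return this
  }

  get(key) {
    if (Array.isArray(this.store)) {
      const pair = this.store.find(([k]) => k === key)
      return pair ? pair[1] : undefined
    }
    return this.store.get(key)
  }
}

const tiny = new TinyStack()
const keys = [{}, {}, {}, {}]
tiny.set(keys[0], 'v0').set(keys[1], 'v1').set(keys[2], 'v2')
console.log(Array.isArray(tiny.store)) // true: still a pair list
tiny.set(keys[3], 'v3')
console.log(tiny.store instanceof Map) // true: upgraded on the 4th entry
console.log(tiny.size)                 // 4
console.log(tiny.get(keys[0]))         // 'v0'
```

The design trades a linear scan, which is cheap for a handful of entries, for hashed lookup once the cache grows.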

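4.1.2 ListCache

ListCache is essentially a two-dimensional array: an array of [key, value] pairs.

class ListCache {
    // ...
    // The set method of ListCache stores entries as an array of pairs
    set(key, value) {
        const data = this.__data__
        const index = assocIndexOf(data, key)
    
        if (index < 0) {
          // New key: append a pair
          ++this.size
          data.push([key, value])
        } else {
          // Existing key: overwrite the value in place
          data[index][1] = value
        }
        return this
    }
    // ...
}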
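
The excerpt calls assocIndexOf without showing it. The following is a sketch of what such a helper could look like, assuming it scans the pairs and compares keys with SameValueZero semantics (like ===, except that NaN matches NaN). It is an illustration, not a copy of lodash's source.

```javascript
// Hypothetical stand-in for the helper used by ListCache above.
function assocIndexOf(pairs, key) {
  for (let i = pairs.length - 1; i >= 0; i--) {
    const candidate = pairs[i][0]
    // SameValueZero: strict equality, plus NaN is treated as equal to NaN.
    if (candidate === key || (candidate !== candidate && key !== key)) {
      return i
    }
  }
  return -1
}

const objKey = { id: 1 }
const pairList = [[objKey, 'object value'], [NaN, 'nan value']]

console.log(assocIndexOf(pairList, objKey))    // 0
console.log(assocIndexOf(pairList, NaN))       // 1
console.log(assocIndexOf(pairList, { id: 1 })) // -1: a different object
```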

4.1.3 MapCache

The core of MapCache's storage is shown below:

// Initialize the data structure
this.__data__ = {
  'hash': new Hash,
  'map': new Map,
  'string': new Hash
}

set(key, value) {
    const data = getMapData(this, key)
    const size = data.size
    data.set(key, value)
    this.size += data.size == size ? 0 : 1
    return this
}

// Choose the store for this entry, Hash or Map, based on the type of key
function getMapData({ __data__ }, key) {
  const data = __data__
  return isKeyable(key)
    ? data[typeof key === 'string' ? 'string' : 'hash']
    : data.map
}

// Checks whether value is suitable for use as a unique object key
function isKeyable(value) {
  const type = typeof value
  return (type === 'string' || type === 'number' || type === 'symbol' || type === 'boolean')
    ? (value !== '__proto__')
    : (value === null)
}
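
To make the routing concrete, the sketch below repeats isKeyable and adds a hypothetical helper, bucketFor, that returns the name of the store getMapData would pick for a given key.

```javascript
function isKeyable(value) {
  const type = typeof value
  return (type === 'string' || type === 'number' || type === 'symbol' || type === 'boolean')
    ? (value !== '__proto__')
    : (value === null)
}

// Hypothetical helper: mirrors the selection logic in getMapData.
function bucketFor(key) {
  return isKeyable(key)
    ? (typeof key === 'string' ? 'string' : 'hash')
    : 'map'
}

console.log(bucketFor('name'))      // 'string'
console.log(bucketFor(42))          // 'hash'
console.log(bucketFor(null))        // 'hash'
console.log(bucketFor('__proto__')) // 'map'
console.log(bucketFor({}))          // 'map'
console.log(bucketFor(undefined))   // 'map'
```

Primitive keys go to the object-backed Hash stores, with strings kept apart from other primitives so that '1' and 1 do not collide, and everything else goes to the Map.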

4.1.4 HashCache

As the code below shows, Hash uses a plain object as its cache:

const HASH_UNDEFINED = '__lodash_hash_undefined__'

this.__data__ = Object.create(null)

set(key, value) {
    const data = this.__data__
    this.size += this.has(key) ? 0 : 1
    data[key] = value === undefined ? HASH_UNDEFINED : value
    return this
}

4.2 The circular reference problem

const loopObject = { a: 1 }
loopObject.b = loopObject

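In this example, the property b of loopObject is a circular reference.

Because of this special case, JSON.parse(JSON.stringify(loopObject)) fails: JSON.stringify throws a TypeError for circular structures. A naive recursive copy fares no better, since it recurses without end until the call stack overflows.

Another benefit of the cache is that it handles circular references inside an object. When the traversal reaches an object that is referenced circularly, the cache strategy uses the key to look up the matching value in the cache. If that reference has already been copied, the existing copy is returned instead of copying again, which avoids the overflow.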
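
Both behaviors can be checked with a short sketch. cloneWithCache is a hypothetical helper that uses a native Map as the cache and handles only plain objects and arrays; it illustrates the idea rather than lodash's implementation.

```javascript
const loopObject = { a: 1 }
loopObject.b = loopObject

// JSON.stringify rejects circular structures with a TypeError.
let jsonError = null
try {
  JSON.parse(JSON.stringify(loopObject))
} catch (err) {
  jsonError = err
}
console.log(jsonError instanceof TypeError) // true

function cloneWithCache(value, cache = new Map()) {
  if (value === null || typeof value !== 'object') return value
  // Already copied (or being copied): reuse that copy instead of recursing.
  if (cache.has(value)) return cache.get(value)
  const result = Array.isArray(value) ? [] : {}
  // Register the copy before descending, so cycles find it.
  cache.set(value, result)
  for (const key of Object.keys(value)) {
    result[key] = cloneWithCache(value[key], cache)
  }
  return result
}

const copyOfLoop = cloneWithCache(loopObject)
console.log(copyOfLoop !== loopObject)   // true
console.log(copyOfLoop.b === copyOfLoop) // true: the cycle is reproduced in the copy
```

The important detail is the order: the empty copy is stored in the cache before its children are visited, which matches the stack.set(value, result) call shown earlier.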

6) Recursive copying

if (tag == mapTag) {
    value.forEach((subValue, key) => {
      result.set(key, baseClone(subValue, bitmask, customizer, key, value, stack))
    })
    return result
}
// The current value is a Set
if (tag == setTag) {
    value.forEach((subValue) => {
      result.add(baseClone(subValue, bitmask, customizer, subValue, value, stack))
    })
    return result
}

// Other iterable values, such as Array/Object
arrayEach(props || value, (subValue, key) => {
    if (props) {
      key = subValue
      subValue = value[key]
    }
    // Recursively clone the data
    assignValue(result, key, baseClone(subValue, bitmask, customizer, key, value, stack))
})

Child objects are copied by calling baseClone() recursively, with a separate branch for each kind of object.
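
Putting the pieces together, here is a compact sketch of the same shape: tag the value, create an empty result, register it in a cache, then recurse. miniClone is a hypothetical helper that covers only Map, Set, Date, arrays and ordinary objects, with a native Map as the cache. It is far less complete than baseClone.

```javascript
const toString = Object.prototype.toString

function miniClone(value, cache = new Map()) {
  if (value === null || typeof value !== 'object') return value
  if (cache.has(value)) return cache.get(value)

  const tag = toString.call(value)
  let result

  if (tag === '[object Map]') {
    result = new Map()
    cache.set(value, result)
    value.forEach((sub, key) => result.set(key, miniClone(sub, cache)))
    return result
  }
  if (tag === '[object Set]') {
    result = new Set()
    cache.set(value, result)
    value.forEach((sub) => result.add(miniClone(sub, cache)))
    return result
  }
  if (tag === '[object Date]') {
    return new Date(value.getTime())
  }

  // Arrays and ordinary objects: keep the prototype, copy own enumerable keys.
  result = Array.isArray(value) ? [] : Object.create(Object.getPrototypeOf(value))
  cache.set(value, result)
  for (const key of Object.keys(value)) {
    result[key] = miniClone(value[key], cache)
  }
  return result
}

const sample = {
  list: [1, { n: 2 }],
  map: new Map([['k', { v: 1 }]]),
  set: new Set([{ s: 1 }]),
  when: new Date(0)
}
const sampleCopy = miniClone(sample)

console.log(sampleCopy.list[1] !== sample.list[1])             // true
console.log(sampleCopy.map.get('k') !== sample.map.get('k'))   // true
console.log(sampleCopy.map.get('k').v)                         // 1
console.log([...sampleCopy.set][0].s)                          // 1
console.log(sampleCopy.when.getTime())                         // 0
```

Types outside this list, such as RegExp or typed arrays, would need their own branches, which is what the per-tag handling in lodash provides.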