Lodash Series: Arrays

Date: 2020-12-02

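This article looks at several Lodash array utility functions, ordered from most to least frequently used and useful. For most of them it does not reproduce the complete Lodash source; the focus is on the ideas behind the implementation.

The original is 11,371 characters long; reading it takes about 23 minutes.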

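Flattening (flatten)

Usage

flatten is a very practical function and a popular interview question. Let's look at the usage first. Lodash offers three ways to call it, depending on how deeply the nested array should be flattened:

// Flatten every level of nesting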
_.flattenDeep([1, [2, [3, [4]], 5]]) // [1, 2, 3, 4, 5]

// Flatten only the outermost level of nesting
_.flattenDepth([1, [2, [3, [4]], 5]], 1) // [1, 2, [3, [4]], 5]

// Same as _.flattenDepth(array, 1): flatten only the outermost level of nesting
_.flatten([1, [2, [3, [4]], 5]]) // [1, 2, [3, [4]], 5]

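It is easy to see that the other two calls are derived from flattenDepth: flattenDeep is equivalent to passing Infinity as the second argument, and flatten is equivalent to passing 1.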
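The relationship between the three calls can be sketched with the native Array.prototype.flat (ES2019), which accepts the same kind of depth argument. This only illustrates how the two shortcuts derive from the depth-based version; it is not Lodash's code, and the *Sketch names are made up for this example.

```javascript
// Illustration only: the native Array.prototype.flat stands in for Lodash's internals.
const flattenDepthSketch = (array, depth = 1) => array.flat(depth);
const flattenSketch = (array) => flattenDepthSketch(array, 1);
const flattenDeepSketch = (array) => flattenDepthSketch(array, Infinity);

const nested = [1, [2, [3, [4]], 5]];
console.log(flattenSketch(nested));     // [1, 2, [3, [4]], 5]
console.log(flattenDeepSketch(nested)); // [1, 2, 3, 4, 5]
```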

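Implementation idea

So how do we implement flattenDepth, which flattens to a specified depth?

A simple idea: spread syntax (or a loop that copies elements one by one) flattens a single level of an array, for example: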

const a = [1];
const b = [
  ...a, 2, 3,
];

Applying this single-level flattening recursively then gives us multi-level flattening.
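A minimal sketch of that idea, assuming only plain arrays need to be flattened. The name flattenBySpread is hypothetical and this is not Lodash's implementation:

```javascript
// A sketch of "recursive single-level spreading"; `flattenBySpread` is a made-up name.
function flattenBySpread(array, depth = 1) {
  const result = [];
  for (const value of array) {
    if (depth > 0 && Array.isArray(value)) {
      // Flatten the nested array with one less level allowed, then spread it in.
      result.push(...flattenBySpread(value, depth - 1));
    } else {
      result.push(value);
    }
  }
  return result;
}

console.log(flattenBySpread([1, [2, [3, [4]], 5]], 1));        // [1, 2, [3, [4]], 5]
console.log(flattenBySpread([1, [2, [3, [4]], 5]], Infinity)); // [1, 2, 3, 4, 5]
```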

Lodash's implementation

In Lodash this function is called baseFlatten. Keep it in mind, because it appears again later in this article when we discuss set operations.

// The predicate parameter is kept because later functions in this article rely on it
function baseFlatten(array, depth, predicate = Array.isArray, result = []) {
  if (array == null) {
    return result
  }
  for (const value of array) {
    if (depth > 0 && predicate(value)) {
      if (depth > 1) {
        // Recurse with the depth reduced by 1
        // Recursively flatten arrays (susceptible to call stack limits).
        baseFlatten(value, depth - 1, predicate, result)
      } else {
        // Last permitted level: spread only the current level
        result.push(...value)
      }
    } else {
      // General case
      result[result.length] = value
    }
  }
  return result
}

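This is a typical iteration-plus-recursion function: while iterating, it keeps pushing non-array elements into result, which is what flattens the array. When a depth is specified, a nested array found with a remaining depth of 1 has only its current level spread into result; otherwise the function recurses with the depth reduced by 1.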
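The comment in the code notes that the recursive version is susceptible to call stack limits. One way around that, shown here as a sketch rather than anything Lodash does, is to swap the recursion for an explicit stack. flattenDeepIterative is a hypothetical name, and it covers only the flatten-everything case.

```javascript
// A sketch: deep flattening with an explicit stack instead of recursion.
function flattenDeepIterative(array) {
  const result = [];
  // Each frame remembers which array we are inside and how far we have read.
  const stack = [{ items: array, index: 0 }];
  while (stack.length > 0) {
    const frame = stack[stack.length - 1];
    if (frame.index >= frame.items.length) {
      stack.pop(); // finished this array, go back to its parent
      continue;
    }
    const value = frame.items[frame.index++];
    if (Array.isArray(value)) {
      stack.push({ items: value, index: 0 });
    } else {
      result.push(value);
    }
  }
  return result;
}

console.log(flattenDeepIterative([1, [2, [3, [4]], 5]])); // [1, 2, 3, 4, 5]
```

Element order is preserved because each frame is read front to back before control returns to its parent.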

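An alternative implementation

There is also a much shorter way to flatten an array: convert it to a string with toString() or join(), then split that string with split(). The big drawback is that null and undefined elements turn into empty strings, and the result is an array of strings, so other primitive types (booleans, numbers) have to be converted back by hand. Elements that themselves contain commas are split apart as well.

The speed of this approach does not differ much from the recursive one, and it can be useful in specific scenarios.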

[1, [2, [3, [4]], 5]].join().split(',')
// or
[1, [2, [3, [4]], 5]].toString().split(',')
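A short demonstration of what the string round trip does to different element types. flattenViaString is a hypothetical name for the trick above:

```javascript
// `flattenViaString` is a made-up name for the join/split trick.
const flattenViaString = (array) => array.join().split(',');

console.log(flattenViaString([1, [2, [3, [4]], 5]]));
// ['1', '2', '3', '4', '5'] (strings, not numbers)

console.log(flattenViaString([1, [null, [undefined]], true]));
// ['1', '', '', 'true'] (null and undefined became empty strings)

// Converting back is only safe when every element is known to be a number.
console.log(flattenViaString([1, [2, [3]]]).map(Number)); // [1, 2, 3]
```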

Deduplication (uniq)

Usage

Deduplicating an array is also very practical. Lodash provides two variants for different kinds of data:

_.uniq([2, 1, 2]) // [2, 1]

_.uniqWith([{ 'x': 1, 'y': 2 }, { 'x': 2, 'y': 1 }, { 'x': 1, 'y': 2 }], _.isEqual) // [{ 'x': 1, 'y': 2 }, { 'x': 2, 'y': 1 }]

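Implementation idea

There are many ways to deduplicate data; the most widely known one relies on the properties of the Set data structure.

The rest all walk the array once and either build a new array or filter out the duplicates.

One thing needs attention, though: how to compare NaN for equality (NaN !== NaN). More generally, how do we control the equality strategy for elements (for example, how can we pass in a function under which [1, 2, 3] and [1, 2, 3] count as equal)?

To quote MDN, ES2015 has four equality algorithms:

Abstract (loose) equality comparison (==)
Strict equality comparison (===): used by Array.prototype.indexOf and Array.prototype.lastIndexOf
SameValueZero: used by the TypedArray and ArrayBuffer constructors and by Map and Set operations, and by Array.prototype.includes since ES2016/ES7
SameValue (Object.is): used everywhere else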
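The differences only show up for NaN and ±0, which the following sketch makes visible. SameValueZero has no built-in function of its own, so the sameValueZero helper below is written out by hand for illustration.

```javascript
// SameValueZero spelled out: like ===, except that NaN equals NaN.
const sameValueZero = (a, b) => a === b || (a !== a && b !== b);

console.log(NaN == NaN);              // false (loose equality)
console.log(NaN === NaN);             // false (strict equality)
console.log(sameValueZero(NaN, NaN)); // true
console.log(Object.is(NaN, NaN));     // true  (SameValue)

console.log(+0 === -0);               // true
console.log(sameValueZero(+0, -0));   // true
console.log(Object.is(+0, -0));       // false

console.log([NaN].indexOf(NaN));      // -1   (strict equality)
console.log([NaN].includes(NaN));     // true (SameValueZero)
```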

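Implementation 1 (Set)

Deduplicating with a Set is the most concise approach, and the most efficient one for large inputs:

// Convert the array to a Set and back to an array; cannot distinguish ±0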
const uniq = (arr) => [...new Set(arr)];

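Note that the SameValueZero comparison used by Set treats NaN as equal to NaN and +0 as equal to -0, so this approach cannot distinguish ±0, and it offers no way to pass in a custom equality strategy.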
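A quick check of those properties. The variable name viaSet is only for this example:

```javascript
// NaN collapses to one entry, -0 is stored as +0, and arrays are compared by reference.
const viaSet = [...new Set([NaN, NaN, +0, -0, [1], [1]])];

console.log(viaSet.length);             // 4: NaN, 0, [1], [1]
console.log(Object.is(viaSet[1], -0));  // false: the zero was stored as +0
console.log(viaSet[2] === viaSet[3]);   // false: two distinct array objects
```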

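Implementation 2 (single pass, building a new array)

A single pass that builds a new array, with O(N) space complexity. Each membership check is itself a linear scan, so the worst-case time is O(N²).

NaN needs care here: Array.prototype.indexOf uses strict equality, so it cannot find the index of a NaN element. For example: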

[1, NaN, 2].indexOf(NaN) // -1

So we use Array.prototype.includes instead, which compares with SameValueZero:

function unique(array) {
  const result = [];
  for (const value of array) {
    // Likewise, SameValueZero cannot distinguish ±0
    if (!result.includes(value)) {
      result[result.length] = value;
    }
  }
  return result;
}

Going one step further, we can implement an includesWith function that takes the equality strategy as an argument:

function includesWith(array, target, comparator) {
  if (array == null) return false;

  for (const value of array) {
    if (comparator(target, value)) return true;
  }
  return false;
}

function unique(array, comparator) {
  const result = [];
  for (const value of array) {
    if (!includesWith(result, value, comparator)) {
      result[result.length] = value;
    }
  }
  return result;
}

// Pass SameValue (Object.is) as the equality strategy, which can distinguish ±0
unique([+0, 1, NaN, NaN, -0, 0], Object.is) // [0, 1, NaN, -0]

// Pass a structural (deep) equality strategy
unique([
  [1, 2, 3], {},
  [1, 2, 3], {},
], _.isEqual) // [[1, 2, 3], {}]

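Implementation 3 (single pass, filtering out duplicates)

There are two ways to filter out duplicates in a single pass. One uses a hash table to record the elements already seen, with O(N) space complexity: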

function unique(arr) {
  const seen = new Map()
  // Add each element to the hash table while iterating; like Set, this cannot distinguish ±0
  return arr.filter((a) => !seen.has(a) && seen.set(a, 1))
}

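With Map we cannot control the equality strategy, but we can control how keys are generated. For example, we can crudely use JSON.stringify to build a rough "structural" equality key:

function unique(array) {
  const seen = new Map()
  return array.filter((item) => {
    // If you want a primitive and its wrapper object (such as `new String("1")` and `"1"`) to count as the same value, drop the `typeof` part
    const key = typeof item + JSON.stringify(item)
    return !seen.has(key) && seen.set(key, 1)
  })
}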
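How crude this key strategy is becomes clear with a few inputs. The sketch below restates the same idea under the hypothetical name uniqueByJsonKey (using a Set of keys) so that it can run on its own:

```javascript
// Same key strategy as above, restated so the example is self-contained.
function uniqueByJsonKey(array) {
  const seen = new Set();
  return array.filter((item) => {
    const key = typeof item + JSON.stringify(item);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Works for arrays with the same contents, and keeps '1' and 1 apart.
console.log(uniqueByJsonKey([[1, 2], [1, 2], '1', 1])); // [[1, 2], '1', 1]

// Property order matters: these two objects produce different keys.
console.log(uniqueByJsonKey([{ a: 1, b: 2 }, { b: 2, a: 1 }]).length); // 2

// NaN and Infinity both serialize to "null", so Infinity is dropped as a "duplicate".
console.log(uniqueByJsonKey([NaN, Infinity]).length); // 1
```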

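The other way relies on the behavior of Array.prototype.findIndex and needs only O(1) extra space, at the cost of a linear scan per element:

function unique(array) {
  return array.filter((item, index) => {
    // When duplicates exist, findIndex always returns the index of the first matching element
    return array.findIndex(e => Object.is(e, item)) === index; // SameValue (Object.is) handles NaN
  });
}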

Lodash's implementation

Because IE8 and earlier do not have Array.prototype.indexOf, Lodash uses two nested loops in its place:

const LARGE_ARRAY_SIZE = 200

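// SameValueZero: like ===, except that NaN equals NaN
const sameValueZero = (a, b) => a === b || (a !== a && b !== b)

function baseUniq(array, comparator) {
  let index = -1

  const { length } = array
  const result = []
  // The fast paths below are only valid when no custom comparator is passed
  const isCommon = typeof comparator !== 'function'

  // With 200 or more elements, deduplicate with a Set
  if (isCommon && length >= LARGE_ARRAY_SIZE && typeof Set !== 'undefined') {
    return [...new Set(array)]
  }

  outer:
  while (++index < length) {
    const value = array[index]

    // Q: which value is not equal to itself?
    if (isCommon && value === value) {
      let seenIndex = result.length
      // Equivalent to indexOf
      while (seenIndex--) {
        if (result[seenIndex] === value) {
          continue outer
        }
      }
      result.push(value)
      // Q: could indexOf be used here?
    } else if (!includesWith(result, value, isCommon ? sameValueZero : comparator)) {
      result.push(value)
    }
  }
  return result
}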

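Union (union)

The next three functions are the three core set operations. When it comes to set theory a picture is worth a thousand words; rather than drawing my own, I'll use one from the web.

Usage

union merges arrays using SameValueZero equality. Again, Lodash provides two variants for different kinds of data: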

_.union([2, 3], [1, 2]) // [2, 3, 1]
_.union([0], [-0]) // [0]
_.union([1, [2]], [1, [2]]) // [1, [2], [2]]

// Structural (deep) equality
_.unionWith([1, [2]], [1, [2]], _.isEqual) // [1, [2]]

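Implementation idea

The idea is simple: flatten the given arrays one level into a single array, then deduplicate it.

Isn't that just flatten plus unique?

Yes, and that is exactly how Lodash implements union.

Lodash's implementation

Only Lodash's implementation is given below; feel free to try combining the various unique and flatten implementations from earlier.

function union(...arrays) {
  // The third argument is no longer the default Array.isArray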
  return baseUniq(baseFlatten(arrays, 1, isArrayLikeObject))
}

function isArrayLikeObject(value) {
  return isObjectLike(value) && isLength(value.length)
}

// A non-null object
function isObjectLike(value) {
  return typeof value === 'object' && value !== null
}

// A non-negative integer less than 2 to the 53rd power
function isLength(value) {
  return typeof value === 'number' &&
    value > -1 && value % 1 == 0 && value <= Number.MAX_SAFE_INTEGER
}
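One possible combination of a flatten and a unique from earlier sections, as a sketch built on native flat and Set. The names unionSketch, unionWithSketch and sameShape are hypothetical; unlike Lodash's unionWith, the comparator comes first here, and sameShape is only a crude stand-in for _.isEqual.

```javascript
// Flatten one level, then deduplicate with SameValueZero via Set.
const unionSketch = (...arrays) => [...new Set(arrays.flat(1))];

// Flatten one level, then keep a value only if nothing equal was kept before.
const unionWithSketch = (comparator, ...arrays) =>
  arrays.flat(1).reduce(
    (result, value) =>
      result.some((seen) => comparator(seen, value)) ? result : [...result, value],
    []
  );

// Crude structural comparison, good enough for plain JSON-like values.
const sameShape = (a, b) => JSON.stringify(a) === JSON.stringify(b);

console.log(unionSketch([2, 3], [1, 2]));                    // [2, 3, 1]
console.log(unionSketch([1, [2]], [1, [2]]));                // [1, [2], [2]]
console.log(unionWithSketch(sameShape, [1, [2]], [1, [2]])); // [1, [2]]
```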

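Intersection (intersection)

Usage

intersection returns the elements that the sets have in common. Again, Lodash provides two variants for different kinds of data:

_.intersection([2, 1], [2, 3]) // [2]
_.intersection([2, 3, [1]], [2, [1]]) // [2]

// Structural (deep) equality
_.intersectionWith([2, 3, [1]], [2, [1]], _.isEqual) // [2, [1]]

Implementation idea

Since we want the elements common to all sets, we only need to iterate over one of them and either build a new array or filter out the elements that are missing from the other sets.

Functional implementation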

const intersection = (a, b) => a.filter(x => b.includes(x))

// Remember the includesWith function from earlier?
const intersectionWith = (a, b, comparator = Object.is) => a.filter(x => includesWith(b, x, comparator))
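The two-argument versions above keep duplicates from the first array, while Lodash's intersection accepts any number of arrays and returns unique values. A sketch that covers both points, under the hypothetical name intersectionSketch:

```javascript
// Variadic intersection that also deduplicates the first array (SameValueZero).
function intersectionSketch(first = [], ...rest) {
  const unique = [...new Set(first)];
  return unique.filter((x) => rest.every((other) => other.includes(x)));
}

console.log(intersectionSketch([2, 1, 2], [2, 3], [4, 2])); // [2]
console.log(intersectionSketch([2, 3, [1]], [2, [1]]));     // [2]
```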

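Difference (difference)

Usage

difference returns the elements in which the sets differ. Again, Lodash provides two variants for different kinds of data:

_.difference([2, 1], [2, 3]) // [1]
_.difference([2, 1], [2, 3, 1], [2]) // []
_.difference([2, 1, 4, 4], [2, 3, 1]) // [4, 4]

Note that a difference has a single subject: difference means "the difference of set A relative to the other sets", so every returned value is an element of the first argument (set A). If set A is a subset of the other sets (that is, of their union), the result is an empty array. If this is hard to picture, try drawing it.

Implementation idea

Since the difference has a single subject, we only need to iterate over that one set and either build a new array or filter out the elements that exist in the other sets.

Functional implementation

const difference = (a, b) => a.filter(x => !b.includes(x))
// With a custom equality strategy, for example _.isEqual for structural equality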
const differenceWith = (a, b, comparator = Object.is) => a.filter(x => !includesWith(b, x, comparator))
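The usage examples pass more than one array to exclude, which the two-argument version above does not handle. A sketch of a variadic version, using a Set for the lookups (SameValueZero, the same rule includes applies). differenceSketch is a hypothetical name:

```javascript
// Collect everything to exclude into one Set, then filter the subject array.
function differenceSketch(array, ...others) {
  const excluded = new Set(others.flat(1));
  return array.filter((x) => !excluded.has(x));
}

console.log(differenceSketch([2, 1], [2, 3]));            // [1]
console.log(differenceSketch([2, 1], [2, 3, 1], [2]));    // []
console.log(differenceSketch([2, 1, 4, 4], [2, 3, 1]));   // [4, 4]
```

Duplicates in the subject array are kept, matching the third usage example.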

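Chunking (chunk)

Usage

chunk splits an array into pieces of the given size; if the last piece comes up short, it is not padded:

_.chunk(['a', 'b', 'c', 'd'], 2) // [['a', 'b'], ['c', 'd']]
_.chunk(['a', 'b', 'c', 'd'], 3) // [['a', 'b', 'c'], ['d']]

Implementation idea

The output above makes the implementation easy to guess: iterate over the array, slice it into new arrays, and collect them.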
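A minimal loop-based sketch of that idea, using the native Array.prototype.slice. chunkLoop is a hypothetical name, and treating an invalid size as "return an empty array" is an assumption made for this example:

```javascript
// Walk the array in steps of `size` and collect one slice per step.
function chunkLoop(array, size = 1) {
  const step = Math.floor(size);
  const result = [];
  if (!(step >= 1)) return result; // rejects 0, negatives and NaN
  for (let start = 0; start < array.length; start += step) {
    result.push(array.slice(start, start + step));
  }
  return result;
}

console.log(chunkLoop(['a', 'b', 'c', 'd'], 2)); // [['a', 'b'], ['c', 'd']]
console.log(chunkLoop(['a', 'b', 'c', 'd'], 3)); // [['a', 'b', 'c'], ['d']]
```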

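But what if I don't want a loop, and would rather write it in a functional style with Array.prototype.map or Array.prototype.reduce?

First we construct a new array of length Math.ceil(arr.length / size) to call map/reduce on, then return the corresponding slice for each position.

There is a catch, though: calling the Array constructor only sets the length property of the new array; its index properties are not created automatically, and methods such as map and reduce skip indexes that do not exist.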

const a = new Array(3)
// The index properties do not exist
a.hasOwnProperty("0") // false
a.hasOwnProperty(1) // false
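Why this matters for map can be seen by counting how often the callback runs. The variable names are only for this example:

```javascript
const sparse = new Array(3);
let calls = 0;
const mapped = sparse.map(() => {
  calls += 1;
  return 0;
});

console.log(calls);          // 0: the callback never ran
console.log(mapped.length);  // 3: the length is preserved
console.log(0 in mapped);    // false: the result is still full of holes
```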

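So how do we create the index properties of the new array?

Take a moment to think about it; the answer is revealed below.

Implementation

function chunk(array, size = 1) {
  // toInteger (a Lodash helper) simply discards the fractional part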
  size = Math.max(toInteger(size), 0)
  const length = array == null ? 0 : array.length
  if (!length || size < 1) {
    return []
  }
  let index = 0
  let resIndex = 0
  const result = new Array(Math.ceil(length / size))

  while (index < length) {
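    // Array.prototype.slice also has to handle non-array receivers, which makes it slower on small inputs, so Lodash's own slice (covered in the slice section below) is used here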
    result[resIndex++] = slice(array, index, (index += size))
  }
  return result
}

Functional implementation

As mentioned above, an array created by the Array constructor has no index properties, so we need to fill it before we can rely on them.

There are three ways to do it: array spread syntax, Array.prototype.fill, and Array.from.

// Each variant gets its own name so that all of them can be declared in the same scope

// Using spread syntax
const chunkWithSpread = (arr, size) =>
  [...Array(Math.ceil(arr.length / size))].map((e, i) => arr.slice(i * size, i * size + size));

// Using `Array.prototype.fill`
const chunkWithFill = (arr, size) =>
  Array(Math.ceil(arr.length / size)).fill(0).map((e, i) => arr.slice(i * size, i * size + size));

// Using the mapping callback of `Array.from`
const chunkWithFromCallback = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (e, i) => arr.slice(i * size, i * size + size));

// Using `Array.from`
const chunkWithFrom = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }).map((e, i) => arr.slice(i * size, i * size + size));

// Using `Array.prototype.reduce`: when the index is a multiple of size, append the current slice to the accumulator
const chunkWithReduce = (arr, size) =>
  arr.reduce((a, c, i) => !(i % size) ? a.concat([arr.slice(i, i + size)]) : a, []);

Array slicing (slice)

slice returns a smaller array selected by index:

Usage

_.slice([1, 2, 3, 4], 2) // [3, 4]
_.slice([1, 2, 3, 4], 1, 2) // [2]
_.slice([1, 2, 3, 4], -2) // [3, 4]

// Same as _.slice([1, 2, 3, 4], 4 - 2, 3)
_.slice([1, 2, 3, 4], -2, 3) // [3]

// Same as _.slice([1, 2, 3, 4], 4 - 3, 4 - 1)
_.slice([1, 2, 3, 4], -3, -1) // [2, 3]

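Implementation idea

For array slicing, remember that the range includes start but excludes end; a negative index is equivalent to the array length plus that index; a negative start whose absolute value exceeds the array length is treated as 0; and an end greater than the array length is treated as the array length.

These rules are how Lodash, and V8 for that matter, implement array slicing.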
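The index rules can be isolated into a small helper and compared against the native Array.prototype.slice. This is a sketch for integer indexes only; normalizeBounds is a hypothetical name and not part of Lodash or V8.

```javascript
// Turn (start, end) into a concrete [from, to) range for an array of `length`.
function normalizeBounds(length, start = 0, end = length) {
  // Negative indexes count from the end; everything is clamped into [0, length].
  const clamp = (index) =>
    index < 0 ? Math.max(length + index, 0) : Math.min(index, length);
  const from = clamp(start);
  const to = Math.max(clamp(end), from); // an inverted range becomes empty
  return [from, to];
}

const sample = [1, 2, 3, 4];
console.log(normalizeBounds(sample.length, -2, 3));   // [2, 3]
console.log(normalizeBounds(sample.length, -3, -1));  // [1, 3]
console.log(normalizeBounds(sample.length, -10, 10)); // [0, 4]
```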

Lodash's implementation

function slice(array, start, end) {
  let length = array == null ? 0 : array.length
  if (!length) {
    return []
  }
  start = start == null ? 0 : start
  end = end === undefined ? length : end

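  if (start < 0) {
    // A negative index is equivalent to the array length plus that index; if its absolute value exceeds the length, start becomes 0
    start = -start > length ? 0 : (length + start)
  }
  // An end greater than the array length is treated as the array length
  end = end > length ? length : end
  // A negative index is equivalent to the array length plus that index
  if (end < 0) {
    end += length
  }
  length = start > end ? 0 : ((end - start) >>> 0)
  // ToUint32
  start >>>= 0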

  let index = -1
  const result = new Array(length)
  while (++index < length) {
    result[index] = array[index + start]
  }
  return result
}

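One interesting detail is the bit operation here: the unsigned right shift in (end - start) >>> 0 acts as ToUint32 (bitwise operations work on 32-bit integers), which also truncates any fractional part.

So why not use the existing toInteger helper?

My personal take: first, as far as the JS runtime is concerned, an array's length never exceeds 2^32 - 1, so there is no need to slice beyond 32 bits; second, for a basic shared utility, a single bitwise operation is cheaper than a function call.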
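What `>>> 0` does to various inputs can be checked directly. The helper name toUint32 is only for this example:

```javascript
// `>>> 0` converts its operand to an unsigned 32-bit integer.
const toUint32 = (n) => n >>> 0;

console.log(toUint32(3.7));         // 3: the fractional part is dropped
console.log(toUint32(-1));          // 4294967295 (2 ** 32 - 1): negatives wrap around
console.log(toUint32(2 ** 32 + 1)); // 1: values wrap modulo 2 ** 32
console.log(toUint32(NaN));         // 0
console.log(toUint32('5'));         // 5: strings are converted to numbers first
console.log(-1 | 0);                // -1: `| 0` is the signed counterpart (ToInt32)
```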

That's all for this article on Lodash's array utility functions. There are bound to be omissions and mistakes; corrections from readers are welcome.