A Brief Look at How ImmutableJS Implements Persistent Data Structures

Preface

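Internally, ImmutableJS uses purpose-built data structures: any change returns a new reference, while the objects before and after the change share structure. This addresses the need for data copying and caching, and combined with unidirectional data flow it makes errors easier to trace. It is commonly used together with Flux and React.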

Usage

const { Map } = require("immutable");
const map1 = Map({ a: 1, b: 2, c: 3 });
const map2 = map1.set("b", 50);
map1.get("b") + " vs. " + map2.get("b"); // 2 vs. 50

Usage is simple, so I won't go into detail; the main point is how to use it in practice.

In practice

With React

Component --> React.PureComponent (React.memo for function components) --> React.PureComponent + ImmutableJS

Plain Component

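Until you write a shouldComponentUpdate (SCU), a child component always re-renders whenever its parent updates, no matter which props are passed in, and even if no props are passed at all. This wastes a serious amount of performance.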

React.PureComponent (React.memo)

To avoid the waste described above, you can extend React.PureComponent to get a built-in SCU, which uses shallow comparison.

class CounterButton extends React.PureComponent {
  constructor(props) {
    super(props);
    this.state = { count: 1 };
  }
  render() {
    return (
      <button color={this.props.color} onClick={() => this.setState((state) => ({ count: state.count + 1 }))} > Count: {this.state.count} </button>
    );
  }
}

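However, once a deeply nested property is changed in place, the component cannot detect it and will not update, which leads to bugs. (You can also handwrite an SCU, which is smarter, but writing a targeted SCU for each component costs effort.)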
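To make the pitfall concrete, here is a minimal sketch. The `shallowEqual` helper is a simplified stand-in written for illustration, not React's actual implementation, and the `user` / `profile` props are made up:

```javascript
// Simplified shallow comparison: checks only top-level keys, by reference.
function shallowEqual(a, b) {
  if (Object.is(a, b)) return true;
  if (typeof a !== "object" || a === null || typeof b !== "object" || b === null) {
    return false;
  }
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  if (keysA.length !== keysB.length) return false;
  return keysA.every(
    (key) => Object.prototype.hasOwnProperty.call(b, key) && Object.is(a[key], b[key])
  );
}

const user = { profile: { name: "Ann" } };
const prevProps = { user };

// In-place mutation: the nested value changes, but every reference stays the same.
user.profile.name = "Bob";
const mutatedProps = { user };

// Immutable-style update: every object on the changed path is replaced.
const nextProps = { user: { ...user, profile: { ...user.profile, name: "Cid" } } };

const missedUpdate = shallowEqual(prevProps, mutatedProps); // true: the change goes unnoticed
const seenUpdate = shallowEqual(prevProps, nextProps); // false: the new reference is detected
```

A component relying on this kind of check would skip the render in the first case and render in the second.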

React.PureComponent + ImmutableJS

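Can components update precisely while data updates stay extremely fast and return a new top-level reference, without a deep comparison every time?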

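Yes: that is the well-known ImmutableJS. Besides the advantages above, it also gives you efficient caching and makes time travel simple to implement; before this, implementing time travel meant deep-copying the data.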

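Keep in mind that all of this only holds under the broader concept of unidirectional data flow. Vue's two-way binding, for example, simply uses a Proxy or a recursive defineProperty and still gets precise updates (though Proxy is not especially fast).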

Redux+Immutable

Speaking of state management, we can't leave out Redux. A reducer must stay pure and return a new reference every time, so it has to be written like this:

function todoApp(state = initialState, action) {
  switch (action.type) {
    case SET_VISIBILITY_FILTER:
      return Object.assign({}, state, {
        visibilityFilter: action.filter,
      });
    case ADD_TODO:
      return Object.assign({}, state, {
        todos: [
          ...state.todos,
          {
            text: action.text,
            completed: false,
          },
        ],
      });
    default:
      return state;
  }
}

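Writing this syntax every time is more than I can stand. With Immutable you just make the change directly, because any change returns a new top-level reference. Lovely.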

Main text

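The above is how I think about using Immutable, but usage is not the purpose of this article. The main goal is to find out how this seemingly magical data structure is actually implemented. As a source-code reader I cannot tolerate black magic (and, of course, it also gives me something to talk about in interviews).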

Prefix tree (trie)

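Some readers may be puzzled by now: we were talking about Immutable, so why is a LeetCode problem suddenly showing up? Perfectionists can go and solve it first. Don't worry: if you already know tries you can skip ahead, and if not, take a look first, because they are closely related to how Immutable's data structures are implemented.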

Trie is an efficient information retrieval data structure. Using Trie, search complexities can be brought to optimal limit (key length). If we store keys in binary search tree, a well balanced BST will need time proportional to M * log N, where M is maximum string length and N is number of keys in tree. Using Trie, we can search the key in O(M) time. However the penalty is on Trie storage requirements.

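Simply put, a prefix tree is a multi-way tree in which the path from the root to some node spells a word. Note that the path does not have to end at a leaf: it does in the example figure, but where a word actually ends depends on whether the node carries an isEnd flag, which the implementer defines. The main operations on a prefix tree are insert and search. insert adds a word to the tree one letter at a time; search checks whether a given word exists in the tree. The advantage is fast lookup.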
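A minimal sketch of such a prefix tree, for readers who have not written one before. The class and field names (`TrieNode`, `children`, `isEnd`) are just illustrative choices:

```javascript
class TrieNode {
  constructor() {
    this.children = new Map(); // letter -> child node
    this.isEnd = false; // true if a word ends at this node
  }
}

class Trie {
  constructor() {
    this.root = new TrieNode();
  }

  // Insert a word one letter at a time, creating nodes as needed.
  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.isEnd = true;
  }

  // Follow the letters of `word`; it counts as stored only if the last node has isEnd set.
  search(word) {
    let node = this.root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (node === undefined) return false;
    }
    return node.isEnd;
  }
}

const trie = new Trie();
trie.insert("apple");
const foundApple = trie.search("apple"); // true
const foundApp = trie.search("app"); // false: the path exists, but no word ends there
```

Both operations visit one node per letter, which is where the O(M) lookup cost in the quote above comes from.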

List and Map

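List is implemented as an index prefix tree (a Vector Trie). The index for each level is derived by bit mapping, that is, by slicing the bits of the element's index. What is special is that a Vector Trie stores the data in its leaf nodes, while all other nodes only store indices.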

// Constants describing the size of trie nodes.
export const SHIFT = 5; // Resulted in best performance after ______?
export const SIZE = 1 << SHIFT;
export const MASK = SIZE - 1;

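Three trie-related constants are defined here:

SHIFT means that every SHIFT bits map to one index. Here it is 5 bits, so an index lies in [0, 31].

SIZE is the length of each tree node's index array. Here it is 2^5 = 32, matching the SHIFT-bit index.

MASK is the bit mask applied with & after shifting. Here it is binary 11111.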
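How the three constants work together can be sketched as follows. `indexAtLevel` is a helper name made up for this illustration, not something from the Immutable source:

```javascript
const SHIFT = 5;
const SIZE = 1 << SHIFT; // 32 slots per node
const MASK = SIZE - 1; // 31, binary 11111

// Index used at a given level: shift the wanted 5-bit group down, then mask it off.
function indexAtLevel(index, level) {
  return (index >>> level) & MASK;
}

// 1000 is binary 11111 01000, so a two-level trie visits slot 31, then slot 8.
const topSlot = indexAtLevel(1000, SHIFT); // 31
const leafSlot = indexAtLevel(1000, 0); // 8
```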

So how exactly does the mapping work?

Let's look at some pseudocode first:

public class BitTrie {
  public static final int BITS = 5,
                          WIDTH = 1 << BITS, // 2^5 = 32
                          MASK = WIDTH - 1; // 31, or 0x1f

  // Array of objects. Can itself contain an array of objects.
  Object[] root;
  // BITS times (the depth of this trie minus one).
  int shift;
  public Object lookup(int key) {
    Object[] node = this.root;
    // perform branching on internal nodes here
    for (int level = this.shift; level > 0; level -= BITS) {
      node = (Object[]) node[(key >>> level) & MASK];
      // If node may not exist, check if it is null here
    }
    // Last element is the value we want to lookup, return it.
    return node[key & MASK];
  }
}

I picked a Java version here because Java code is easier to read.

public Object lookup(int key) {
    Object[] node = this.root;
    // perform branching on internal nodes here
    for (int level = this.shift; level > 0; level -= BITS) {
      node = (Object[]) node[(key >>> level) & MASK];
      // If node may not exist, check if it is null here
    }
    // Last element is the value we want to lookup, return it.
    return node[key & MASK];
 }

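Note the lookup method: by shifting repeatedly, it extracts the high-order bits as an index and descends to the next level. The Immutable source does the same thing recursively. For details, look up Bit Partitioning.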

// Source abridged
function updateVNode(node, ownerID, level, index, value, didAlter) {
  const idx = (index >>> level) & MASK;
  const nodeHas = node && idx < node.array.length;
  if (!nodeHas && value === undefined) {
    return node;
  }
  let newNode;
  if (level > 0) {
    const lowerNode = node && node.array[idx];

    // Recurse
    const newLowerNode = updateVNode(
      lowerNode,
      ownerID,
      level - SHIFT, // Update level: index never changes, so level selects which bits to extract
      index,
      value,
      didAlter
    );
    if (newLowerNode === lowerNode) {
      return node;
    }
    newNode = editableVNode(node, ownerID);
    newNode.array[idx] = newLowerNode;
    return newNode;
  }
}

An example

get operation

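Take List.get(141) as an example. For simplicity, SHIFT is set to 2 here.

Convert 141 to binary: 10001101
Starting from the root, every SHIFT = 2 bits serve as the index for one level
The first index is binary 10 = 2: go to position 2 of the current node's index array and take the reference to the next-level index node
The second index is binary 00 = 0: go to position 0 of the current node's index array and take the reference to the next-level index node
The third index is binary 11 = 3: go to position 3 of the current node's index array and take the reference to the next-level node
The fourth index is binary 01 = 1: go to position 1 of the current node's array, fetch the result, and return it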
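The walk above can be reproduced with a small sketch. It ports the Java lookup to JavaScript with SHIFT = 2; the `build` helper and the "v141"-style leaf values are made up for the demo and are not how Immutable builds a List:

```javascript
const SHIFT = 2;
const SIZE = 1 << SHIFT; // 4 slots per node
const MASK = SIZE - 1; // binary 11

// Build a full trie of the given depth whose leaves hold "v0", "v1", ...
function build(depth, start = 0) {
  if (depth === 0) return "v" + start;
  const span = SIZE ** (depth - 1); // number of values under each child
  return Array.from({ length: SIZE }, (_, i) => build(depth - 1, start + i * span));
}

// Iterative lookup; `shift` is SHIFT * (depth - 1). Also records the slots visited.
function lookup(root, shift, key) {
  let node = root;
  const path = [];
  for (let level = shift; level > 0; level -= SHIFT) {
    const idx = (key >>> level) & MASK;
    path.push(idx);
    node = node[idx];
  }
  path.push(key & MASK);
  return { value: node[key & MASK], path };
}

const depth = 4; // 4^4 = 256 slots, enough to hold index 141
const root = build(depth);
const result = lookup(root, SHIFT * (depth - 1), 141);
// result.path is [2, 0, 3, 1] and result.value is "v141"
```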

set operation

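set is similar to get, but to keep the data persistent it has to return a new top-level reference. So while indexing down from the root, it copies every node along the path, and replaces the data once it reaches the final node.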

As shown in the figure above, for List.set(4, "beef"):

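As with get, convert 4 to binary and take SHIFT bits at a time as the index
Copy the nodes along the path
Find the leaf node that holds the data, copy it, then modify the data in the copy
Return the new top-level reference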
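These steps can be sketched as follows. This is a simplified illustration of path copying with SHIFT = 2 on a fully populated two-level trie, not the actual updateVNode from the Immutable source (which also handles missing nodes and ownership):

```javascript
const SHIFT = 2;
const MASK = (1 << SHIFT) - 1;

// Return a new root with `value` stored at `key`; only nodes on the path are copied.
function set(node, level, key, value) {
  const idx = (key >>> level) & MASK;
  const copy = node.slice(); // shallow copy of this node's array
  copy[idx] = level === 0 ? value : set(node[idx], level - SHIFT, key, value);
  return copy;
}

// A two-level trie holding 16 values (each leaf is an array of 4 strings).
const oldRoot = [
  ["a0", "a1", "a2", "a3"],
  ["b0", "b1", "b2", "b3"],
  ["c0", "c1", "c2", "c3"],
  ["d0", "d1", "d2", "d3"],
];

const newRoot = set(oldRoot, SHIFT, 4, "beef"); // 4 is binary 01 00 -> path [1, 0]

// newRoot !== oldRoot         a new top-level reference
// newRoot[1][0] === "beef"    the copied leaf holds the new value
// oldRoot[1][0] === "b0"      the old version is untouched
// newRoot[0] === oldRoot[0]   subtrees off the path are shared, not copied
```

Only one node per level is copied, so the cost of an update grows with the depth of the tree rather than with the number of elements.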

Map uses a Hash Trie

It is largely the same as the Vector Trie, except that the key first has to be mapped to an integer index by a hash function.

function hash(key: any): Number {
  //hash key
}

The hash source is as follows:

export function hash(o) {
  switch (typeof o) {
    case "boolean":
      // The hash values for built-in constants are a 1 value for each 5-byte
      // shift region expect for the first, which encodes the value. This
      // reduces the odds of a hash collision for these common values.
      return o ? 0x42108421 : 0x42108420;
    case "number":
      return hashNumber(o);
    case "string":
      return o.length > STRING_HASH_CACHE_MIN_STRLEN
        ? cachedHashString(o)
        : hashString(o);
    case "object":
    case "function":
      if (o === null) {
        return 0x42108422;
      }
      if (typeof o.hashCode === "function") {
        // Drop any high bits from accidentally long hash codes.
        return smi(o.hashCode(o));
      }
      if (o.valueOf !== defaultValueOf && typeof o.valueOf === "function") {
        o = o.valueOf(o);
      }
      return hashJSObj(o);
    case "undefined":
      return 0x42108423;
    default:
      if (typeof o.toString === "function") {
        return hashString(o.toString());
      }
      throw new Error("Value type " + typeof o + " cannot be hashed.");
  }
}

An example

map.get("R")

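Hash the key to get binary 01010010
Working from the end toward the front, take 2 bits as the index and fetch the next-level node
Repeat step 2 until the key's storage location is found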

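Some readers will ask: why go from back to front instead of front to back? Because there is no way to know in advance how many bits the hash will have, so the only reliable starting point is the low end.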
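A small sketch of consuming a hash from its low end, using the hash value from the example with SHIFT = 2. `indicesFromLowBits` is a name made up for this illustration:

```javascript
const SHIFT = 2;
const MASK = (1 << SHIFT) - 1;

// List the per-level indices, consuming the hash starting from its lowest bits.
function indicesFromLowBits(hash, levels) {
  const indices = [];
  for (let shift = 0; shift < levels * SHIFT; shift += SHIFT) {
    indices.push((hash >>> shift) & MASK);
  }
  return indices;
}

const h = 0b01010010; // the hash value used in the example above
const slots = indicesFromLowBits(h, 4); // [2, 0, 1, 1]
```

Starting at shift 0 works for a hash of any length, and a lookup can stop as soon as it reaches the key without ever touching the higher bits.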

Closing

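The core idea of persistent data actually first appeared in a paper, and similar implementations exist in other languages such as Go. The author of Immutable also jokes about being a copycat, but how well it complements React is plain to see. My knowledge is limited, so this article does not go very deep, and data structures such as Seq and Record are not analyzed either. Mistakes are inevitable, so corrections are welcome.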

References

immutable-js

Functional Go: Implementing a Vector Trie

Wikipedia: Hash array mapped trie

Understanding Clojure's Persistent Vectors, pt. 2

Optimizing React with immutable

React Performance Optimization (1): When PureComponent Meets ImmutableJS

Reconciliation in React detailed explanation

React.js Conf 2015 - Immutable Data and React

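What are the advantages of the trie data structure, and what is it called in Chinese? For extra difficulty: what are its drawbacks, and what is its efficiency in big-O terms?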