markdown-it Source Code Analysis 2: Ruler & Token

Date: 2019-11-05

Tags: markdown, source code analysis, ruler, token

Author: 嵇智

Preface

To understand the MarkdownIt source code from end to end, you first need a clear picture of two foundational classes: Ruler and Token.

Token

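Commonly known as a lexical unit.

md takes a string and passes it through a series of parsers, which turn it into individual tokens. It then calls the render rule that matches each token, with the token as input, and finally outputs an HTML string.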
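The flow described above can be sketched with a toy pipeline. This is not markdown-it's implementation: `toyParse`, `toyRender`, and the `rules` map are hypothetical names, the parser only understands paragraphs separated by blank lines, and no HTML escaping is done.

```javascript
// Toy stand-ins, not markdown-it code: a parser that only knows paragraphs.
function toyParse(src) {
  const tokens = [];
  for (const block of src.split(/\n{2,}/)) {
    if (block.trim() === '') continue;
    tokens.push({ type: 'paragraph_open', tag: 'p', nesting: 1, content: '' });
    tokens.push({ type: 'text', tag: '', nesting: 0, content: block.trim() });
    tokens.push({ type: 'paragraph_close', tag: 'p', nesting: -1, content: '' });
  }
  return tokens;
}

// One render rule per token type; each takes a token and returns a piece of HTML.
const rules = {
  paragraph_open: (token) => '<' + token.tag + '>',
  paragraph_close: (token) => '</' + token.tag + '>\n',
  text: (token) => token.content, // assumption: input is trusted, so no escaping
};

// Rendering is a lookup by token type followed by string concatenation.
function toyRender(tokens) {
  return tokens.map((token) => rules[token.type](token)).join('');
}

const html = toyRender(toyParse('hello\n\nworld'));
console.log(html);
```

The point of the sketch is the separation of concerns: the parser knows nothing about HTML, and the renderer only needs the token's type to pick a rule.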

Let's first look at the definition of Token, located in lib/token.js.

function Token(type, tag, nesting) {

  this.type     = type;

  this.tag      = tag;

  this.attrs    = null;

  this.map      = null;

  this.nesting  = nesting;

  this.level    = 0;

  this.children = null;

  this.content  = '';

  this.markup   = '';

  this.info     = '';

  this.meta     = null;

  this.block    = false;

  this.hidden   = false;
}

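type

The token type, for example paragraph_open, paragraph_close, and hr, which render as <p>, </p>, and <hr> respectively.

tag

The tag name, for example p, strong, or '' (an empty string, which means the token is plain text).

attrs

The attributes of the HTML element. If present, this is a two-dimensional array of [name, value] pairs, for example [["href", "http://dev.nodeca.com"]].

map

The token's position in the source: an array of exactly two elements, the starting line followed by the ending line.

nesting

The kind of tag: 1 is an opening tag, 0 is a self-closing tag, and -1 is a closing tag, for example <p>, <hr>, and </p> respectively.

level

The nesting (indentation) level.

children

Child tokens. Only tokens whose type is inline or image have children. An inline token goes through a second parsing pass that extracts more detailed tokens, as in the following scenario.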

const src = '__Advertisement :)__'
const result = md.render(src)

// Parsing first produces an inline token like this:
{
  ...,
  content: "__Advertisement :)__",
  children: [Token, ...]
}
// The content still has to be parsed to pull out the "__" markers, which are rendered as a <strong> tag. That is why an inline token has children: they hold its child tokens.

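content

The content placed between the tags.

markup

The marker for a particular syntax. For example, "```" indicates a code block, "**" is the strong emphasis syntax, and "-" or "+" marks a list.

info

Tokens whose type is fence have an info property. What is a fence? See below: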
```
/** ```js let md = new MarkdownIt() ``` **/
```
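What sits inside the comment above is a fence token. Its info is js and its markup is "```".

meta

Typically used by plugins to store arbitrary data.

block

Block-level tokens generated by ParserBlock have block set to true; tokens generated by ParserInline have block set to false.

hidden

If true, the token is not rendered.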
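To make nesting and level concrete, here is a minimal sketch of how a level can be assigned while tokens are pushed: a closing tag steps the level back before it is recorded, and an opening tag steps it forward afterwards. The `createState` and `pushToken` helpers and the token sequence are illustrative assumptions, not code from lib/token.js.

```javascript
// Hypothetical helper: records tokens and tracks the current nesting depth.
function createState() {
  const state = { tokens: [], level: 0 };
  state.pushToken = function (type, tag, nesting) {
    if (nesting < 0) state.level--;            // closing tag: step out first
    const token = { type, tag, nesting, level: state.level };
    if (nesting > 0) state.level++;            // opening tag: children sit one level deeper
    state.tokens.push(token);
    return token;
  };
  return state;
}

// Token sequence for a paragraph inside a blockquote.
const state = createState();
state.pushToken('blockquote_open', 'blockquote', 1);
state.pushToken('paragraph_open', 'p', 1);
state.pushToken('inline', '', 0);
state.pushToken('paragraph_close', 'p', -1);
state.pushToken('blockquote_close', 'blockquote', -1);

const levels = state.tokens.map((t) => t.level);
console.log(levels); // [ 0, 1, 2, 1, 0 ]
```

An opening tag and its matching closing tag end up on the same level, which makes it possible to find a token's partner by scanning for the next token with equal level and opposite nesting.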

Next, let's look at the methods on the prototype.

attrIndex()

Token.prototype.attrIndex = function attrIndex(name) {
  var attrs, i, len;

  if (!this.attrs) { return -1; }

  attrs = this.attrs;

  for (i = 0, len = attrs.length; i < len; i++) {
    if (attrs[i][0] === name) { return i; }
  }
  return -1;
};

Returns the index of the attribute with the given name, or -1 if there is none.

attrPush()

Token.prototype.attrPush = function attrPush(attrData) {
  if (this.attrs) {
    this.attrs.push(attrData);
  } else {
    this.attrs = [ attrData ];
  }
};

Appends a [name, value] pair.

attrSet

Token.prototype.attrSet = function attrSet(name, value) {
  var idx = this.attrIndex(name),
      attrData = [ name, value ];

  if (idx < 0) {
    this.attrPush(attrData);
  } else {
    this.attrs[idx] = attrData;
  }
};

Overwrites the [name, value] pair if the name already exists, otherwise adds it.

attrGet

Token.prototype.attrGet = function attrGet(name) {
  var idx = this.attrIndex(name), value = null;
  if (idx >= 0) {
    value = this.attrs[idx][1];
  }
  return value;
};

Returns the attribute value for name, or null if the attribute does not exist.

attrJoin

Token.prototype.attrJoin = function attrJoin(name, value) {
  var idx = this.attrIndex(name);

  if (idx < 0) {
    this.attrPush([ name, value ]);
  } else {
    this.attrs[idx][1] = this.attrs[idx][1] + ' ' + value;
  }
};

Appends value to the existing value for name, separated by a space. If the attribute does not exist yet, it is added.
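The five attribute helpers are easiest to compare side by side. The sketch below uses a compact stand-in class, `MiniToken` (a hypothetical name), that mirrors the behavior of the listings above so that it runs without markdown-it installed. The `target` and `class` values are made up for illustration.

```javascript
// Compact stand-in mirroring the attr* methods listed above (not the real lib/token.js).
class MiniToken {
  constructor(type, tag, nesting) {
    this.type = type;
    this.tag = tag;
    this.nesting = nesting;
    this.attrs = null;
  }
  attrIndex(name) {
    if (!this.attrs) return -1;
    return this.attrs.findIndex((pair) => pair[0] === name);
  }
  attrPush(attrData) {
    if (this.attrs) this.attrs.push(attrData);
    else this.attrs = [attrData];
  }
  attrSet(name, value) {
    const idx = this.attrIndex(name);
    if (idx < 0) this.attrPush([name, value]);
    else this.attrs[idx] = [name, value];
  }
  attrGet(name) {
    const idx = this.attrIndex(name);
    return idx >= 0 ? this.attrs[idx][1] : null;
  }
  attrJoin(name, value) {
    const idx = this.attrIndex(name);
    if (idx < 0) this.attrPush([name, value]);
    else this.attrs[idx][1] = this.attrs[idx][1] + ' ' + value;
  }
}

const link = new MiniToken('link_open', 'a', 1);
link.attrPush(['href', 'http://dev.nodeca.com']);
link.attrSet('target', '_blank');     // absent, so it is added
link.attrSet('target', '_self');      // present, so it is overwritten
link.attrJoin('class', 'external');   // absent, so it is added
link.attrJoin('class', 'highlight');  // present, so it is appended after a space

const classValue = link.attrGet('class');  // 'external highlight'
const titleValue = link.attrGet('title');  // null, because there is no such attribute
console.log(link.attrs, classValue, titleValue);
```

Note that attrPush never checks for duplicates, so calling it twice with the same name produces two pairs. attrSet and attrJoin are the safer choice when the attribute may already exist.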

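Token summary

Token is the most basic class inside MarkdownIt and also its smallest unit of division. It is the product of parsing and the basis for output.

Ruler

Now let's look at another MarkdownIt class, Ruler. You can think of it as the manager of chain-of-responsibility functions, because it stores many rule functions internally. Rules in MarkdownIt come in two kinds. A parse rule parses the string passed in by the user and generates tokens. A render rule is called according to each token's type once the tokens have been produced, and finally emits the HTML string.

Let's start with the constructor.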

function Ruler() {
  this.__rules__ = [];

  this.__cache__ = null;
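}

__rules__

Holds all the rule objects. Each one has the following structure:

{
  name: XXX,
  enabled: Boolean, // whether the rule is enabled
  fn: Function(), // handler function
  alt: [ name2, name3 ] // names of the chains the rule belongs to
}

Some readers may be puzzled by alt. We will leave that open for now and cover it in detail when analyzing the __compile__ method.

__cache__

Holds the rule chain information. Its structure is as follows: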
```
{
  chainName: [rule1.fn, rule2.fn, ...]
}
```
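Note: by default there is a rule chain whose name is the empty string (''). Its value is an array containing the fn of every enabled rule.

Now let's go through what each method on the prototype does.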

__find__

Ruler.prototype.__find__ = function (name) {
  for (var i = 0; i < this.__rules__.length; i++) {
    if (this.__rules__[i].name === name) {
      return i;
    }
  }
  return -1;
};

Looks up a rule by name and returns its index in __rules__, or -1 if it is not found.

__compile__

Ruler.prototype.__compile__ = function () {
  var self = this;
  var chains = [ '' ];

  // collect unique names
  self.__rules__.forEach(function (rule) {
    if (!rule.enabled) { return; }

    rule.alt.forEach(function (altName) {
      if (chains.indexOf(altName) < 0) {
        chains.push(altName);
      }
    });
  });

  self.__cache__ = {};

  chains.forEach(function (chain) {
    self.__cache__[chain] = [];
    self.__rules__.forEach(function (rule) {
      if (!rule.enabled) { return; }

      if (chain && rule.alt.indexOf(chain) < 0) { return; }

      self.__cache__[chain].push(rule.fn);
    });
  });
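};

Builds the chain information.

First, it walks the rules in __rules__ to collect the key names of all rule chains. This is where a rule's alt property matters: it says that, in addition to the default chain, the rule also belongs to the chains named in alt. A default chain whose key is the empty string ('') always exists, and every enabled rule.fn belongs to it.
Then it maps each rule.fn onto the matching keys and caches the result in the __cache__ property.

For example: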

let ruler = new Ruler()
// alt must be an array of chain names, because __compile__ calls rule.alt.forEach
ruler.push('rule1', rule1Fn, {
  alt: [ 'chainA' ]
})
ruler.push('rule2', rule2Fn, {
  alt: [ 'chainB' ]
})
ruler.push('rule3', rule3Fn, {
  alt: [ 'chainB' ]
})
ruler.__compile__()

// This gives us the following structure
ruler.__cache__ = {
  '': [rule1Fn, rule2Fn, rule3Fn],
  'chainA': [rule1Fn],
  'chainB': [rule2Fn, rule3Fn],
}
// We end up with three rule chains: '', 'chainA', and 'chainB'.
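Three details of __compile__ are worth checking by hand: a rule can belong to several chains, a rule without alt lands only in the default chain, and a disabled rule is left out of every chain. The sketch below assumes a compact stand-in class, `MiniRuler` (a hypothetical name), that follows the listing above. The rule names and placeholder functions are made up.

```javascript
// Compact stand-in for the Ruler methods shown above; not markdown-it's lib/ruler.js.
class MiniRuler {
  constructor() {
    this.__rules__ = [];
    this.__cache__ = null;
  }
  push(name, fn, options) {
    const opt = options || {};
    this.__rules__.push({ name, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;
  }
  __compile__() {
    // Pass 1: collect the unique chain names of all enabled rules.
    const chains = [''];
    for (const rule of this.__rules__) {
      if (!rule.enabled) continue;
      for (const altName of rule.alt) {
        if (!chains.includes(altName)) chains.push(altName);
      }
    }
    // Pass 2: for each chain, keep the enabled rules that belong to it.
    this.__cache__ = {};
    for (const chain of chains) {
      this.__cache__[chain] = this.__rules__
        .filter((rule) => rule.enabled)
        .filter((rule) => !chain || rule.alt.includes(chain))
        .map((rule) => rule.fn);
    }
  }
}

// Hypothetical rule functions, used only as identifiable placeholders.
function headingFn() {}
function listFn() {}
function tableFn() {}

const ruler = new MiniRuler();
ruler.push('heading', headingFn, { alt: ['paragraph'] });
ruler.push('list', listFn, { alt: ['paragraph', 'blockquote'] });
ruler.push('table', tableFn);           // no alt: default chain only
ruler.__rules__[0].enabled = false;     // disable 'heading' by hand for the demo
ruler.__compile__();

const chainNames = Object.keys(ruler.__cache__);
console.log(chainNames);                // [ '', 'paragraph', 'blockquote' ]
```

Here the default chain holds listFn and tableFn, while 'paragraph' and 'blockquote' each hold only listFn. headingFn appears nowhere because its rule is disabled.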

at

Ruler.prototype.at = function (name, fn, options) {
  var index = this.__find__(name);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + name); }

  this.__rules__[index].fn = fn;
  this.__rules__[index].alt = opt.alt || [];
  this.__cache__ = null;
};

Replaces the fn of an existing rule and changes the chain names it belongs to.
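One consequence of the `opt.alt || []` line is easy to miss: calling at without options does not keep the old alt, it resets it to an empty array. The sketch below demonstrates this with a compact stand-in class, `MiniRuler` (a hypothetical name), that follows the listing above. The rule name and placeholder functions are made up.

```javascript
// Compact stand-in for the Ruler methods listed above (not markdown-it's lib/ruler.js).
class MiniRuler {
  constructor() {
    this.__rules__ = [];
    this.__cache__ = null;
  }
  __find__(name) {
    return this.__rules__.findIndex((rule) => rule.name === name);
  }
  push(name, fn, options) {
    const opt = options || {};
    this.__rules__.push({ name, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;
  }
  at(name, fn, options) {
    const index = this.__find__(name);
    const opt = options || {};
    if (index === -1) { throw new Error('Parser rule not found: ' + name); }
    this.__rules__[index].fn = fn;
    this.__rules__[index].alt = opt.alt || [];
    this.__cache__ = null;
  }
}

// Hypothetical placeholder rule functions.
function oldFn() {}
function newFn() {}

const atRuler = new MiniRuler();
atRuler.push('emphasis', oldFn, { alt: ['chainA'] });

atRuler.at('emphasis', newFn);                        // options omitted
const altAfterPlainAt = atRuler.__rules__[0].alt;     // []: membership in chainA is gone

atRuler.at('emphasis', newFn, { alt: ['chainA'] });   // pass alt again to keep it
const altAfterExplicitAt = atRuler.__rules__[0].alt;  // ['chainA']

let missingRuleError = null;
try {
  atRuler.at('no_such_rule', newFn);                  // unknown names throw
} catch (err) {
  missingRuleError = err.message;
}
console.log(altAfterPlainAt, altAfterExplicitAt, missingRuleError);
```

So when replacing a rule that takes part in a named chain, the original alt list has to be passed again.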

before

Ruler.prototype.before = function (beforeName, ruleName, fn, options) {
  var index = this.__find__(beforeName);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + beforeName); }

  this.__rules__.splice(index, 0, {
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};

Inserts a new rule before the given rule.

after

Ruler.prototype.after = function (afterName, ruleName, fn, options) {
  var index = this.__find__(afterName);
  var opt = options || {};

  if (index === -1) { throw new Error('Parser rule not found: ' + afterName); }

  this.__rules__.splice(index + 1, 0, {
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};

Inserts a new rule after the given rule.

push

Ruler.prototype.push = function (ruleName, fn, options) {
  var opt = options || {};

  this.__rules__.push({
    name: ruleName,
    enabled: true,
    fn: fn,
    alt: opt.alt || []
  });

  this.__cache__ = null;
};

Appends a rule to the end of the list.
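before, after, and push differ only in where the new rule lands in __rules__, and that position decides the order in which the rule functions are tried. A minimal sketch, again using a hypothetical `MiniRuler` stand-in that follows the listings above, with single-letter rule names chosen purely for illustration:

```javascript
// Compact stand-in for the insertion methods listed above (not markdown-it's lib/ruler.js).
class MiniRuler {
  constructor() {
    this.__rules__ = [];
    this.__cache__ = null;
  }
  __find__(name) {
    return this.__rules__.findIndex((rule) => rule.name === name);
  }
  before(beforeName, ruleName, fn, options) {
    const index = this.__find__(beforeName);
    const opt = options || {};
    if (index === -1) { throw new Error('Parser rule not found: ' + beforeName); }
    this.__rules__.splice(index, 0, { name: ruleName, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;
  }
  after(afterName, ruleName, fn, options) {
    const index = this.__find__(afterName);
    const opt = options || {};
    if (index === -1) { throw new Error('Parser rule not found: ' + afterName); }
    this.__rules__.splice(index + 1, 0, { name: ruleName, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;
  }
  push(ruleName, fn, options) {
    const opt = options || {};
    this.__rules__.push({ name: ruleName, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;
  }
}

const noop = () => {};                   // placeholder rule function
const orderRuler = new MiniRuler();
orderRuler.push('a', noop);              // a
orderRuler.push('c', noop);              // a c
orderRuler.before('c', 'b', noop);       // a b c
orderRuler.after('c', 'd', noop);        // a b c d
orderRuler.push('e', noop);              // a b c d e

const order = orderRuler.__rules__.map((rule) => rule.name).join(' ');
console.log(order);                      // a b c d e
```

All three methods set __cache__ back to null, so the chains are rebuilt the next time they are requested.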

enable

Ruler.prototype.enable = function (list, ignoreInvalid) {
  if (!Array.isArray(list)) { list = [ list ]; }

  var result = [];

  // Search by name and enable
  list.forEach(function (name) {
    var idx = this.__find__(name);

    if (idx < 0) {
      if (ignoreInvalid) { return; }
      throw new Error('Rules manager: invalid rule name ' + name);
    }
    this.__rules__[idx].enabled = true;
    result.push(name);
  }, this);

  this.__cache__ = null;
  return result;
};

Enables the rules named in list without affecting any other rule.

enableOnly

Ruler.prototype.enableOnly = function (list, ignoreInvalid) {
  if (!Array.isArray(list)) { list = [ list ]; }

  this.__rules__.forEach(function (rule) { rule.enabled = false; });

  this.enable(list, ignoreInvalid);
};

Disables every rule first, then enables only the rules named in list.
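The difference between the two methods shows up in what happens to rules that are not in the list. The sketch below uses a hypothetical `MiniRuler` stand-in that follows the listings above. The `enabledNames` helper and the rule names are assumptions made for the demo.

```javascript
// Compact stand-in for enable/enableOnly as listed above (not markdown-it's lib/ruler.js).
class MiniRuler {
  constructor() {
    this.__rules__ = [];
    this.__cache__ = null;
  }
  __find__(name) {
    return this.__rules__.findIndex((rule) => rule.name === name);
  }
  push(name, fn) {
    this.__rules__.push({ name, enabled: true, fn, alt: [] });
    this.__cache__ = null;
  }
  enable(list, ignoreInvalid) {
    if (!Array.isArray(list)) { list = [list]; }
    const result = [];
    list.forEach((name) => {
      const idx = this.__find__(name);
      if (idx < 0) {
        if (ignoreInvalid) { return; }
        throw new Error('Rules manager: invalid rule name ' + name);
      }
      this.__rules__[idx].enabled = true;
      result.push(name);
    });
    this.__cache__ = null;
    return result;
  }
  enableOnly(list, ignoreInvalid) {
    if (!Array.isArray(list)) { list = [list]; }
    this.__rules__.forEach((rule) => { rule.enabled = false; });
    this.enable(list, ignoreInvalid);
  }
}

// Hypothetical helper: names of the rules that are currently enabled.
const enabledNames = (r) => r.__rules__.filter((rule) => rule.enabled).map((rule) => rule.name);

const toggleRuler = new MiniRuler();
['a', 'b', 'c'].forEach((name) => toggleRuler.push(name, () => {}));

toggleRuler.enableOnly('b');                                  // a single name is wrapped in an array
const afterEnableOnly = enabledNames(toggleRuler);            // ['b']

const changed = toggleRuler.enable(['a', 'missing'], true);   // unknown name is skipped
const afterEnable = enabledNames(toggleRuler);                // ['a', 'b']
console.log(afterEnableOnly, changed, afterEnable);
```

enable returns the names it actually switched on, which is how the caller can detect skipped names when ignoreInvalid is set. As listed, enableOnly returns nothing.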

getRules
```
Ruler.prototype.getRules = function (chainName) {
  if (this.__cache__ === null) {
    this.__compile__();
  }
  return this.__cache__[chainName] || [];
};
```
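Returns the queue of fn functions for the given rule chain key. The chains are compiled lazily, on the first call after the cache has been reset.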
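The lazy compilation can be observed by counting how often __compile__ runs. In the sketch below, `MiniRuler` is a hypothetical stand-in that follows the listings above, and `compileCount` is instrumentation added only for this demo. The rule names and placeholder functions are made up.

```javascript
// Compact stand-in for push/__compile__/getRules (not markdown-it's lib/ruler.js).
class MiniRuler {
  constructor() {
    this.__rules__ = [];
    this.__cache__ = null;
    this.compileCount = 0;               // demo-only counter, not part of the original class
  }
  push(name, fn, options) {
    const opt = options || {};
    this.__rules__.push({ name, enabled: true, fn, alt: opt.alt || [] });
    this.__cache__ = null;               // any change invalidates the cache
  }
  __compile__() {
    this.compileCount++;
    const enabled = this.__rules__.filter((rule) => rule.enabled);
    const chains = [''];
    enabled.forEach((rule) => rule.alt.forEach((altName) => {
      if (!chains.includes(altName)) chains.push(altName);
    }));
    this.__cache__ = {};
    chains.forEach((chain) => {
      this.__cache__[chain] = enabled
        .filter((rule) => !chain || rule.alt.includes(chain))
        .map((rule) => rule.fn);
    });
  }
  getRules(chainName) {
    if (this.__cache__ === null) { this.__compile__(); }
    return this.__cache__[chainName] || [];
  }
}

// Hypothetical placeholder rule functions.
function paragraphFn() {}
function fenceFn() {}

const lazyRuler = new MiniRuler();
lazyRuler.push('paragraph', paragraphFn);
lazyRuler.push('fence', fenceFn, { alt: ['paragraph'] });

const all = lazyRuler.getRules('');                 // first call compiles
const inParagraph = lazyRuler.getRules('paragraph'); // served from the cache
const unknown = lazyRuler.getRules('nope');          // unknown chain: empty array
const compilesBeforePush = lazyRuler.compileCount;   // 1

lazyRuler.push('hr', () => {});                      // resets __cache__ to null
const cacheAfterPush = lazyRuler.__cache__;
lazyRuler.getRules('');                              // compiles again
const compilesAfterPush = lazyRuler.compileCount;    // 2
console.log(all.length, inParagraph.length, unknown.length, compilesAfterPush);
```

Because every mutating method resets __cache__, callers can register many rules in a row and pay for compilation only once, when the chain is first requested.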

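Ruler summary

As you can see, Ruler is quite flexible. Methods such as at, before, after, and enable give it a great deal of flexibility and extensibility, and users can take advantage of this design to meet their own specific needs.

Summary

With the foundational classes Token and Ruler covered, we can go further into the MarkdownIt source. Later articles will analyze how the src string is parsed into tokens and how renderer.render turns those tokens into the final output string. The next article covers MarkdownIt's entry parser, CoreParser.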