簡易表達式解析器編寫

時間 2019-11-16

標籤簡易表達式解析編寫简体版

原文原文鏈接

前言

在作一個Node監控系統的時候要作了一個郵件報警的需求，這時候就須要自定義一套規則來書寫觸發報警的表達式，本文就介紹一下如何編寫一個簡易的表達式解析器。附上界面截圖，圖中就是一個表達式的例子。javascript

參考書籍：《編程語言實現模式》前端

肯定一些基本規則

在開始編寫以前你須要肯定你的表達式須要擁有些什麼能力。本文的表達式是判斷是否觸發報警的條件，表達式的規則不該該過於複雜。同時，因爲書寫表達式的用戶是前端開發或者node開發，因此語法貼近js表達式語法最合適：java

支持變量，這裏規定變量以@開頭，由字母和數字組成，如： @load15
支持常量：布爾值、數字、字符串
支持加法（+）、減法（-）、乘法（*）、除法（/）、取餘（%）運算
支持全等（===）、不等（!==）、大於（>）、小於（<）、大於等於（>=）、小於等於（<=）等比較運算
支持與（&&）、或（||）、非（!）等邏輯運算
擴展字符串包含include運算
支持括號（）

構建抽象語法樹（AST）

構建抽象語法樹的目的是設計一種數據結構來講明一個表達式的含義。好比@var > 5，在代碼中它做爲一個字符串，是沒有規定其是怎樣去進行運算的。node

若是解析成以下抽象語法樹：git

根節點爲運算符，子節點爲運算對象，這樣咱們就用二叉樹抽象了一個運算。express

固然，上面的例子是一個較爲簡單的運算表達式，一旦表達式變得複雜，沒法避免的是運算的優先級，那麼二叉樹如何來表示運算的優先級呢？編程

咱們用兩個表達式來講明：bash

從圖中能夠看出，第二個表達式加入括號後改變了原來的運算順序，對應的抽象語法樹也發生了變化，不難看出：運算的優先級越高，在語法樹中的位置越低。數據結構

AST的數據結構舉個例子來看，如如下表達式編程語言

@load > 1 + 5
複製代碼

解析成的token爲：

{
  "type": "LOGICAL_EXP",
  "operator": ">",
  "left": {
    "type": "VARIABLE",
    "value": "load",
    "raw": "@load"
  },
  "right": {
    "type": "BINARY_EXP",
    "operator": "+",
    "left": {
      "type": "NUMBER",
      "value": 1,
      "raw": "1"
    },
    "right": {
      "type": "NUMBER",
      "value": 5,
      "raw": "5"
    }
  }
}
複製代碼

肯定了抽象語法樹的結構，咱們就能夠開始考慮怎麼去解析表達式字符串了。

使用LL(1)遞歸降低詞法解析器

詞法單元(tokenizing)是指字符的語法，相似咱們用字母以必定的語法拼成一個單詞同樣。

LL(1)遞歸降低詞法解析器是向前看一個詞法單元的自頂向下的解析器，LL中的兩個L都是"left-to-right"，第一個L表示解析器按"從作到右"的順序解析輸入內容，第二個L表示降低解析時也時從左到右遍歷子節點。

這個是最簡單的解析器，可以知足需求的狀況下，使用這個模式更爲通俗易懂。

詞法單元羅列

既然要向前看一個詞法單元，那麼咱們首先應該羅列出這個表達式可能擁有的詞法單元，本文的表達式可能擁有以下詞法單元：

變量詞法單元，標誌：以"@"開頭
數字詞法單元，標誌：以0-9開頭或者"."開頭
字符串詞法單元，標誌：以單引號或者雙引號開頭
括號詞法單元，標誌：以左括號爲開頭
一元運算符詞法單元，標誌：以"!"、"+"、"-"開頭

下面咱們就能夠正式開始寫代碼了。

尋找遞歸點

一個表達式能夠分解成多個沒有括號的表達式，如如下表達式：

5 * (3 + 2 * (5 + 6))
複製代碼

咱們能夠把它分解成如下表達式：

5 * expression_a
3 + 2 * expression_b // expression_a 
5 + 6 // expression_b
複製代碼

整個字符串表示的就是一個表達式，這就是咱們要找的遞歸點，定義一個函數eatExpression去解析一個表達式，遇到括號就遞歸eatExpression，直到沒法匹配任何詞法單元。

下面開始擼代碼。

class ExpressionParser

class ExpressionParser {
  constructor(expr) {
    if (typeof expr !== 'string') {
      throw new Error(`[expression-parser] constructor need a string parameter, but get [${typeof expr}]`);
    }
    this.expr = expr;
    this.parse();
  }

  parse() {
    this.index = 0;
    this.tokens = this.eatExpression();
    if (this.index < this.expr.length) {
      this.throwError(`非法字符"${this.charAt()}"`);
    }
  }
}

const expression
const parser = new ExpressionParser(expression)
複製代碼

這是解析器類，初始工做就是保存expression，而後使用eatExpression去解析，當咱們開始解析的時候，咱們是一位一位字符地查看，咱們形象地用"eat"來表示遍歷這個動做，所以要有幾個輔助參數和輔助函數：

this.index // 標記當前遍歷的字符的下標

// 獲取當前遍歷字符
charAt(index = this.index) {
  return this.expr.charAt(index);
}

// 獲取當前字符的 Unicode 編碼
charCodeAt(index = this.index) {
  return this.expr.charCodeAt(index);
}
複製代碼

ExpressionParser.prototype.eatExpression

這個函數是整個解析器的入口，這個函數的思路很重要。一個表達式能夠分爲兩種：

有二元運算符的表達式
沒有二元運算符的表達式

沒有二元運算符的表達式很簡單，只要遍歷分解成上述詞法單元的集合便可，而若是有二元運算符且大於1個的時候，這時候就比較複雜了，由於咱們解析的順序是從左到右，而運算符的順序是不肯定的。

那麼如何解決呢？下面用一個例子的處理流程解釋核心的優先級計算思想：

主要的思想就是利用一個堆棧，把解析的token存進堆棧，當解析到運算符（一元運算符解析成token，這裏指二元運算符）的時候，對比棧中最靠近頂部的運算符，若是發現新解析的運算符優先級更高，直接推動棧內。因此，在棧中，運算符的優先級保證是從低到高的，一旦新解析的運算符優先級更低，說明棧內的token能夠合成一個node直到棧內的運算符優先級所有都是從低到高。最後，再從右向左依次合成node，獲得一個完整的表達式二叉樹。

最核心的思想就是保證棧內優先級從低到高，下面貼代碼讓你們再鞏固理解：

eatExpression() {
    let left = this.eatToken();
    let operator = this.eatBinaryOperator();
    // 說明這個運算樹只有左側
    if (!operator) {
        return left;
    }
    let operatorInfo = {
        precedence: this.getOperatorPrecedence(operator), // 獲取運算符優先級
        value: operator,
    };
    let right = this.eatToken();
    if (!right) {
        this.throwError(`"${operator}"運算符後應該爲表達式`);
    }
    const stack = [left, operatorInfo, right];
    // 獲取下一個運算符
    while (operator = this.eatBinaryOperator()) {
        const precedence = this.getOperatorPrecedence(operator);
        // 若是遇到了非法的yuan fa
        if (precedence === 0) {
            break;
        }
        operatorInfo = {
            precedence,
            value: operator,
        };
        while (stack.length > 2 && precedence < stack[stack.length - 2].precedence) {
            right = stack.pop();
            operator = stack.pop().value;
            left = stack.pop();
            const node = this.createNode(operator, left, right);
            stack.push(node);
        }
        const node = this.eatToken();
        if (!node) {
            this.throwError(`"${operator}"運算符後應該爲表達式`);
        }
        stack.push(operatorInfo, node);
    }
    let i = stack.length - 1;
    let node = stack[i];
    while (i > 1) {
        node = this.createNode(stack[i - 1].value, stack[i - 2], node);
        i -= 2;
    }
    return node;
}
複製代碼

createNode:

const LOGICAL_OPERATORS = ['||', '&&', '===', '!==', '>', '<', '>=', '<=', 'include'];

...

createNode(operator, left, right) {
    const type = LOGICAL_OPERATORS.indexOf(operator) !== -1 ? 'LOGICAL_EXP' : 'BINARY_EXP';
    return {
        type,
        operator,
        left,
        right,
    };
}
複製代碼

getOperatorPrecedence:

const BINARY_OPERATORS = {
    '||': 1, 
    '&&': 2,
    '===': 6, '!==': 6,
    '<': 7, '>': 7, '<=': 7, '>=': 7,
    '+': 9, '-': 9,
    '*': 10, '/': 10, '%': 10,
    include: 11,
};

...

getOperatorPrecedence(operator) {
    return BINARY_OPERATORS[operator] || 0;
}
複製代碼

相信如今你對於總體的思路已經很清晰了，接下來還有一個比較重要的須要講一下，這就是token的解析

Token解析

token解析的過程，能夠想象成有個下標一個字符一個字符地從左到右移動，遇到能夠辨識的token開頭標誌，就解析token，而後繼續移動下標直到整個字符串結束或者沒法遇到可辨識的標誌。

具體每種類型的token如何去匹配，能夠查看完整的代碼。

計算結果值

token解析好後，咱們是須要計算表達式值的，因爲變量的存在，因此咱們須要提供一個context來供變量獲取值，形如：

const expr = new ExpressionParser('@load > 5');
console.log(expr.valueOf({ load: 8 })); // true
複製代碼

由於咱們已經把表達式生成一個二叉樹了，因此只要遞歸遍歷計算每一個二叉樹的值便可，因爲是遞歸計算，越底下的樹越早計算，與咱們開始設計優先級的思路一致。

完整代碼：

const OPEN_PAREN_CODE = 40; // (
const CLOSE_PAREN_CODE = 41; // )
const VARIABLE_BEGIN_CODE = 64; // @，變量開頭
const PERIOD_CODE = 46; // '.'
const SINGLE_QUOTE_CODE = 39; // single quote
const DOUBLE_QUOTE_CODE = 34; // double quotes
const SPACE_CODES = [32, 9, 10, 13]; // space
// 一元運算符
const UNARY_OPERATORS = { '-': true, '!': true, '+': true };
// 二元運算符
const LOGICAL_OPERATORS = ['||', '&&', '===', '!==', '>', '<', '>=', '<=', 'include'];
const BINARY_OPERATORS = {
  '||': 1,
  '&&': 2,
  '===': 6, '!==': 6,
  '<': 7, '>': 7, '<=': 7, '>=': 7,
  '+': 9, '-': 9,
  '*': 10, '/': 10, '%': 10,
  include: 11,
};

// 獲取對象鍵的最大長度
const getMaxKeyLen = function getMaxKeyLen(obj) {
  let max = 0;
  Object.keys(obj).forEach((key) => {
    if (key.length > max && obj.hasOwnProperty(key)) {
      max = key.length;
    }
  });
  return max;
};
const maxBinaryOperatorLength = getMaxKeyLen(BINARY_OPERATORS);
const maxUnaryOperatorLength = getMaxKeyLen(UNARY_OPERATORS);

class ExpressionParser {
  constructor(expr) {
    if (typeof expr !== 'string') {
      throw new Error(`[expression-parser] constructor need a string parameter, but get [${typeof expr}]`);
    }
    this.expr = expr;
  }

  parse() {
    this.index = 0;
    try {
      this.tokens = this.eatExpression();
      if (this.index < this.expr.length) {
        throw new Error(`非法字符"${this.charAt()}"`);
      }
    } catch (error) {
      this.tokens = undefined;
      if (typeof this.onErrorCallback === 'function') {
        this.onErrorCallback(error.message, this.index, this.charAt());
      } else {
        throw new Error(error.message);
      }
    }
    return this;
  }

  eatExpression() {
    let left = this.eatToken();
    let operator = this.eatBinaryOperator();
    // 說明這個運算樹只有左側
    if (!operator) {
      return left;
    }
    let operatorInfo = {
      precedence: this.getOperatorPrecedence(operator), // 獲取運算符優先級
      value: operator,
    };
    let right = this.eatToken();
    if (!right) {
      throw new Error(`"${operator}"運算符後應該爲表達式`);
    }
    const stack = [left, operatorInfo, right];
    // 獲取下一個運算符
    while (operator = this.eatBinaryOperator()) {
      const precedence = this.getOperatorPrecedence(operator);
      // 若是遇到了非法的yuan fa
      if (precedence === 0) {
        break;
      }
      operatorInfo = {
        precedence,
        value: operator,
      };
      while (stack.length > 2 && precedence < stack[stack.length - 2].precedence) {
        right = stack.pop();
        operator = stack.pop().value;
        left = stack.pop();
        const node = this.createNode(operator, left, right);
        stack.push(node);
      }
      const node = this.eatToken();
      if (!node) {
        throw new Error(`"${operator}"運算符後應該爲表達式`);
      }
      stack.push(operatorInfo, node);
    }
    let i = stack.length - 1;
    let node = stack[i];
    while (i > 1) {
      node = this.createNode(stack[i - 1].value, stack[i - 2], node);
      i -= 2;
    }
    return node;
  }

  eatToken() {
    this.eatSpaces();
    const ch = this.charCodeAt();
    if (ch === VARIABLE_BEGIN_CODE) {
      // 變量
      return this.eatVariable();
    } else if (this.isDigit(ch) || ch === PERIOD_CODE) {
      // 數字
      return this.eatNumber();
    } else if (ch === SINGLE_QUOTE_CODE || ch === DOUBLE_QUOTE_CODE) {
      // 字符串
      return this.eatString();
    } else if (ch === OPEN_PAREN_CODE) {
      // 括號
      return this.eatGroup();
    } else {
      // 檢查單操做符 ！+ -
      let toCheck = this.expr.substr(this.index, maxUnaryOperatorLength);
      let toCheckLength;
      while (toCheckLength = toCheck.length) {
        if (
          UNARY_OPERATORS.hasOwnProperty(toCheck) &&
          this.index + toCheckLength <= this.expr.length
        ) {
          this.index += toCheckLength;
          return {
            type: 'UNARY_EXP',
            operator: toCheck,
            argument: this.eatToken(),
          };
        }
        toCheck = toCheck.substr(0, --toCheckLength);
      }
    }
  }

  eatGroup() {
    this.index++; // eat "("
    const node = this.eatExpression();
    this.eatSpaces();
    const ch = this.charCodeAt();
    if (ch !== CLOSE_PAREN_CODE) {
      throw new Error('括號沒有閉合');
    } else {
      this.index++; // eat ")"
      return node;
    }
  }

  eatVariable() {
    const ch = this.charAt();
    this.index++; // eat "@"
    const start = this.index;
    while (this.index < this.expr.length) {
      const ch = this.charCodeAt();
      if (this.isVariablePart(ch)) {
        this.index++;
      } else {
        break;
      }
    }
    const identifier = this.expr.slice(start, this.index);
    return {
      type: 'VARIABLE',
      value: identifier,
      raw: ch + identifier,
    };
  }

  eatNumber() {
    let number = '';
    // 數字開頭
    while (this.isDigit(this.charCodeAt())) {
      number += this.charAt(this.index++);
    }
    // '.'開頭
    if (this.charCodeAt() === PERIOD_CODE) {
      number += this.charAt(this.index++);
      while (this.isDigit(this.charCodeAt())) {
        number += this.charAt(this.index++);
      }
    }
    // 科學計數法
    let ch = this.charAt();
    if (ch === 'e' || ch === 'E') {
      number += this.charAt(this.index++);
      ch = this.charAt();
      if (ch === '+' || ch === '-') {
        number += this.charAt(this.index++);
      }
      while (this.isDigit(this.charCodeAt())) {
        number += this.charAt(this.index++);
      }
      // 若是e + - 後無數字，報錯
      if (!this.isDigit(this.charCodeAt(this.index - 1))) {
        throw new Error(`非法數字(${number}${this.charAt()})，缺乏指數`);
      }
    }

    return {
      type: 'NUMBER',
      value: parseFloat(number),
      raw: number,
    };
  }

  eatString() {
    let str = '';
    const quote = this.charAt(this.index++);
    let closed = false;
    while (this.index < this.expr.length) {
      let ch = this.charAt(this.index++);
      if (ch === quote) {
        closed = true;
        break;
      } else if (ch === '\\') {
        // Check for all of the common escape codes
        ch = this.charAt(this.index++);
        switch (ch) {
          case 'n':
            str += '\n';
            break;
          case 'r':
            str += '\r';
            break;
          case 't':
            str += '\t';
            break;
          case 'b':
            str += '\b';
            break;
          case 'f':
            str += '\f';
            break;
          case 'v':
            str += '\x0B';
            break;
          default:
            str += ch;
        }
      } else {
        str += ch;
      }
    }

    if (!closed) {
      throw new Error(`字符"${str}"缺乏閉合括號`);
    }

    return {
      type: 'STRING',
      value: str,
      raw: quote + str + quote,
    };
  }

  eatBinaryOperator() {
    this.eatSpaces();
    let toCheck = this.expr.substr(this.index, maxBinaryOperatorLength);
    let toCheckLength = toCheck.length;
    while (toCheckLength) {
      if (
        BINARY_OPERATORS.hasOwnProperty(toCheck) &&
        this.index + toCheckLength <= this.expr.length
      ) {
        this.index += toCheckLength;
        return toCheck;
      }
      toCheck = toCheck.substr(0, --toCheckLength);
    }
    return false;
  }

  getOperatorPrecedence(operator) {
    return BINARY_OPERATORS[operator] || 0;
  }

  createNode(operator, left, right) {
    const type = LOGICAL_OPERATORS.indexOf(operator) !== -1
      ? 'LOGICAL_EXP'
      : 'BINARY_EXP';
    return {
      type,
      operator,
      left,
      right,
    };
  }

  isVariablePart(ch) {
    return (ch >= 65 && ch <= 90) || // A...Z
      (ch >= 97 && ch <= 122) || // a...z
      (ch >= 48 && ch <= 57); // 0...9
  }

  isDigit(ch) {
    return ch >= 48 && ch <= 57; // 0...9
  }

  eatSpaces() {
    let ch = this.charCodeAt();
    while (SPACE_CODES.indexOf(ch) !== -1) {
      ch = this.charCodeAt(++this.index);
    }
  }

  onError(callback) {
    this.onErrorCallback = callback;
    return this;
  }

  charAt(index = this.index) {
    return this.expr.charAt(index);
  }

  charCodeAt(index = this.index) {
    return this.expr.charCodeAt(index);
  }

  valueOf(scope = {}) {
    if (this.tokens == null) {
      return undefined;
    }
    const value = this.getNodeValue(this.tokens, scope);
    return !!value;
  }

  getNodeValue(node, scope = {}) {
    const { type, value, left, right, operator } = node;
    if (type === 'VARIABLE') {
      return scope[value];
    } else if (type === 'NUMBER' || type === 'STRING') {
      return value;
    } else if (type === 'LOGICAL_EXP') {
      const leftValue = this.getNodeValue(left, scope);
      // 若是是邏輯運算的&&和||，那麼可能不須要解析右邊的值
      if (operator === '&&' && !leftValue) {
        return false;
      }
      if (operator === '||' && !!leftValue) {
        return true;
      }
      const rightValue = this.getNodeValue(right, scope);
      switch (node.operator) {
        case '&&': return leftValue && rightValue;
        case '||': return leftValue || rightValue;
        case '>': return leftValue > rightValue;
        case '>=': return leftValue >= rightValue;
        case '<': return leftValue < rightValue;
        case '<=': return leftValue <= rightValue;
        case '===': return leftValue === rightValue;
        case '!==': return leftValue !== rightValue;
        case 'include': return leftValue.toString &&
          rightValue.toString &&
          leftValue.toString().indexOf(rightValue.toString()) !== -1;
        // skip default case
      }
    } else if (type === 'BINARY_EXP') {
      const leftValue = this.getNodeValue(left, scope);
      const rightValue = this.getNodeValue(right, scope);
      switch (node.operator) {
        case '+': return leftValue + rightValue;
        case '-': return leftValue - rightValue;
        case '*': return leftValue * rightValue;
        case '/': return leftValue - rightValue;
        case '%': return leftValue % rightValue;
        // skip default case
      }
    } else if (type === 'UNARY_EXP') {
      switch (node.operator) {
        case '!': return !this.getNodeValue(node.argument, scope);
        case '+': return +this.getNodeValue(node.argument, scope);
        case '-': return -this.getNodeValue(node.argument, scope);
        // skip default case
      }
    }
  }
}

const expression = new ExpressionParser('@load + 3');
expression.onError((message, index, ch) => {
  console.log(message, index, ch);
}).parse();
console.log(expression.valueOf({ load: 5 }));
複製代碼

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。