Deep Learning in the Browser: TensorFlow.js (Part 7): Recurrent Neural Networks (RNN)

Date: 2019-11-17


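Introduction

In the previous post we discussed CNNs, convolutional neural networks. CNNs are widely used in image-related deep learning tasks, but they have some limitations:

They are hard to apply to sequence data
Their input and output are both of fixed length
They have no understanding of context

These are the problems that RNNs can address.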

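Besides the CNN, another common class of neural network is the RNN. The R actually covers two kinds of network: recurrent (recursion over time) and recursive (recursion over structure). In a recurrent neural network the connections between neurons form a directed graph, while a recursive neural network applies a similar network structure recursively to build a more complex deep network. Most of the time, "RNN" refers to the former, the recurrent neural network.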

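The figure above shows the structure of an RNN. To handle context, the network takes an input Xt and produces an output ht, and ht is fed back into the processing unit A. The right-hand side of the figure shows the same network unrolled over time. The output hn at time tn is fed back and, together with Xn+1, forms the input at time tn+1. This preserves the influence of context on the model, which makes it better suited to modeling time series.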
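The feedback loop described above can be sketched with plain arrays. This is a minimal illustration, assuming the common update rule h_t = tanh(Wx·x_t + Wh·h_(t-1) + b); the helper names (matVec, rnnStep, runRnn) and the toy weights are made up for this sketch and are not TensorFlow.js API.

```javascript
// Minimal sketch of a recurrent cell using plain arrays instead of tensors.
// All names here are illustrative, not part of TensorFlow.js.

// Multiply a matrix (array of rows) by a vector.
function matVec(matrix, vector) {
  return matrix.map(row => row.reduce((sum, w, j) => sum + w * vector[j], 0));
}

// One time step: h_t = tanh(Wx * x_t + Wh * h_{t-1} + b).
function rnnStep(x, hPrev, Wx, Wh, b) {
  const fromInput = matVec(Wx, x);
  const fromState = matVec(Wh, hPrev);
  return fromInput.map((v, i) => Math.tanh(v + fromState[i] + b[i]));
}

// Unroll the cell over a sequence, feeding each output back in as the next state.
function runRnn(sequence, Wx, Wh, b) {
  let h = new Array(b.length).fill(0);
  const outputs = [];
  for (const x of sequence) {
    h = rnnStep(x, h, Wx, Wh, b);
    outputs.push(h);
  }
  return outputs;
}

// Toy example: 2-dimensional inputs, 2 hidden units, hand-picked weights.
const Wx = [[0.5, 0], [0, 0.5]];
const Wh = [[0.1, 0], [0, 0.1]];
const b = [0, 0];
const outputs = runRnn([[1, 0], [1, 0]], Wx, Wh, b);
console.log(outputs);
```

Both time steps receive the same input [1, 0], yet the second output differs from the first because it also depends on the previous state. That dependence is the "context" the text refers to.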

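As the figure below shows, an RNN can support different combinations of input and output sequences.

RNNs have several variants; the most common are LSTM and GRU.

LSTM stands for Long Short-Term Memory network. It is a variant of the RNN, proposed to overcome the plain RNN's difficulty in handling long-range dependencies.

GRU stands for Gated Recurrent Unit and is a variant of LSTM. It retains the effectiveness of LSTM with a simpler structure, so it is also very popular.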

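RNNs can be applied effectively in the following areas:

Music composition
Image captioning
Speech recognition
Time-series anomaly detection
Stock price prediction
Text translation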

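Example: Implementing Addition with an RNN

Here we walk through an example that uses an RNN to perform addition. The source code is here, or you can run the example on my Codepen. The example originally comes from Keras.

In science there are two ways of reasoning: deduction and induction.

Deduction starts from existing theory and derives conclusions through logical inference. The classic case is Euclid's Elements, which builds the whole edifice of Euclidean geometry from a few basic postulates and axioms. Machine learning, by contrast, is a typical inductive method: the data comes first. We start with observations and then use mathematical modeling to find the formula that best explains them. It is like the difference between theoretical and experimental physicists. Theoretical physicists use deduction to derive how the world works from theory, while experimental physicists work backwards from experimental data to confirm or refute a theory. The two approaches complement each other, and both are powerful tools of science.

Back to the addition example. We want to teach the computer addition by machine learning, that is, by induction rather than deduction. Computers are very good at deduction, and the deduction of addition is one of the foundations of all computation: define 0, 1, 2=1+1, and then derive every other sum. Here we use induction instead, letting the algorithm work out how to add from existing examples of addition. This is certainly not the most efficient approach, but it is a fun one.

Let's look at the example.

First we need a character-table class that manages the mapping between characters and tensors: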

class CharacterTable {
  /**
   * Constructor of CharacterTable.
   * @param chars A string that contains the characters that can appear
   *   in the input.
   */
  constructor(chars) {
    this.chars = chars;
    this.charIndices = {};
    this.indicesChar = {};
    this.size = this.chars.length;
    for (let i = 0; i < this.size; ++i) {
      const char = this.chars[i];
      if (this.charIndices[char] != null) {
        throw new Error(`Duplicate character '${char}'`);
      }
      this.charIndices[this.chars[i]] = i;
      this.indicesChar[i] = this.chars[i];
    }
  }

  /**
   * Convert a string into a one-hot encoded tensor.
   *
   * @param str The input string.
   * @param numRows Number of rows of the output tensor.
   * @returns The one-hot encoded 2D tensor.
   * @throws If `str` contains any characters outside the `CharacterTable`'s
   *   vocabulary.
   */
  encode(str, numRows) {
    const buf = tf.buffer([numRows, this.size]);
    for (let i = 0; i < str.length; ++i) {
      const char = str[i];
      if (this.charIndices[char] == null) {
        throw new Error(`Unknown character: '${char}'`);
      }
      buf.set(1, i, this.charIndices[char]);
    }
    return buf.toTensor().as2D(numRows, this.size);
  }

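  /**
   * Convert an array of strings into a one-hot encoded 3D tensor of shape
   * [strings.length, numRows, this.size].
   */
  encodeBatch(strings, numRows) {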
    const numExamples = strings.length;
    const buf = tf.buffer([numExamples, numRows, this.size]);
    for (let n = 0; n < numExamples; ++n) {
      const str = strings[n];
      for (let i = 0; i < str.length; ++i) {
        const char = str[i];
        if (this.charIndices[char] == null) {
          throw new Error(`Unknown character: '${char}'`);
        }
        buf.set(1, n, i, this.charIndices[char]);
      }
    }
    return buf.toTensor().as3D(numExamples, numRows, this.size);
  }

  /**
   * Convert a 2D tensor into a string with the CharacterTable's vocabulary.
   *
   * @param x Input 2D tensor.
   * @param calcArgmax Whether to perform `argMax` operation on `x` before
   *   indexing into the `CharacterTable`'s vocabulary.
   * @returns The decoded string.
   */
  decode(x, calcArgmax = true) {
    return tf.tidy(() => {
      if (calcArgmax) {
        x = x.argMax(1);
      }
      const xData = x.dataSync(); // TODO(cais): Performance implication?
      let output = "";
      for (const index of Array.from(xData)) {
        output += this.indicesChar[index];
      }
      return output;
    });
  }
}

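This class stores every character the addition task can use, "0123456789+ ". The space is a padding character: strings shorter than the fixed length are filled out with spaces. In the data generator shown later, for example, the answer "2" is padded to "2  ".

To map between characters and indices in both directions, the class keeps two tables: charIndices maps characters to indices, and indicesChar maps indices to characters.

The encode method maps an addition string to a one-hot tensor:

this.charTable.encode("1+2",3).print();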

Tensor
    [[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

this.charTable.encode("3",1).print()

Tensor
     [[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],]

For example, for "1+2" equals "3", the input and output tensors are as shown above.

decode is the inverse of encode: it maps a tensor back to a string.
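The round trip between encode and decode can be tried without TensorFlow.js. The sketch below uses plain nested arrays in place of tensors; oneHotEncode and argMaxDecode are hypothetical helper names for illustration, not methods of the CharacterTable class.

```javascript
// Plain-array sketch of the encode/decode round trip (no TensorFlow.js needed).
// oneHotEncode and argMaxDecode are hypothetical helpers, not CharacterTable methods.
const CHARS = '0123456789+ ';

function oneHotEncode(str, numRows) {
  // Start with numRows rows of zeros, one column per character in CHARS.
  const rows = Array.from({ length: numRows }, () => new Array(CHARS.length).fill(0));
  for (let i = 0; i < str.length; ++i) {
    const index = CHARS.indexOf(str[i]);
    if (index < 0) {
      throw new Error(`Unknown character: '${str[i]}'`);
    }
    rows[i][index] = 1;
  }
  return rows;
}

function argMaxDecode(rows) {
  // For each row, pick the column holding the largest value.
  return rows.map(row => CHARS[row.indexOf(Math.max(...row))]).join('');
}

const encoded = oneHotEncode('1+2', 3);
console.log(encoded[1].indexOf(1)); // 10, the column of '+'
console.log(argMaxDecode(encoded)); // 1+2
```

Taking the arg max of each row is also what makes decoding work on model output, where rows hold probabilities rather than exact ones and zeros.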

Next comes data generation:

function generateData(digits, numExamples, invert) {
  const digitArray = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"];
  const arraySize = digitArray.length;

  const output = [];
  const maxLen = digits + 1 + digits;

  const f = () => {
    let str = "";
    while (str.length < digits) {
      const index = Math.floor(Math.random() * arraySize);
      str += digitArray[index];
    }
    return Number.parseInt(str);
  };

  const seen = new Set();
  while (output.length < numExamples) {
    const a = f();
    const b = f();
    const sorted = b > a ? [a, b] : [b, a];
    const key = sorted[0] + "`" + sorted[1];
    if (seen.has(key)) {
      continue;
    }
    seen.add(key);

    // Pad the data with spaces such that it is always maxLen.
    const q = `${a}+${b}`;
    const query = q + " ".repeat(maxLen - q.length);
    let ans = (a + b).toString();
    // Answer can be of maximum size `digits + 1`.
    ans += " ".repeat(digits + 1 - ans.length);

    if (invert) {
      throw new Error("invert is not implemented yet");
    }
    output.push([query, ans]);
  }
  return output;
}

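This function generates the data. Its inputs are the number of digits per operand and the number of examples to generate. For two-digit addition, the question is padded to 5 characters and the answer to 3 characters, with spaces filling the empty positions.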
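The padding rule on its own looks like this. It is a small sketch using String.prototype.padEnd rather than the repeat-based code above; padExample is a hypothetical helper name.

```javascript
// Sketch of the padding rule only; padExample is a hypothetical helper name.
function padExample(a, b, digits) {
  const maxLen = digits + 1 + digits; // two operands plus the '+' sign
  const query = `${a}+${b}`.padEnd(maxLen, ' ');
  const answer = String(a + b).padEnd(digits + 1, ' ');
  return [query, answer];
}

console.log(padExample(2, 0, 2));   // ["2+0  ", "2  "]
console.log(padExample(97, 96, 2)); // ["97+96", "193"]
```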

generateData(2,10,false);

["24+38", "62 "]
["2+0  ", "2  "]
["86+62", "148"]
["36+91", "127"]
["66+51", "117"]
["47+40", "87 "]
["97+96", "193"]
["98+83", "181"]
["45+30", "75 "]
["88+75", "163"]

The next step is to convert the generated data into tensors:

function convertDataToTensors(data, charTable, digits) {
  const maxLen = digits + 1 + digits;
  const questions = data.map(datum => datum[0]);
  const answers = data.map(datum => datum[1]);
  return [
    charTable.encodeBatch(questions, maxLen),
    charTable.encodeBatch(answers, digits + 1)
  ];
}

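The result is a two-element list: the first element is the question tensor and the second is the answer tensor.

With the data ready, the next step is to create the neural network model: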

function createAndCompileModel(
  layers,
  hiddenSize,
  rnnType,
  digits,
  vocabularySize
) {
  const maxLen = digits + 1 + digits;

  const model = tf.sequential();
  switch (rnnType) {
    case "SimpleRNN":
      model.add(
        tf.layers.simpleRNN({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          inputShape: [maxLen, vocabularySize]
        })
      );
      break;
    case "GRU":
      model.add(
        tf.layers.gru({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          inputShape: [maxLen, vocabularySize]
        })
      );
      break;
    case "LSTM":
      model.add(
        tf.layers.lstm({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          inputShape: [maxLen, vocabularySize]
        })
      );
      break;
    default:
      throw new Error(`Unsupported RNN type: '${rnnType}'`);
  }
  model.add(tf.layers.repeatVector({ n: digits + 1 }));
  switch (rnnType) {
    case "SimpleRNN":
      model.add(
        tf.layers.simpleRNN({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          returnSequences: true
        })
      );
      break;
    case "GRU":
      model.add(
        tf.layers.gru({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          returnSequences: true
        })
      );
      break;
    case "LSTM":
      model.add(
        tf.layers.lstm({
          units: hiddenSize,
          recurrentInitializer: "glorotNormal",
          returnSequences: true
        })
      );
      break;
    default:
      throw new Error(`Unsupported RNN type: '${rnnType}'`);
  }
  model.add(
    tf.layers.timeDistributed({
      layer: tf.layers.dense({ units: vocabularySize })
    })
  );
  model.add(tf.layers.activation({ activation: "softmax" }));
  model.compile({
    loss: "categoricalCrossentropy",
    optimizer: "adam",
    metrics: ["accuracy"]
  });
  return model;
}

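The main parameters are:

rnnType: the type of RNN; three are supported here: SimpleRNN, GRU and LSTM
hiddenSize: the size of the hidden layer, which determines how many units it contains
digits: the number of digits in each operand
vocabularySize: the size of the character table, which is 12 in our example, the length of "0123456789+ "

The structure of the network is shown in the figure below, with digits=2 and hiddenSize=128: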

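The repeatVector layer repeats the output of the first RNN layer digits+1 times. This adds a dimension, so the output matches the length of the sequence to be predicted. It is one of the design decisions involved in building this RNN.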
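For a single example, the effect of that step can be sketched with plain arrays. The repeatVector function below is a hypothetical stand-in written for illustration, not the tf.layers.repeatVector layer itself, and the 4-element vector stands in for a vector of length hiddenSize.

```javascript
// Plain-array sketch of what a repeat-vector step does to one example.
// This repeatVector is a hypothetical helper, not the TensorFlow.js layer.
function repeatVector(vector, n) {
  // Copy the vector n times so each output time step gets its own row.
  return Array.from({ length: n }, () => vector.slice());
}

const digits = 2;
const encoderOutput = [0.1, -0.3, 0.7, 0.2]; // stands in for a hiddenSize-long vector
const repeated = repeatVector(encoderOutput, digits + 1);
console.log(repeated.length, repeated[0].length); // 3 4
```

The single summary vector produced by the first RNN layer becomes a sequence of digits+1 identical steps, which the second RNN layer then turns into one output per answer character.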

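It is followed by a second RNN layer.

Next is a fully connected layer with 12 (vocabularySize) units, wrapped in timeDistributed so that it is applied to the RNN's output; the result has shape [digits+1, 12]. The TimeDistributed layer applies the same Dense layer to each of the digits+1 time steps, each of which is a vector of length 128 (hiddenSize). This is also what allows the RNN to perform a many-to-many mapping.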
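The idea of reusing one dense layer across every time step can be sketched as follows. The function names, the tiny sizes (3 time steps, 2 hidden values, 4 output units) and the weights are assumptions chosen to keep the numbers readable; this is not how TensorFlow.js implements the layer internally.

```javascript
// Plain-array sketch of a dense layer shared across time steps.
function dense(vector, kernel, bias) {
  // kernel has shape [inputSize][units]; the result has length units.
  return bias.map((b, j) => vector.reduce((sum, v, i) => sum + v * kernel[i][j], b));
}

function timeDistributedDense(sequence, kernel, bias) {
  // The same kernel and bias are reused at every time step.
  return sequence.map(step => dense(step, kernel, bias));
}

// Toy sizes: 3 time steps, 2 hidden values per step, 4 output units.
const sequence = [[1, 0], [0, 1], [1, 1]];
const kernel = [[1, 2, 3, 4], [10, 20, 30, 40]];
const bias = [0, 0, 0, 1];
const projected = timeDistributedDense(sequence, kernel, bias);
console.log(projected);
// [[1, 2, 3, 5], [10, 20, 30, 41], [11, 22, 33, 45]]
```

In the article's model the corresponding sizes would be digits+1 time steps, 128 hidden values and 12 output units.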

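Last comes an activation layer using softmax. The network is essentially a classifier: each of the digits+1 output positions is classified into one of 12 characters, and together they represent a number with up to digits+1 digits. In other words, the sum of two n-digit numbers has at most n+1 digits.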
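Softmax turns each row of scores into a probability distribution over the characters. A minimal sketch, using three classes instead of 12 to keep it short:

```javascript
// Softmax over one row of scores, applied independently to each output position.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map(v => Math.exp(v - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / total);
}

const scores = [[2, 1, 0], [0, 0, 0]];
const probs = scores.map(softmax);
console.log(probs);
```

Each row of probs sums to 1, and the largest score in a row receives the largest probability, so taking the arg max of a row yields the predicted character for that position.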

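As the final step, the model is compiled with the Adam optimizer and categorical cross-entropy as the loss function.

With the model built, we can move on to training.

The training code is as follows: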
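For intuition, categorical cross-entropy for a one-hot target is the negative log of the probability assigned to the correct class. The sketch below averages that value over the output positions; the averaging and the small clipping constant are assumptions made for this illustration, not a statement about TensorFlow.js internals.

```javascript
// Cross-entropy between a one-hot target row and a predicted probability row.
function categoricalCrossentropy(target, predicted) {
  const eps = 1e-7; // clip to avoid log(0)
  return -target.reduce(
    (sum, t, i) => sum + t * Math.log(Math.max(predicted[i], eps)), 0);
}

// Average the per-position loss over a sequence of output positions.
function sequenceLoss(targets, predictions) {
  const perStep = targets.map((t, i) => categoricalCrossentropy(t, predictions[i]));
  return perStep.reduce((a, b) => a + b, 0) / perStep.length;
}

const targets = [[0, 1, 0], [1, 0, 0]];
const confident = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]];
const unsure = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]];
console.log(sequenceLoss(targets, confident) < sequenceLoss(targets, unsure)); // true
```

The more probability the model puts on the correct character at each position, the lower the loss, which is the quantity training tries to reduce.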

class AdditionRNN {
  constructor(digits, trainingSize, rnnType, layers, hiddenSize) {
    // Prepare training data.
    const chars = '0123456789+ ';
    this.charTable = new CharacterTable(chars);
    console.log('Generating training data');
    const data = generateData(digits, trainingSize, false);
    const split = Math.floor(trainingSize * 0.9);
    this.trainData = data.slice(0, split);
    this.testData = data.slice(split);
    [this.trainXs, this.trainYs] =
        convertDataToTensors(this.trainData, this.charTable, digits);
    [this.testXs, this.testYs] =
        convertDataToTensors(this.testData, this.charTable, digits);
    this.model = createAndCompileModel(
        layers, hiddenSize, rnnType, digits, chars.length);
  }
  
  async train(iterations, batchSize, numTestExamples) {
    console.log("training started!");
    const lossValues = [];
    const accuracyValues = [];
    const examplesPerSecValues = [];
    for (let i = 0; i < iterations; ++i) {
      console.log("training iter " + i);
      const beginMs = performance.now();
      const history = await this.model.fit(this.trainXs, this.trainYs, {
        epochs: 1,
        batchSize,
        validationData: [this.testXs, this.testYs],
        yieldEvery: 'epoch'
      });
      const elapsedMs = performance.now() - beginMs;
      const examplesPerSec = this.testXs.shape[0] / (elapsedMs / 1000);
      const trainLoss = history.history['loss'][0];
      const trainAccuracy = history.history['acc'][0];
      const valLoss = history.history['val_loss'][0];
      const valAccuracy = history.history['val_acc'][0];
      
      document.getElementById('trainStatus').textContent =
          `Iteration ${i}: train loss = ${trainLoss.toFixed(6)}; ` +
          `train accuracy = ${trainAccuracy.toFixed(6)}; ` +
          `validation loss = ${valLoss.toFixed(6)}; ` +
          `validation accuracy = ${valAccuracy.toFixed(6)} ` +
          `(${examplesPerSec.toFixed(1)} examples/s)`;

      lossValues.push({'epoch': i, 'loss': trainLoss, 'set': 'train'});
      lossValues.push({'epoch': i, 'loss': valLoss, 'set': 'validation'});
      accuracyValues.push(
          {'epoch': i, 'accuracy': trainAccuracy, 'set': 'train'});
      accuracyValues.push(
          {'epoch': i, 'accuracy': valAccuracy, 'set': 'validation'});
      examplesPerSecValues.push({'epoch': i, 'examples/s': examplesPerSec});
    }
  }
}

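The AdditionRNN class implements the main training logic.

The constructor generates the data, using 90 percent of it for training and 10 percent for testing and validation.

During training, model.fit is called in a loop.

Once training has finished, we can use the model to make predictions.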

// `demo` is assumed to be a trained AdditionRNN instance.
const input = demo.charTable.encode("10+20",5).expandDims(0);
const result = demo.model.predict(input);
result.print()
console.log("10+20 = " + demo.charTable.decode(result.as2D(result.shape[1], result.shape[2])));

Tensor
    [[[0.0010424, 0.0037433, 0.2403527, 0.4702294, 0.2035268, 0.0607058, 0.0166195, 0.0021113, 0.0012174, 0.0000351, 0.0000088, 0.0004075],
      [0.3456545, 0.0999702, 0.1198046, 0.0623895, 0.0079124, 0.0325381, 0.2000451, 0.0856998, 0.0255273, 0.0050597, 0.000007 , 0.0153919],
      [0.0002507, 0.0000023, 0.0000445, 0.0002062, 0.0000298, 0.0000679, 0.0000946, 0.0000056, 7e-7     , 2e-7     , 1e-7     , 0.9992974]]]
10+20 = 40

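The charTable's encode method converts "10+20" into a tensor. Because the model expects a batch of inputs, expandDims adds a dimension, turning the tensor into a batch that contains a single example.

To read the prediction, look for the largest value in each row of the output tensor. In the tensor printed above, the row maxima are at indices 3, 0 and 11, which decode to "30" followed by a space. The model in this run was trained on a small data set, so an occasional wrong prediction is to be expected.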
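The row-by-row reading can be checked by hand with plain arrays. The values below are copied from the tensor printed above; the decoding one-liner is an illustrative stand-in for charTable.decode.

```javascript
// Decode the printed prediction by taking the arg max of each row.
const chars = '0123456789+ ';
const prediction = [
  [0.0010424, 0.0037433, 0.2403527, 0.4702294, 0.2035268, 0.0607058,
   0.0166195, 0.0021113, 0.0012174, 0.0000351, 0.0000088, 0.0004075],
  [0.3456545, 0.0999702, 0.1198046, 0.0623895, 0.0079124, 0.0325381,
   0.2000451, 0.0856998, 0.0255273, 0.0050597, 0.000007, 0.0153919],
  [0.0002507, 0.0000023, 0.0000445, 0.0002062, 0.0000298, 0.0000679,
   0.0000946, 0.0000056, 7e-7, 2e-7, 1e-7, 0.9992974]
];

const decoded = prediction
  .map(row => chars[row.indexOf(Math.max(...row))])
  .join('');
console.log(JSON.stringify(decoded)); // "30 "
```

Note how the first two rows are far from certain (about 0.47 and 0.35 for the winning characters), while the padding position is predicted with probability above 0.99.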

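Finally, let's see how well this RNN actually works, using digits=2, hiddenSize=128, trainIterations=300 and batchSize=128.

In this example, once the training data reaches 2000 examples, both LSTM and GRU achieve fairly good results. 2000 examples is roughly 20% of all two-digit additions. In other words, after seeing about 20% of the data, the model can predict the remaining two-digit additions with reasonable confidence. With only 100 training examples (1%), SimpleRNN still reaches 43% accuracy, which is quite respectable.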
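The percentages can be checked with a little arithmetic. The sketch assumes operands range over 0 to 99, which is what the generator above produces for digits=2. It also computes a second figure, because generateData deduplicates on the sorted pair, so a+b and b+a count as a single example there.

```javascript
// Coverage arithmetic for two-digit addition (operands 0..99 assumed).
const digits = 2;
const operandCount = Math.pow(10, digits);                     // 100
const orderedPairs = operandCount * operandCount;              // 10000
const unorderedPairs = operandCount * (operandCount + 1) / 2;  // 5050
const trainingSize = 2000;

console.log((trainingSize / orderedPairs * 100).toFixed(1) + '%');   // 20.0%
console.log((trainingSize / unorderedPairs * 100).toFixed(1) + '%'); // 39.6%
```

So 2000 examples is 20% of all ordered operand pairs, and close to 40% of the distinct unordered pairs the generator can emit.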

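But why? Why can an RNN be used to predict sums, and what does this have to do with time series? If you have the same questions, these two papers are worth reading: Learning to Execute, and Sequence to Sequence Learning with Neural Networks.

References: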