「 volute 」Building a voice assistant with a soul using Raspberry Pi + Node.js

Date: 2020-10-27


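What is volute?

volute (蝸殼, "snail shell") is a voice assistant built with a Raspberry Pi and Node.js.

What is a Raspberry Pi?

The Raspberry Pi is a Linux-based single-board computer developed by the Raspberry Pi Foundation in the UK. Its goal is to promote basic computer science education in schools through low-cost hardware and free software.

Every generation of the Raspberry Pi uses an ARM processor made by Broadcom. Current models ship with between 2 GB and 8 GB of RAM and use an SD or TF (microSD) card as the storage medium. They provide USB ports, HDMI video output (with audio), and RCA output. Ethernet, WLAN, and Bluetooth are built in depending on the model, and a variety of operating systems are supported. The product line is divided into Model A, Model B, Zero, and Compute Module.

Put simply, it is a computer that fits in your pocket!

What is Node.js?

Node.js is a JavaScript runtime: an event-driven, non-blocking I/O environment built on Google's V8 engine.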

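What is a human-machine dialogue system?

Human-machine conversation is the technology that lets machines understand and use natural language to communicate with people.

A dialogue system can be roughly divided into five basic modules: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), natural language generation (NLG), and text-to-speech synthesis (TTS).

Speech recognition (ASR): converts speech to text, turning the user's spoken audio into text.
Natural language understanding (NLU): parses the meaning of the text, extracts key information, and performs intent recognition and entity recognition.
Dialogue management (DM): maintains dialogue state, queries databases, manages context, and so on.
Natural language generation (NLG): generates the corresponding natural-language text.
Speech synthesis (TTS): converts the generated text into speech.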
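
The five modules above can be read as one chain of asynchronous steps. Here is a minimal sketch of a single conversational turn. Every stage in it is a made-up stub (`runTurn` and `stubStages` are illustrative names, not part of volute), so it runs without a microphone or network access.

```javascript
// Minimal sketch of one conversational turn: ASR -> NLU -> DM -> NLG -> TTS.
// Every stage is a hypothetical stub, not part of volute.
async function runTurn(audio, stages) {
  const text = await stages.asr(audio); // speech -> text
  const intent = await stages.nlu(text); // text -> intent
  const action = await stages.dm(intent); // decide what to do next
  const reply = await stages.nlg(action); // action -> reply text
  return stages.tts(reply); // reply text -> speech
}

// Stub stages so the sketch runs without hardware or network access.
const stubStages = {
  asr: async (audio) => audio.transcript,
  nlu: async (text) => ({
    intent: text.includes("time") ? "ask_time" : "chat",
  }),
  dm: async ({ intent }) => ({
    type: intent === "ask_time" ? "tell_time" : "small_talk",
  }),
  nlg: async (action) =>
    action.type === "tell_time" ? "It is noon." : "Nice to meet you.",
  tts: async (reply) => ({ spoken: reply }),
};

runTurn({ transcript: "what time is it" }, stubStages).then((out) => {
  console.log(out.spoken);
});
```

In volute itself the chain is shorter: a cloud service does ASR, a chatbot API covers the three middle stages in one request, and another cloud service does TTS.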

Materials

Raspberry Pi 4B board
Raspberry Pi 5V 3A USB Type-C power supply
Mini USB microphone
Mini speaker
16 GB TF card
川宇 card reader
Dupont wires, case, heat sinks...

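Installing the Raspberry Pi OS and basic configuration

Unlike a MacBook Pro, a new Raspberry Pi does not work straight out of the box 🐶. To get it running smoothly you have to go step by step.

Flashing the operating system

The Raspberry Pi has no hard disk, only a microSD card slot for storage, so the operating system has to be installed on a microSD card.

The Raspberry Pi supports many operating systems. Here I chose the officially recommended Raspbian, a Debian-based Linux distribution built specifically for the Raspberry Pi that runs on all models.

I used the Raspberry Pi Imager tool to flash the system image.

Basic configuration

To configure the Raspberry Pi, first boot the system (we flashed a system image, so it boots straight in with no installer). Connecting a monitor then shows the desktop. I used a different approach:

Use an IP scanner tool to find the Raspberry Pi's IP address.

Once you have the IP, connect to the system with VNC Viewer.

You can also connect directly over ssh and then configure the system with the raspi-config command.

Configure the network, resolution, language, audio input/output, and other settings.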

How volute is implemented

Task scheduling service

const fs = require("fs");
const path = require("path");
const Speaker = require("speaker");
const { record } = require("node-record-lpcm16");
const XunFeiIAT = require("./services/xunfeiiat.service");
const XunFeiTTS = require("./services/xunfeitts.service");
const initSnowboy = require("./services/snowboy.service");
const TulingBotService = require("./services/tulingbot.service");
// Task scheduling service
const taskScheduling = {
  // Microphone input stream
  mic: null,
  speaker: null,
  detector: null,
  // Audio input stream
  inputStream: null,
  // Audio output stream
  outputStream: null,
  init() {
    // Initialize snowboy
    this.detector = initSnowboy({
      record: this.recordSound.bind(this),
      stopRecord: this.stopRecord.bind(this),
    });
    // Pipe the microphone stream into snowboy
    this.mic.pipe(this.detector);
  },
  start() {
    // Start listening to the microphone input stream
    this.mic = record({
      sampleRate: 16000, // sample rate in Hz
      threshold: 0.5,
      verbose: true,
      recordProgram: "arecord",
    }).stream();
    this.init();
  },
  // Record audio input
  recordSound() {
    // Before each recording, stop any output stream that is still playing
    this.stopSpeak();
    console.log("start record");
    // Create a writable stream
    this.inputStream = fs.createWriteStream(
      path.resolve(__dirname, "./assets/input.wav"),
      {
        encoding: "binary",
      }
    );
    // Pipe the microphone input stream into the writable stream
    this.mic.pipe(this.inputStream);
  },
  // Stop audio input
  stopRecord() {
    if (this.inputStream) {
      console.log("stop record");
      // Unpipe the streams attached to this.mic
      this.mic.unpipe(this.inputStream);
      this.mic.unpipe(this.detector);
      process.nextTick(() => {
        // Destroy the input stream
        this.inputStream.destroy();
        this.inputStream = null;
        // Re-initialize
        this.init();
        // Call the speech dictation service
        this.speech2Text();
      });
    }
  },
  // speech to text
  speech2Text() {
    // Instantiate the speech dictation service
    const iatService = new XunFeiIAT({
      onReply: (msg) => {
        console.log("msg", msg);
        // Callback: hand the recognized text to the chat function
        this.onChat(msg);
      },
    });
    iatService.init();
  },
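  // Chat -> Tuling bot
  onChat(text) {
    // Send the text to the chatbot
    TulingBotService.start(text).then((res) => {
      console.log(res);
      // Chat reply received: call the speech synthesis service
      this.text2Speech(res);
    }).catch((err) => console.error("chat request failed", err));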
  },
  // text to speech
  text2Speech(text) {
    // Instantiate the speech synthesis service
    const ttsService = new XunFeiTTS({
      text,
      onDone: () => {
        console.log("onDone");
        this.onSpeak();
      },
    });
    ttsService.init();
  },
  // Playback: audio output
  onSpeak() {
    // Instantiate the speaker used to play the audio
    this.speaker = new Speaker({
      channels: 1,
      bitDepth: 16,
      sampleRate: 16000,
    });
    // Create a readable stream
    this.outputStream = fs.createReadStream(
      path.resolve(__dirname, "./assets/output.wav")
    );
    // Prime the speaker before the real audio: 32000 bytes is about 1 s
    // at 16 kHz, 16-bit, mono
    this.speaker.write(Buffer.alloc(32000, 10));
    // Pipe the output stream into the speaker for playback
    this.outputStream.pipe(this.speaker);
    this.outputStream.on("end", () => {
      this.outputStream = null;
      this.speaker = null;
    });
  },
  // Stop playback
  stopSpeak() {
    this.outputStream && this.outputStream.unpipe(this.speaker);
  },
};
taskScheduling.start();
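
The priming buffer in onSpeak() is sized in bytes, so it helps to see how bytes map to playback time for raw PCM. A small sketch, assuming the same format as the Speaker instance above (`pcmByteLength` is a helper name I made up for illustration):

```javascript
// Number of bytes needed to hold `seconds` of raw PCM audio.
function pcmByteLength({ sampleRate, channels, bitDepth }, seconds) {
  const bytesPerSample = bitDepth / 8;
  return Math.round(sampleRate * channels * bytesPerSample * seconds);
}

// Same format as the Speaker instance in onSpeak().
const format = { sampleRate: 16000, channels: 1, bitDepth: 16 };

console.log(pcmByteLength(format, 1)); // 32000
console.log(pcmByteLength(format, 2)); // 64000

// Buffer.alloc zero-fills by default, which is digital silence
// for signed 16-bit PCM.
const oneSecondOfSilence = Buffer.alloc(pcmByteLength(format, 1));
console.log(oneSecondOfSilence.length); // 32000
```

So a 32000-byte buffer lasts about one second in this format, and a two-second lead-in would need 64000 bytes.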

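Hotword wake-up: Snowboy

Like the devices on the market, a voice assistant needs to be woken up. Without a wake-up step it would have to listen continuously, which puts a heavy load on storage and the network connection.

Snowboy is a highly customizable hotword detection library that can be used in real-time embedded systems. Once a hotword has been trained, it runs offline with very low power consumption. It currently runs on Raspberry Pi, (Ubuntu) Linux, and Mac OS X.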

const path = require("path");
const snowboy = require("snowboy");
const models = new snowboy.Models();

// Add the trained model
models.add({
  file: path.resolve(__dirname, "../configs/volute.pmdl"),
  sensitivity: "0.5",
  hotwords: "volute",
});

// Initialize the Detector object
const detector = new snowboy.Detector({
  resource: path.resolve(__dirname, "../configs/common.res"),
  models: models,
  audioGain: 1.0,
  applyFrontend: false,
});

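/**
 * Initialize snowboy (initSnowboy)
 * How it works:
 * 1. When the hotword is detected, wake up and start recording
 * 2. While recording, reset silenceCount whenever sound is detected
 * 3. While recording, increment silenceCount whenever no sound is detected; stop recording once it exceeds 3
 */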
function initSnowboy({ record, stopRecord }) {
  const MAX_SILENCE_COUNT = 3;
  let silenceCount = 0,
    speaking = false;
  /**
   * silence event callback, fired when there is no sound
   */
  const onSilence = () => {
    console.log("silence");
    if (speaking && ++silenceCount > MAX_SILENCE_COUNT) {
      speaking = false;
      stopRecord && stopRecord();
      detector.off("silence", onSilence);
      detector.off("sound", onSound);
      detector.off("hotword", onHotword);
    }
  };
  /**
   * sound event callback, fired when there is sound
   */
  const onSound = () => {
    console.log("sound");
    if (speaking) {
      silenceCount = 0;
    }
  };
  /**
   * hotword event callback, fired when the hotword is detected
   */
  const onHotword = (index, hotword, buffer) => {
    if (!speaking) {
      silenceCount = 0;
      speaking = true;
      record && record();
    }
  };
  detector.on("silence", onSilence);
  detector.on("sound", onSound);
  detector.on("hotword", onHotword);
  return detector;
}

module.exports = initSnowboy;
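
The wake/record/stop logic in initSnowboy can be pulled out into a plain state machine, which makes it easy to try without a microphone. This is only a sketch: `createSilenceTracker` and its return values are names I made up, and the event names simply mirror the three snowboy events used above.

```javascript
// Pure state machine mirroring the hotword/sound/silence handling,
// with no dependency on snowboy. All names here are illustrative.
function createSilenceTracker(maxSilenceCount = 3) {
  let speaking = false;
  let silenceCount = 0;
  return {
    // Returns "start", "stop", or null to tell the caller what to do.
    handle(event) {
      if (event === "hotword" && !speaking) {
        speaking = true;
        silenceCount = 0;
        return "start";
      }
      if (event === "sound" && speaking) {
        silenceCount = 0;
      }
      if (event === "silence" && speaking && ++silenceCount > maxSilenceCount) {
        speaking = false;
        return "stop";
      }
      return null;
    },
  };
}

const tracker = createSilenceTracker(3);
const events = [
  "silence", "hotword", "sound", "silence", "silence",
  "sound", "silence", "silence", "silence", "silence",
];
const actions = events.map((e) => tracker.handle(e));
console.log(actions);
```

Recording starts on the hotword, each "sound" resets the counter, and only the fourth consecutive "silence" ends the recording, because the check is "greater than 3", not "equal to 3".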

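Speech dictation: iFLYTEK (科大訊飛) API

Speech-to-text uses the speech dictation (IAT) service from the iFLYTEK (Xunfei) open platform. It accurately transcribes short audio clips (≤ 60 seconds) into text. Besides Mandarin Chinese and English, it supports 25 dialects and 12 languages, and it returns results in real time, so text comes back while you are still speaking.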

require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiIAT {
  constructor({ onReply }) {
    // WebSocket connection
    this.ws = null;
    // Result: the recognized message text
    this.message = "";
    this.onReply = onReply;
    // Audio file to be transcribed
    this.inputFile = resolve(__dirname, "../assets/input.wav");
    // API parameters
    this.params = {
      host: "iat-api.xfyun.cn",
      path: "/v2/iat",
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Build the WebSocket URL
  generateWsUrl() {
    const { host, path } = this.params;
    // API authentication: sign the parameters
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialize
  init() {
    const reqUrl = this.generateWsUrl();
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Register the WebSocket event handlers
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
  /**
   * WebSocket open event: the connection has been established
   */
  onOpen() {
    console.log("open");
    this.onPush(this.inputFile);
  }
  onPush(file) {
    this.pushAudioFile(file);
  }
  // WebSocket message callback
  onMessage(data) {
    const payload = JSON.parse(data);
    if (payload.data && payload.data.result) {
      // Append this partial result to the message
      this.message += payload.data.result.ws.reduce(
        (acc, item) => acc + item.cw.map((cw) => cw.w).join(""),
        ""
      );
      // status 2 means the result is complete
      if (payload.data.status === 2) {
        this.onReply(this.message);
      }
    }
  }
  // WebSocket close event
  onClose() {
    console.log("close");
  }
  // WebSocket error event
  onError(error) {
    console.log(error);
  }
  /**
   * Read the audio file and send it to the server in base64-encoded chunks
   */
  pushAudioFile(audioFile) {
    this.message = "";
    // Build the payload for each frame
    const audioPayload = (statusCode, audioBase64) => ({
      common:
        statusCode === 0
          ? {
              app_id: process.env.XUNFEI_APP_ID,
            }
          : undefined,
      business:
        statusCode === 0
          ? {
              language: "zh_cn",
              domain: "iat",
              ptt: 0,
            }
          : undefined,
      data: {
        status: statusCode,
        format: "audio/L16;rate=16000",
        encoding: "raw",
        audio: audioBase64,
      },
    });
    const chunkSize = 9000;
    // Reusable buffer that holds each chunk of binary data
    const buffer = Buffer.alloc(chunkSize);
    // Open the audio file
    fs.open(audioFile, "r", (err, fd) => {
      if (err) {
        throw err;
      }

      let i = 0;
      // Read and send the file chunk by chunk, recursively
      function readNextChunk() {
        fs.read(fd, buffer, 0, chunkSize, null, (errr, nread) => {
          if (errr) {
            throw errr;
          }
          // nread === 0 means the file has been fully read: send the end-of-stream frame (status=2)
          if (nread === 0) {
            this.ws.send(
              JSON.stringify({
                data: { status: 2 },
              })
            );

            return fs.close(fd, (err) => {
              if (err) {
                throw err;
              }
            });
          }

          let data;
          if (nread < chunkSize) {
            data = buffer.slice(0, nread);
          } else {
            data = buffer;
          }

          const audioBase64 = data.toString("base64");
          const payload = audioPayload(i >= 1 ? 1 : 0, audioBase64);
          this.ws.send(JSON.stringify(payload));
          i++;
          readNextChunk.call(this);
        });
      }

      readNextChunk.call(this);
    });
  }
}

module.exports = XunFeiIAT;
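
The framing rule inside pushAudioFile (status 0 for the first frame, 1 for the ones in between, 2 to finish) is easier to see without the file and socket handling. A rough sketch on an in-memory buffer; `buildFrames` is a name I made up, and the frames here carry only the fields needed to show the rule:

```javascript
// Split an audio buffer into frames the way pushAudioFile does:
// status 0 for the first frame, 1 for the following ones, 2 to finish.
function buildFrames(audio, chunkSize = 9000) {
  const frames = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    const chunk = audio.subarray(offset, offset + chunkSize);
    frames.push({
      status: offset === 0 ? 0 : 1,
      audio: chunk.toString("base64"),
    });
  }
  // The closing frame carries no audio, only the end-of-stream status.
  frames.push({ status: 2 });
  return frames;
}

const fakeAudio = Buffer.alloc(20000, 1); // 20000 bytes of placeholder data
const frames = buildFrames(fakeAudio);
console.log(frames.map((f) => f.status)); // [ 0, 1, 1, 2 ]
```

With 20000 bytes and a 9000-byte chunk size you get two full chunks, one 2000-byte remainder, and the closing frame.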

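Chatbot: Tuling Robot (圖靈機器人) API

Tuling Robot API V2.0 is an online service and development interface for developers and businesses, built on the Tuling Robot platform's core technologies such as semantic understanding and deep learning.

The API currently draws on three modules: chat dialogue, corpus, and skills.

Chat dialogue is the nearly 1 billion public dialogue entries the platform provides for free, meeting users' needs for casual conversation.

Corpus is the private corpus a user uploads to the platform. It is visible only to that user and is the most convenient way to build a corpus for a specialized domain.

Skill services are the 26 practical skills packaged by the platform, covering daily life, travel, shopping, and other areas in one place.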

require("dotenv").config();
const axios = require("axios");

// Too simple to bother explaining 🐶

const TulingBotService = {
  requestUrl: "http://openapi.tuling123.com/openapi/api/v2",
  start(text) {
    // axios.post already returns a promise, so return it directly;
    // request errors then reject instead of leaving the caller hanging
    return axios
      .post(this.requestUrl, {
        reqType: 0,
        perception: {
          inputText: {
            text,
          },
        },
        userInfo: {
          apiKey: process.env.TULING_BOT_API_KEY,
          userId: process.env.TULING_BOT_USER_ID,
        },
      })
      .then((res) => res.data.results[0].values.text);
  },
};

module.exports = TulingBotService;
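
The service above reads `res.data.results[0].values.text` directly, which throws if the response has no results or the first result is not text. Here is a more defensive sketch. The response shape is assumed only from the fields accessed above, and `extractReplyText` and its fallback sentence are hypothetical.

```javascript
// Pull the first text reply out of a Tuling-style response body.
// The shape is assumed from the fields accessed above: results[].values.text
function extractReplyText(body, fallback = "Sorry, I did not catch that.") {
  const results = Array.isArray(body && body.results) ? body.results : [];
  const textResult = results.find(
    (r) => r && r.values && typeof r.values.text === "string"
  );
  return textResult ? textResult.values.text : fallback;
}

console.log(extractReplyText({ results: [{ values: { text: "Hello!" } }] }));
console.log(extractReplyText({ results: [] }));
console.log(extractReplyText(undefined));
```

The first call prints the reply text, while the other two print the fallback sentence, so the assistant always has something to hand to the speech synthesis step.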

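Speech synthesis: iFLYTEK (科大訊飛) API

The streaming speech synthesis interface converts text into audio and offers a wide choice of distinctive voices (voice libraries).

This capability is exposed to developers as a general-purpose WebSocket API. The WebSocket API supports streaming, which suits AI services that need streaming data transfer. Compared with an SDK, the API is lightweight and language-agnostic. Compared with an HTTP API, the WebSocket protocol has the advantage of native cross-origin support.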

require("dotenv").config();
const fs = require("fs");
const WebSocket = require("ws");
const { resolve } = require("path");
const { createAuthParams } = require("../utils/auth");

class XunFeiTTS {
  constructor({ text, onDone }) {
    this.ws = null;
    // Text to convert
    this.text = text;
    this.onDone = onDone;
    // Output audio file
    this.outputFile = resolve(__dirname, "../assets/output.wav");
    // API parameters
    this.params = {
      host: "tts-api.xfyun.cn",
      path: "/v2/tts",
      appid: process.env.XUNFEI_APP_ID,
      apiKey: process.env.XUNFEI_API_KEY,
      secret: process.env.XUNFEI_SECRET,
    };
  }
  // Build the WebSocket URL
  generateWsUrl() {
    const { host, path } = this.params;
    const params = createAuthParams(this.params);
    return `ws://${host}${path}?${params}`;
  }
  // Initialize
  init() {
    const reqUrl = this.generateWsUrl();
    console.log(reqUrl);
    this.ws = new WebSocket(reqUrl);
    this.initWsEvent();
  }
  // Register the WebSocket event handlers
  initWsEvent() {
    this.ws.on("open", this.onOpen.bind(this));
    this.ws.on("error", this.onError);
    this.ws.on("close", this.onClose);
    this.ws.on("message", this.onMessage.bind(this));
  }
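  /**
   * WebSocket open event: the connection has been established
   */
  onOpen() {
    console.log("open");
    // Remove any previous output first, because onSave appends to the file
    if (fs.existsSync(this.outputFile)) {
      fs.unlinkSync(this.outputFile);
    }
    this.onSend();
  }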
  // Send the text and synthesis parameters
  onSend() {
    const frame = {
      // common section
      common: {
        app_id: this.params.appid,
      },
      // business section
      business: {
        aue: "raw",
        auf: "audio/L16;rate=16000",
        vcn: "xiaoyan",
        tte: "UTF8",
      },
      // data section
      data: {
        text: Buffer.from(this.text).toString("base64"),
        status: 2,
      },
    };
    this.ws.send(JSON.stringify(frame));
  }
  // Append the synthesized audio to the output file
  onSave(data) {
    fs.writeFileSync(this.outputFile, data, { flag: "a" });
  }
  // WebSocket message callback
  onMessage(data, err) {
    if (err) return;
    const res = JSON.parse(data);
    if (res.code !== 0) {
      this.ws.close();
      return;
    }
    // Decode the received audio and save it
    const audio = res.data.audio;
    const audioBuf = Buffer.from(audio, "base64");
    this.onSave(audioBuf);
    if (res.code == 0 && res.data.status == 2) {
      this.ws.close();
      this.onDone();
    }
  }
  onClose() {
    console.log("close");
  }
  onError(error) {
    console.log(error);
  }
}

module.exports = XunFeiTTS;
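
Both Xunfei services import createAuthParams from ../utils/auth, which is not shown in this post. The sketch below is my own guess at what such a helper could look like, not the author's code. It assumes the HMAC-SHA256 URL signing scheme as I understand it from Xunfei's WebSocket API documentation (sign the host, date, and request line, then base64-encode the authorization string), so check the official docs before relying on it. The key and secret are placeholders.

```javascript
const { createHmac } = require("crypto");

// Hypothetical stand-in for ../utils/auth, which the article does not show.
// The `now` parameter is added here only to make the output reproducible.
function createAuthParams({ host, path, apiKey, secret }, now = new Date()) {
  const date = now.toUTCString(); // RFC 1123 date in GMT
  // String to sign: host, date, and the HTTP request line
  const signatureOrigin = `host: ${host}\ndate: ${date}\nGET ${path} HTTP/1.1`;
  const signature = createHmac("sha256", secret)
    .update(signatureOrigin)
    .digest("base64");
  const authorizationOrigin =
    `api_key="${apiKey}", algorithm="hmac-sha256", ` +
    `headers="host date request-line", signature="${signature}"`;
  const authorization = Buffer.from(authorizationOrigin).toString("base64");
  // Returned as a query string, matching `?${params}` in generateWsUrl()
  return new URLSearchParams({ authorization, date, host }).toString();
}

const query = createAuthParams({
  host: "iat-api.xfyun.cn",
  path: "/v2/iat",
  apiKey: "demo-key", // placeholder, not a real credential
  secret: "demo-secret", // placeholder, not a real credential
});
console.log(query);
```

Because the date is part of the signed string, the query string has to be generated fresh for each connection, which is why both classes call generateWsUrl() inside init() instead of caching the URL.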