itchat Personal Practice: A Voice and Text Example with the Tuling Robot

Date: 2019-12-08

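Background

itchat is an open-source interface for WeChat personal accounts; calling WeChat from Python has never been this simple.

With fewer than thirty lines of code you can build a WeChat bot that handles every kind of message.

Official documentation: https://itchat.readthedocs.io/zh/latest/

I recently needed to build an auto-reply bot: a GUI that receives the user's messages, plus semantic analysis and machine learning to produce the answers.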

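Preparation

You need to install ffmpeg (search for the official site, download the Windows build, unzip it, and add its bin directory to the system PATH variable).
Then install pydub and SpeechRecognition with pip: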

pip install pydub
pip install SpeechRecognition
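
Before going further it can help to confirm that ffmpeg is actually visible on PATH, since pydub runs it as an external program to decode MP3 files. The following is a minimal sketch using only the standard library; the helper name find_tool is made up for illustration.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    if path is None:
        print("ffmpeg not found; add its bin directory to PATH")
    else:
        print("ffmpeg found at", path)
```

If the check reports that ffmpeg is missing right after you edited PATH on Windows, open a new terminal so the updated variable is picked up.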

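Registering Message Handlers

The GUI part uses itchat as the WeChat interface; for installation and the beginner tutorial, see the official documentation.

A reply handler for voice messages is registered like this: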

@itchat.msg_register(RECORDING)
def tuling_reply(msg):
    ...  # handler body; the complete version is in the full code at the end

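Plain RECORDING works here because the code starts with from itchat.content import *; without that import you would have to write itchat.content.RECORDING.

What the @ decorator does is easy to look up online; here is my own understanding of it: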

    @de
    def func1(): ...
    ----- is equivalent to -----
    def func1(): ...
    func1 = de(func1)

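When the Python interpreter reaches the "@" decorator, the steps are as follows:

1. It calls de, and the argument passed to de is the function named func1;

2. de runs and can do whatever it likes with func1 (store it, wrap it, or call it); whatever de returns is then bound to the name func1. Decorating func1 does not by itself execute func1.

In other words, the argument of the function named after the @ is the entire function defined beneath it.

References: https://blog.csdn.net/972301/article/details/59537712 and https://blog.csdn.net/fwenzhou/article/details/8733857

So when we write @itchat.msg_register(RECORDING), itchat.msg_register(RECORDING) is called first and returns a decorator; our tuling_reply is then passed to that decorator as its argument and registered, which is why incoming messages of that type end up being handled by this function.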
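
The registration pattern described above can be sketched in a few lines. This is only an illustration of a decorator that takes an argument: the names handlers, msg_register, and dispatch are hypothetical stand-ins and are not itchat's real internals.

```python
# Hypothetical registry mapping a message type to its handler function.
handlers = {}


def msg_register(msg_type):
    """Decorator factory: called first with msg_type, returns the real decorator."""
    def decorator(func):
        handlers[msg_type] = func  # remember the handler for this message type
        return func                # the decorated name stays bound to the same function
    return decorator


@msg_register("Recording")
def voice_reply(msg):
    return "got voice: " + msg


def dispatch(msg_type, msg):
    """Look up the handler registered for msg_type and call it, if there is one."""
    handler = handlers.get(msg_type)
    return handler(msg) if handler else None


# Defining voice_reply only registered it; dispatching a message is what calls it.
print(dispatch("Recording", "hello.mp3"))
```

Returning func unchanged from the inner decorator keeps the original function usable under its own name, which is convenient when the decorator exists only to register a callback.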

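Speech Recognition

WeChat saves voice messages in mp3 format, and after looking around I found that only Tencent's speech recognition accepts mp3. I first tried Tencent's one-sentence speech recognition API, but there was no up-to-date official example, and different parts of the official documentation even described different versions, so my authentication kept failing. After reading the documentation more carefully I wrote my own code, and authentication then seemed to pass, but the returned message was something like x'\98', a Chinese character, and decoding it failed. Only then did I realise that this was probably because Tencent's service supports Chinese only; the example in this post recognises Chinese speech, but my actual project needs English speech recognition. Still, I learned a few things along the way, such as how to use the encryption (signing) algorithms and how bytes and strings convert to each other in Python 3.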
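
The bytes and string conversions mentioned above can be illustrated with a short sketch. It uses only the standard library; the helper try_decode is a made-up name, and the single byte 0x98 is used simply as an example of data that is not valid UTF-8 on its own.

```python
import base64

text = "語音識別"
raw = text.encode("utf-8")            # str -> bytes
round_trip = raw.decode("utf-8")      # bytes -> str

b64_bytes = base64.b64encode(raw)     # base64 output is still bytes
b64_text = b64_bytes.decode("ascii")  # safe: base64 only produces ASCII characters


def try_decode(data):
    """Return data decoded as UTF-8, or None if the bytes are not valid UTF-8."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(round_trip == text)
print(b64_text)
print(try_decode(b"\x98"))  # a lone continuation byte cannot be decoded
```

Catching UnicodeDecodeError explicitly makes it easier to tell a malformed response apart from a network or authentication failure.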

import binascii
import hashlib
import hmac
import json
import os
import urllib.parse
import urllib.request
import time
import random
import base64

# secretId, secretKey, requestMethod, uriData, the *Data constants (actionData,
# versionData, ...) and the helper dictToStr() are not shown in this post and
# are assumed to be defined elsewhere in the module.

def asr(msg):
    msg['Text'](msg['FileName'])  # save the mp3 voice message
    timeData = str(int(time.time()))  # timestamp
    # Nonce: per the official docs, a random positive integer that, together
    # with Timestamp, guards against replay attacks
    nonceData = random.randint(1, 9999)
    with open(msg['FileName'], 'rb') as f:
        voiceData = f.read()  # read the mp3 as bytes, i.e. b'\x..'
    os.remove(msg['FileName'])  # delete the mp3
    DataLenData = len(voiceData)  # file length before base64 encoding
    tmp = int(timeData)  # timestamp as an integer
    # The keys must be sorted in ascending ASCII order, which is not always the
    # same as dictionary order; sorted(signDictData.keys()) shows the result
    signDictData = {
        'Action': actionData,
        # b64encode returns bytes, so decode them into a str
        'Data': base64.b64encode(voiceData).decode('utf8'),
        # 'Data': voiceData,
        'DataLen': DataLenData,
        'EngSerViceType': EngSerViceTypeData,
        'Nonce': nonceData,
        'ProjectId': 0,
        'Region': 'ap-shanghai',
        'SecretId': secretId,
        # 'SignatureMethod': 'HmacSHA256',  # optional; defaults to HmacSHA1 when omitted
        'SourceType': SourceTypeData,
        'SubServiceType': SubServiceTypeData,
        'Timestamp': tmp,
        'UsrAudioKey': UsrAudioKeyData,
        'Version': versionData,
        'VoiceFormat': VoiceFormatData
    }
    # request method + request host + request path + ? + request string
    requestStr = "%s%s%s%s%s" % (requestMethod, uriData, "/", "?", dictToStr(signDictData))
    # signData = urllib.parse.quote(sign(secretKey, requestStr, 'HmacSHA1'))
    # The signature must be computed over requestStr before any URL-encoding.
    # encode() turns str into bytes, sha1 is the hash algorithm, digest() returns
    # the raw HMAC bytes, b2a_base64 base64-encodes them (the [:-1] drops its
    # trailing newline) and decode() turns the result back into a str.
    signData = binascii.b2a_base64(hmac.new(secretKey.encode('utf-8'), requestStr.encode('utf-8'), hashlib.sha1).digest())[:-1].decode()
    # The signature is done; now build the request.
    # The request parameters are the signed ones plus Signature. dict() creates a
    # new dict; a plain `actionArgs = signDictData` would not copy anything, both
    # names would refer to the same dict.
    actionArgs = dict(signDictData, Signature=signData)
    # build the request URL from the uri
    requestUrl = "https://%s/?" % (uriData)
    # append the parameters; urlencode changes how characters such as / and =
    # are represented
    requestUrlWithArgs = requestUrl + urllib.parse.urlencode(actionArgs)
    # alternative: requestUrlWithArgs = requestUrl + dictToStr(actionArgs)

    # get the response
    responseData = urllib.request.urlopen(requestUrlWithArgs).read().decode("utf-8")
    # return json.loads(responseData)["Response"]["Error"]["Message"]  # error case
    return json.loads(responseData)["Response"]["Result"]  # success case

Reading the voice file and recognising speech with the Tencent API
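
The listing above calls a helper named dictToStr that the post never shows. As a rough sketch of the signing step it describes (sorted key=value pairs, HMAC-SHA1, then base64), something like the following could work. The functions dict_to_str and sign are my own guesses at the shape of that helper, and the key, host, and parameters are placeholders, not real credentials or a real endpoint.

```python
import base64
import hashlib
import hmac


def dict_to_str(params):
    """Join params as key=value pairs sorted by key, without URL-encoding."""
    return "&".join("%s=%s" % (key, params[key]) for key in sorted(params))


def sign(secret_key, method, host, params):
    """HMAC-SHA1 over method + host + "/?" + query, base64-encoded, as a str."""
    request_str = "%s%s/?%s" % (method, host, dict_to_str(params))
    digest = hmac.new(secret_key.encode("utf-8"),
                      request_str.encode("utf-8"),
                      hashlib.sha1).digest()  # raw bytes, 20 of them for SHA-1
    return base64.b64encode(digest).decode("ascii")


# Placeholder values for illustration only.
demo = {"Nonce": 1234, "Action": "Demo", "Timestamp": 1575763200}
print(dict_to_str(demo))
print(sign("dummy-key", "GET", "example.invalid", demo))
```

Python's sorted() orders strings by code point, so for ASCII keys it gives the ascending ASCII order that the comment in the listing asks for (uppercase letters sort before lowercase ones).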

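After that I kept looking for other speech APIs. Baidu has the most reference material, and there I noticed that people use pydub to transcode the original audio before sending it to Baidu's speech API, which means we can send WAV audio instead. Since my real goal was English speech recognition, though, I still wanted to try APIs from companies outside China.

I tried Microsoft speech recognition (the 7-day free trial). The official documentation says very little about the REST interface, and none of it is in Python. Then I found the SpeechRecognition project on GitHub. I had assumed it only wrapped Google speech recognition; I tried that and, sure enough, it was blocked by the firewall, and it was still unreachable through a proxy. So I read the "Transcribe an audio file" example linked from the project's GitHub page and found more than one backend there, including an example for Microsoft Bing Voice Recognition. The call is very simple: it needs only the audio file and a key, and it transcodes the audio automatically into the format the Bing API expects. You can jump to the definition of r.recognize_bing() yourself; its docstring describes in detail how to use the Bing speech service, and I copy it here for reference: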

"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Microsoft Bing Speech API.

The Microsoft Bing Speech API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account <https://azure.microsoft.com/en-ca/pricing/details/cognitive-services/speech-api/>`__ with Microsoft Azure.

To get the API key, go to the `Microsoft Azure Portal Resources <https://portal.azure.com/>`__ page, go to "All Resources" > "Add" > "See All" > Search "Bing Speech API > "Create", and fill in the form to make a "Bing Speech API" resource. On the resulting page (which is also accessible from the "All Resources" page in the Azure Portal), go to the "Show Access Keys" page, which will have two API keys, either of which can be used for the `key` parameter. Microsoft Bing Speech API keys are 32-character lowercase hexadecimal strings.

The recognition language is determined by ``language``, a BCP-47 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#recognition-language>`__ under "Interactive and dictation mode".

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/bingvoicerecognition#sample-responses>`__ as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
"""

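Bing speech recognition usage notes

So all we need is a valid key and a call to this function. Note that Chinese speech recognition requires passing language="zh-CN".

Also note that Microsoft's one-yuan cloud trial promotion does not include the Bing speech recognition module; you need to use the global standard site instead. A free trial account requires a VISA or MasterCard credit card; alternatively you can sign up with a company account that has Office services, in which case no credit card details are needed.

Code

The full code is as follows: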

# -*- coding: UTF-8 -*-
import requests
import itchat
import json
from itchat.content import *
import os
import speech_recognition as sr
from pydub import AudioSegment

def get_response_tuling(msg):
    # Same as in "3. The simplest interaction with the Tuling robot":
    # build the data to send to the server
    apiUrl = 'http://www.tuling123.com/openapi/api'
    data = {
        'key'    : '8edce3ce905a4c1dbb965e6b35c3834d',
        'info'   : msg,
        'userid' : 'wechat-robot',
    }
    try:
        r = requests.post(apiUrl, data=data).json()
        # dict.get returns None instead of raising when there is no 'text' key
        return r.get('text')
    # Catch exceptions so that a bad server response does not crash the program:
    # if the exchange fails (non-JSON reply or no connection) we end up in the
    # return below
    except Exception:
        # returns None
        return None

def asr(msg):
    # Transcribe a voice message and return the text (None on failure)
    msg['Text'](msg['FileName'])  # save the mp3 voice message
    song = AudioSegment.from_mp3(msg['FileName'])
    song.export("tmp.wav", format="wav")  # transcode mp3 to wav
    r = sr.Recognizer()
    with sr.AudioFile('tmp.wav') as source:
        audio = r.record(source) # read the entire audio file
    os.remove('tmp.wav')
    os.remove(msg['FileName'])
    # recognize speech using Microsoft Bing Voice Recognition
    BING_KEY = "======replace with your own key======="  # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings
    try:
        text = r.recognize_bing(audio, key=BING_KEY, language="zh-CN")
        print("Microsoft Bing Voice Recognition thinks you said " + text)
        return text
    except sr.UnknownValueError:
        print("Microsoft Bing Voice Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e))
    return None

@itchat.msg_register(TEXT)  # TEXT is available because everything in itchat.content was imported
def tuling_reply_text(msg):
    # Handler for incoming text messages
    # Set a default reply so that we can still answer if the Tuling key has a problem
    defaultReply = 'I received a: ' + msg['Text']
    return get_response_tuling(msg['Text']) or defaultReply

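@itchat.msg_register(RECORDING)
def tuling_reply(msg):
    # Handler for incoming voice messages
    # Set a default reply so that we can still answer if the Tuling key has a problem
    defaultReply = 'I received a: ' + msg['Type']

    asrMessage = asr(msg)
    # If recognition failed there is nothing to send to the Tuling robot
    if not asrMessage:
        return defaultReply
    # If the Tuling key has a problem, the reply will be None
    return get_response_tuling(asrMessage) or defaultReply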

# To make experimenting easier (no need to rescan the QR code after every
# change to the program), log in with hot reload: hotReload=True
itchat.auto_login(hotReload=True)
itchat.run()
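
One weak spot in the asr() function above is that the mp3 and tmp.wav are left on disk if anything raises before the os.remove calls. A possible way to tighten this up is sketched below under stated assumptions: transcribe_safely is a made-up helper, and save, convert, and recognize are hypothetical callables standing in for the download, the pydub export, and the recognizer call respectively.

```python
import os
import tempfile


def transcribe_safely(save, convert, recognize, default=None):
    """Run save -> convert -> recognize inside a temp dir that is always removed."""
    with tempfile.TemporaryDirectory() as workdir:
        mp3_path = os.path.join(workdir, "voice.mp3")
        wav_path = os.path.join(workdir, "voice.wav")
        try:
            save(mp3_path)               # stands in for downloading the voice message
            convert(mp3_path, wav_path)  # stands in for the pydub mp3 -> wav export
            return recognize(wav_path) or default
        except Exception as exc:         # keep the bot alive on any failure
            print("speech recognition failed:", exc)
            return default
```

Because both files live inside the temporary directory, they are deleted when the with block exits, whether the function returns normally or an exception is caught.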