Implementing wake-word detection with the pocketsphinx module.
Environment: Windows 10 + Python 3.6.2
Project page: https://pypi.org/project/pocketsphinx/
> pip install pocketsphinx

    C:\Users\qpf10>pip install pocketsphinx
    Collecting pocketsphinx
      Downloading https://files.pythonhosted.org/packages/52/53/30b12c3e4de918e32e73e9d635b4c9e1765512acc94ad0b51bfe960b54c9/pocketsphinx-0.1.15-cp36-cp36m-win_amd64.whl (29.1MB)
        100% |████████████████████████████████| 29.1MB 104kB/s
    Installing collected packages: pocketsphinx
    Successfully installed pocketsphinx-0.1.15
Pocketsphinx is part of the CMU Sphinx open-source speech recognition toolkit.
This package provides a Python interface to the CMU Sphinxbase and Pocketsphinx libraries, built with SWIG and Setuptools.
The project page describes the LiveSpeech class as follows: "It's an iterator class for continuous recognition or keyword search from a microphone."
The code below runs without problems in PyCharm.
    import os
    from pocketsphinx import LiveSpeech, get_model_path

    model_path = get_model_path()

    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'en-us'),
        lm=os.path.join(model_path, 'en-us.lm.bin'),
        dic=os.path.join(model_path, 'cmudict-en-us.dict')
    )

    for phrase in speech:
        print("phrase:", phrase)
        print(phrase.segments(detailed=True))
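After starting it I said two phrases (my pronunciation is not standard): "hello" and "hello world". The output is below. The recognition seems quite poor; even with non-standard pronunciation it should not be this far off.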
    Allocating 32 buffers of 2500 samples each
    phrase: i'm
    [('<s>', -7, 37837, 37890), ('<sil>', -6, 37891, 38010), ("i'm(2)", -913, 38011, 38064), ('[SPEECH]', -6069, 38065, 38070), ('</s>', 0, 38071, 38078)]
    phrase: hello or earth
    [('<s>', -5, 186767, 186778), ('hello', -9386, 186779, 186834), ('or', -3672, 186835, 186854), ('earth', -1192, 186855, 186904), ('</s>', 0, 186905, 186907)]
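Each tuple in the detailed segment list appears to hold a token, a log-scale score, and a start and end frame. As a minimal sketch for reading that output, the helper below keeps only the real words and estimates how long each one lasted. The name `spoken_words`, the filler-token set, and the figure of 100 frames per second (the usual Sphinx default) are assumptions made for illustration and are not part of the pocketsphinx API.

```python
# Tokens that mark sentence boundaries, silence or noise rather than words.
# Assumption: this set is taken from the output printed in this post.
FILLERS = {"<s>", "</s>", "<sil>", "[SPEECH]"}

# Assumption: the decoder runs at its usual default of 100 frames per second.
FRAMES_PER_SECOND = 100


def spoken_words(segments):
    """Return (word, duration_in_seconds) pairs, skipping filler tokens."""
    result = []
    for token, _score, start_frame, end_frame in segments:
        if token in FILLERS:
            continue
        # Drop a pronunciation-variant suffix, e.g. "i'm(2)" -> "i'm".
        word = token.split("(")[0]
        # Frame ranges are inclusive, hence the + 1.
        duration = (end_frame - start_frame + 1) / FRAMES_PER_SECOND
        result.append((word, duration))
    return result


sample = [
    ('<s>', -5, 186767, 186778),
    ('hello', -9386, 186779, 186834),
    ('or', -3672, 186835, 186854),
    ('earth', -1192, 186855, 186904),
    ('</s>', 0, 186905, 186907),
]
print(spoken_words(sample))
```

In a live loop the same function could be called as `spoken_words(phrase.segments(detailed=True))`.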
Adding a Chinese language model and a Chinese acoustic model
- Download location for the Chinese files:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Mandarin/
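- Acoustic model: zh_broadcastnews_16k_ptm256_8000.tar.bz2
- Language model: zh_broadcastnews_64000_utf8.DMP
- Pinyin dictionary: zh_broadcastnews_utf8.dic
- Copy them into the model folder:
Put the files under the pocketsphinx package of your Python installation (PYTHON_HOME); in my case that is C:\Python36\Lib\site-packages\pocketsphinx\model. The code that follows loads them from a zh subfolder there, with the acoustic model archive extracted into a directory.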
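Before starting the recognizer it can help to confirm that the three items are where the loading code expects them. The following is a minimal sketch; `missing_model_files` is a hypothetical helper, and it assumes the acoustic model is the directory produced by extracting the .tar.bz2 archive while the other two are plain files.

```python
import os

# Expected entries inside the zh folder and how each one is checked.
# Assumption: the acoustic model has been extracted into a directory.
EXPECTED = {
    "zh_broadcastnews_16k_ptm256_8000": os.path.isdir,
    "zh_broadcastnews_64000_utf8.DMP": os.path.isfile,
    "zh_broadcastnews_utf8.dic": os.path.isfile,
}


def missing_model_files(zh_dir):
    """Return the names of expected entries that are absent from zh_dir."""
    return [
        name
        for name, exists in EXPECTED.items()
        if not exists(os.path.join(zh_dir, name))
    ]


# The path used in this post; adjust it to your own installation.
zh_dir = r"C:\Python36\Lib\site-packages\pocketsphinx\model\zh"
print("missing:", missing_model_files(zh_dir))
```

An empty list means all three entries were found. On a different installation, `os.path.join(get_model_path(), "zh")` gives the same location without hard-coding it.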
Here is the code. Nothing changes except the model paths being loaded.
    import os
    from pocketsphinx import LiveSpeech, get_model_path

    model_path = get_model_path()

    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'zh/zh_broadcastnews_16k_ptm256_8000'),
        lm=os.path.join(model_path, 'zh/zh_broadcastnews_64000_utf8.DMP'),
        dic=os.path.join(model_path, 'zh/zh_broadcastnews_utf8.dic')
    )

    for phrase in speech:
        print("phrase:", phrase)
        print(phrase.segments(detailed=True))
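The results are still very inaccurate. I do speak with a bit of a Northeastern accent, but the recognition is still poor. I said 你好 (hello), 你好嗎 (how are you) and 滾 (get lost). I tried plenty of other phrases too, none with good results, so I will not paste them.
After starting, it took a long time before recognition began, and I do not know why. At first I thought the program was not picking up any audio.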
    Allocating 32 buffers of 2500 samples each
    phrase: 尼 爾 奧
    [('<s>', 2, 2645708, 2645714), ('尼', -357, 2645715, 2645771), ('爾(2)', -2, 2645772, 2645811), ('奧', -42088, 2645812, 2645853), ('</s>', 0, 2645854, 2645857)]
    phrase: 尼 爾 歐盟
    [('<s>', -2, 2828757, 2828765), ('尼', -11911, 2828766, 2828782), ('爾(2)', -2519, 2828783, 2828837), ('歐盟', 0, 2828838, 2828868), ('</s>', 0, 2828869, 2828872)]
    phrase: 不一樣
    [('<s>', 1, 3023056, 3023061), ('不一樣', -18424, 3023062, 3023128), ('</s>', 0, 3023129, 3023133)]

Empty results like the following also show up:

    phrase:
    [('<s>', -4, 6295811, 6295819), ('++incomplete++', 0, 6295820, 6295973), ('</s>', 0, 6295974, 6296015)]
One premise here: I am only using pocketsphinx for wake-word detection.
Steps
Taking 小貝 as the wake word, keyword.txt contains the following:

    小貝
    小魏
    巧倍
    啊
    呵呵
    哈哈
    麼麼噠
The language model and dictionary files, for example:

    1234.lm
    1234.dic
For example, the dictionary entries pair each word with its pinyin phones:

    小貝 x i ao b ei
    小魏 x i ao w ei
    巧倍 q i ao b ei
    啊 a as
    . . .
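Since the dictionary is plain text with one word and its phones per line, a quick consistency check between the keyword list and the dictionary is easy to script. This is only a sketch: `parse_dic` and `words_without_pronunciation` are hypothetical helper names, and the sample text is a subset of the entries shown above rather than a complete dictionary.

```python
def parse_dic(text):
    """Map each word to its list of phones, reading one entry per line."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        # A usable entry needs a word plus at least one phone.
        if len(parts) >= 2:
            entries[parts[0]] = parts[1:]
    return entries


def words_without_pronunciation(keywords, dic_text):
    """Return the keywords that have no entry in the dictionary text."""
    entries = parse_dic(dic_text)
    return [word for word in keywords if word not in entries]


# Sample data copied from this post; the dictionary here is deliberately partial.
dic_text = """小貝 x i ao b ei
小魏 x i ao w ei
巧倍 q i ao b ei
"""
keywords = ["小貝", "小魏", "巧倍", "麼麼噠"]

print(parse_dic(dic_text)["小貝"])
print(words_without_pronunciation(keywords, dic_text))
```

With real files, the two inputs would come from reading keyword.txt and the .dic file with `encoding="utf-8"`.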
    import os
    from pocketsphinx import LiveSpeech, get_model_path

    model_path = get_model_path()

    speech = LiveSpeech(
        verbose=False,
        sampling_rate=16000,
        buffer_size=2048,
        no_search=False,
        full_utt=False,
        hmm=os.path.join(model_path, 'zh/zh_broadcastnews_16k_ptm256_8000'),
        lm=os.path.join(model_path, 'zh/1234.lm'),   # set this path to wherever you put the file
        dic=os.path.join(model_path, 'zh/1234.dic')  # same as above
    )

    for phrase in speech:
        print("phrase:", phrase)
        print(phrase.segments(detailed=True))
        # Any of the keywords listed here counts as a hit
        if str(phrase) in ["小貝", "小魏", "巧倍"]:
            print("Wake word recognized")
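The exact-match test only fires when the whole hypothesis is a single keyword. The earlier output shows hypotheses printed as space-separated tokens, so a result such as "啊 小魏" would be missed. A minimal sketch of a more tolerant check follows; `contains_wake_word` is a hypothetical helper, and whether token-level matching is desirable depends on how many false wake-ups you can accept.

```python
WAKE_WORDS = {"小貝", "小魏", "巧倍"}


def contains_wake_word(hypothesis, wake_words=WAKE_WORDS):
    """Return True if any whitespace-separated token is a wake word."""
    # str() lets the function accept a phrase object as well as a string.
    return any(token in wake_words for token in str(hypothesis).split())


print(contains_wake_word("小貝"))
print(contains_wake_word("啊 小魏"))
print(contains_wake_word("呵呵 哈哈"))
print(contains_wake_word(""))
```

In the recognition loop this would replace the list membership test with `if contains_wake_word(phrase):`. Empty hypotheses simply return False.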