-------------------------------------------------------------------------------------------------------------------------------------------------------------
Run getdata.sh to download the VoxForge speech corpus.
In cmd.sh, change queue.pl to run.pl.
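For reference, a minimal sketch of what the relevant cmd.sh lines might look like after the change (variable names follow the usual Kaldi egs layout; your copy may differ):

# cmd.sh -- run all jobs locally instead of through a grid engine
export train_cmd=run.pl
export decode_cmd=run.pl
# before the edit these pointed at queue.pl, e.g.: export train_cmd="queue.pl"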
install_srilm.sh
Run this script.
Download srilm.tgz from the indicated URL, then run install_srilm.sh.
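A sketch of the SRILM step, assuming the standard Kaldi tools/ layout; the exact download page is whatever install_srilm.sh points you to and is not reproduced here:

cd kaldi/tools              # path is an assumption; use your own checkout
./install_srilm.sh          # first run tells you where to register and download srilm.tgz
cp ~/Downloads/srilm.tgz .  # hypothetical download location; the tarball must sit in tools/
./install_srilm.sh          # second run unpacks and builds SRILM under tools/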
It prompts you to install:
sudo ./install_sequitur.sh
sudo apt-get install swig
Finally, in run.sh set njobs = 10 (the number of CPU cores).
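A sketch of the njobs edit, assuming run.sh keeps the job count in a variable named njobs near the top of the script (check your copy for the exact spot):

# egs/voxforge/s5/run.sh
njobs=10            # number of parallel jobs, set to the number of CPU cores
# njobs=$(nproc)    # alternative: pick it up from the machine automatically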
It runs successfully.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Default mode, offline decoding of pre-recorded files: online_demo/run.sh
Live online decoding: online_demo/run.sh --test-mode live
Install a sound recorder and check whether the recording device works.
#!/bin/bash
# Copyright 2012 Vassil Panayotov
# Apache 2.0

# Note: you have to do 'make ext' in ../../../src/ before running this.

# Set the paths to the binaries and scripts needed
KALDI_ROOT=`pwd`/../../..
export PATH=$PWD/../s5/utils/:$KALDI_ROOT/src/onlinebin:$KALDI_ROOT/src/bin:$PATH

data_file="online-data"
data_url="http://sourceforge.net/projects/kaldi/files/online-data.tar.bz2"

# Change this to "tri2a" if you like to test using a ML-trained model
ac_model_type=tri2b_mmi

# Alignments and decoding results are saved in this directory (simulated decoding only)
decode_dir="./work"

# Change this to "live" either here or using command line switch like:
# --test-mode live
test_mode="simulated"

. parse_options.sh

ac_model=${data_file}/models/$ac_model_type
trans_matrix=""
audio=${data_file}/audio

if [ ! -s ${data_file}.tar.bz2 ]; then
    # download the speech data used for the simulated test
    echo "Downloading test models and data ..."
    wget -T 10 -t 3 $data_url;
    if [ ! -s ${data_file}.tar.bz2 ]; then
        echo "Download of $data_file has failed!"
        exit 1
    fi
fi

if [ ! -d $ac_model ]; then
    # check whether the model directory is already present
    echo "Extracting the models and data ..."
    tar xf ${data_file}.tar.bz2
fi

if [ -s $ac_model/matrix ]; then
    # set the feature transform (LDA) matrix
    trans_matrix=$ac_model/matrix
fi

case $test_mode in
    live) # real-time live decoding mode
        echo
        echo -e "  LIVE DEMO MODE - you can use a microphone and say something\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  You may want to read some sentences from this book first ..."
        echo
        online-gmm-decode-faster --rt-min=0.5 --rt-max=0.7 --max-active=4000 \
            --beam=12.0 --acoustic-scale=0.0769 $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' $trans_matrix;;

    simulated) # offline recognition of pre-recorded files
        echo
        echo -e "  SIMULATED ONLINE DECODING - pre-recorded audio is used\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  The audio chunks to be decoded were taken from the audio book read"
        echo "  by John Nicholson (http://librivox.org/king-solomons-mines-by-haggard/)"
        echo
        echo "  NOTE: Using utterances from the book, on which the LM was estimated"
        echo "  is considered to be \"cheating\" and we are doing this only for"
        echo "  the purposes of the demo."
        echo
        echo "  You can type \"./run.sh --test-mode live\" to try it using your"
        echo "  own voice!"
        echo
        mkdir -p $decode_dir
        # make an input .scp file
        > $decode_dir/input.scp
        for f in $audio/*.wav; do
            bf=`basename $f`
            bf=${bf%.wav}
            echo $bf $f >> $decode_dir/input.scp
        done
        online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 \
            --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 \
            scp:$decode_dir/input.scp $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' ark,t:$decode_dir/trans.txt \
            ark,t:$decode_dir/ali.txt $trans_matrix;;
        # ali.txt records the frame-to-state alignment; trans.txt records the decoded word IDs

    *)
        echo "Invalid test mode! Should be either \"live\" or \"simulated\"!";
        exit 1;;
esac

# Estimate the error rate for the simulated decoding
if [ $test_mode == "simulated" ]; then
    # Convert the reference transcripts from symbols to word IDs
    sym2int.pl -f 2- $ac_model/words.txt < $audio/trans.txt > $decode_dir/ref.txt
    # using words.txt, map the reference transcript symbols to integer IDs

    # Compact the hypotheses belonging to the same test utterance
    cat $decode_dir/trans.txt |\
        sed -e 's/^\(test[0-9]\+\)\([^ ]\+\)\(.*\)/\1 \3/' |\
        gawk '{key=$1; $1=""; arr[key]=arr[key] " " $0; } END { for (k in arr) { print k " " arr[k]} }' \
        > $decode_dir/hyp.txt
    # reshape trans.txt into the same format as ref.txt so the two can be compared

    # Finally compute WER
    compute-wer --mode=present ark,t:$decode_dir/ref.txt ark,t:$decode_dir/hyp.txt
    # compare ref.txt with hyp.txt and compute the WER
fi
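A usage sketch of the simulated run and the files it leaves behind in ./work (file names taken from the script above; the WER output format is approximate):

cd online_demo
./run.sh                 # simulated mode: fetches online-data and decodes the bundled wav files
cat work/trans.txt       # decoded word IDs, one utterance per line
cat work/ali.txt         # per-frame state alignments
# the final compute-wer call prints a line starting with %WER to stdout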
Usage: online-gmm-decode-faster [options] <model-in> <fst-in> <word-symbol-table> <silence-phones> [<lda-matrix-in>]
# arguments: acoustic model, decoding graph (HCLG FST), word symbol table, silence phones, LDA matrix

Example: online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' lda-matrix

Options:
  --acoustic-scale   : Scaling factor for acoustic likelihoods (float, default = 0.1)
  --batch-size       : Number of feature vectors processed w/o interruption (int, default = 27)
  --beam             : Decoding beam. Larger->slower, more accurate. (float, default = 16)
  --beam-delta       : Increment used in decoder [obscure setting] (float, default = 0.5)
  --beam-update      : Beam update rate (float, default = 0.01)
  --cmn-window       : Number of feat. vectors used in the running average CMN calculation (int, default = 600)
  --delta-order      : Order of delta computation (int, default = 2)
  --delta-window     : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
  --hash-ratio       : Setting used in decoder to control hash behavior (float, default = 2)
  --inter-utt-sil    : Maximum # of silence frames to trigger new utterance (int, default = 50)
  --left-context     : Number of frames of left context (int, default = 4)
  --max-active       : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
  --max-beam-update  : Max beam update rate (float, default = 0.05)
  --max-utt-length   : If the utterance becomes longer than this number of frames, shorter silence is acceptable as an utterance separator (int, default = 1500)
  --min-active       : Decoder min active states (don't prune if #active less than this). (int, default = 20)
  --min-cmn-window   : Minumum CMN window used at start of decoding (adds latency only at start) (int, default = 100)
  --num-tries        : Number of successive repetitions of timeout before we terminate stream (int, default = 5)
  --right-context    : Number of frames of right context (int, default = 4)
  --rt-max           : Approximate maximum decoding run time factor (float, default = 0.75)
  --rt-min           : Approximate minimum decoding run time factor (float, default = 0.7)
  --update-interval  : Beam update interval in frames (int, default = 3)

Standard options:
  --config           : Configuration file to read (this option may be repeated) (string, default = "")
  --help             : Print out usage message (bool, default = false)
  --print-args       : Print the command line arguments (to stderr) (bool, default = true)
  --verbose          : Verbose level (higher->more logging) (int, default = 0)
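As an illustration of the speed/accuracy options above, a variant of the live invocation with a smaller beam and lower real-time factors for a slower machine (paths as in the demo script; the values are illustrative, not tuned):

online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=2000 \
  --beam=10.0 --acoustic-scale=0.0769 \
  online-data/models/tri2b_mmi/model online-data/models/tri2b_mmi/HCLG.fst \
  online-data/models/tri2b_mmi/words.txt '1:2:3:4:5' \
  online-data/models/tri2b_mmi/matrix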
Since this machine has a server motherboard, a USB audio device was plugged in.
However, PortAudio did not detect it.
So a newer version of PortAudio was reinstalled by changing the version in install_portaudio.sh, after which detection succeeded.
Then make ext was run again.
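Roughly the sequence used, as a sketch (paths and the idea of editing the version string by hand are assumptions based on the notes above):

cd kaldi/tools
# first edit install_portaudio.sh and point it at a newer PortAudio release
rm -rf portaudio          # remove the old build so the new version is actually fetched and built
./install_portaudio.sh
cd ../src
make ext                  # rebuild the online (PortAudio-based) binaries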
1. First check that recording works on the Linux system with the arecord command, e.g. arecord -d 10 test.wav; you can also list the current recording devices with arecord -l. Usually a device is present.
2. Check that PortAudio was installed successfully. It can be installed with tools/install_portaudio.sh; if it has been installed before, be sure to cd into tools/portaudio and run make clean first, otherwise the reinstall has no effect. Sometimes the install completes even though some dependencies are missing and the program is unusable; in that case cd into tools/portaudio and run ./configure. Typically ALSA shows "no", which can be fixed with sudo apt-get install libasound-dev.
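Put together, a sketch of that check (standard Kaldi tools/ layout assumed):

cd kaldi/tools/portaudio
make clean                            # required if PortAudio was built here before
./configure | grep -i alsa            # the summary should report ALSA support as "yes"
sudo apt-get install libasound-dev    # if ALSA shows "no", install the headers
cd .. && ./install_portaudio.sh       # then reinstall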