[toc]
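Kaldi's recipe scripts are numerous and nested several layers deep, which makes them hard to read. Using yesno as the example, this post calls the tools built by Kaldi directly and writes out a simple training procedure, so that it is easier to learn how the Kaldi tools are used. Note: please credit the source when reposting.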
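## yesno training

1. Create a working folder with `mkdir easy`; all later steps are run inside `easy`.
2. Copy `./path` into `easy`. Sourcing `./path` lets you call the tools directly without adding the directories they live in, much like setting an environment variable.
3. Run `yesno/s5/./run.sh` to generate the inputs needed for training.
4. Copy `wav.scp` from `s5/data/train` into `easy` as the training input. Because `wav.scp` uses relative paths, `waves_yesno/` also has to be copied into `easy`.
5. Copy `s5/input` into `easy`.

Data preparation is now done, and we can write our own script.

### The complete script first: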
```bash
#!/bin/bash
. ./path

# Feature extraction:
# tools     [ compute-mfcc-feats + copy-feats -> compute-cmvn-stats -> apply-cmvn -> add-deltas ]
# data flow [ wav -> mfcc.ark,scp -> cmvn.ark -> delta.ark ]
mkdir -p mfcc
compute-mfcc-feats --verbose=2 --config="../conf/mfcc.conf" scp,p:wav.scp ark:- | \
  copy-feats --compress=true ark:- ark,scp:mfcc/mfcc.ark,mfcc/mfcc.scp
compute-cmvn-stats scp:mfcc/mfcc.scp ark:mfcc/cmvn.ark
apply-cmvn ark:mfcc/cmvn.ark scp:mfcc/mfcc.scp ark:- | add-deltas ark:- ark:mfcc/delta.ark

# Prepare dict for lang:
# input  [ lexicon_nosil.txt lexicon.txt phones.txt ]
# output [ lexicon.txt lexicon_words.txt nonsilence_phones.txt optional_silence.txt silence_phones.txt ]
mkdir -p lang/dict
cp input/lexicon_nosil.txt lang/dict/lexicon_words.txt
cp input/lexicon.txt lang/dict/lexicon.txt
cat input/phones.txt | grep -v SIL > lang/dict/nonsilence_phones.txt
echo "SIL" > lang/dict/silence_phones.txt
echo "SIL" > lang/dict/optional_silence.txt
echo "Dictionary preparation succeeded"

# Generate [ topo ] for the acoustic model
mkdir -p lang/lang   # the redirect below fails if this directory does not exist yet
utils/gen_topo.pl 3 5 2:3 1 > lang/lang/topo
# from [ lexicon phones words ] -> [ L.fst words.txt ] for [ G.fst train.fst HCLG.fst ]
utils/prepare_lang.sh --position-dependent-phones false lang/dict "<SIL>" lang/local lang/lang

# Train the monophone acoustic model
mkdir -p mono        # the model files below are written into this directory
# 1. from [ topo 39 ] -> [ 0.mdl tree ]
gmm-init-mono --train-feats=ark:mfcc/delta.ark lang/lang/topo 39 mono/0.mdl mono/tree
# 2. from [ L.fst 0.mdl tree words.txt text ] -> training graphs [ graphs.fsts ]
# compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier>
compile-train-graphs mono/tree mono/0.mdl lang/lang/L.fst \
  'ark:sym2int.pl -f 2- lang/lang/words.txt text|' ark:lang/lang/graphs.fsts
# 3. from [ graphs.fsts ] align the training data equally -> [ equal.ali ]
# align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
align-equal-compiled ark:lang/lang/graphs.fsts ark:mfcc/delta.ark ark:mono/equal.ali
# 4. from [ equal.ali delta.ark 0.mdl ] -> [ 0.acc ]
gmm-acc-stats-ali mono/0.mdl ark:mfcc/delta.ark ark:mono/equal.ali mono/0.acc
# 5. from [ 0.mdl 0.acc ] -> [ 1.mdl ] (parameter estimation)
gmm-est mono/0.mdl mono/0.acc mono/1.mdl

x=1
numiter=40
numgauss=11
while [ $x -lt $numiter ]; do
  # 6. from [ x.mdl graphs.fsts ] realign the data with the current model -> [ x.ali ]
  gmm-align-compiled --beam=6 --retry-beam=20 mono/$x.mdl ark:lang/lang/graphs.fsts ark:mfcc/delta.ark ark:mono/$x.ali
  # 4. from [ x.ali delta.ark x.mdl ] -> [ x.acc ]
  #    (uses the new alignment x.ali; accumulating from equal.ali again would ignore step 6)
  gmm-acc-stats-ali mono/$x.mdl ark:mfcc/delta.ark ark:mono/$x.ali mono/$x.acc
  # 5. from [ x.mdl x.acc ] -> [ x+1.mdl ]
  gmm-est --mix-up=$numgauss --power=0.25 mono/$x.mdl mono/$x.acc mono/$((x+1)).mdl
  numgauss=$((numgauss+25))
  x=$((x+1))
done
cp mono/$x.mdl mono/final.mdl

# Graph compilation
# from [ input/task.arpabo words.txt ] -> G.fst
arpa2fst --disambig-symbol=#0 --read-symbol-table=lang/lang/words.txt input/task.arpabo lang/lang/G.fst
fstisstochastic lang/lang/G.fst
# from [ final.mdl G.fst L.fst tree ] -> HCLG.fst
utils/mkgraph.sh lang/lang mono mono/graph
```
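Before running the script, it can help to confirm that the files it reads are in place. The following is a minimal sketch and not part of the original post: `check_inputs` is a hypothetical helper, and its file list is simply read off the commands in the script (`text` is the transcript file handed to `sym2int.pl`, `utils` is the directory that holds `gen_topo.pl`, `prepare_lang.sh` and `mkgraph.sh`).

```shell
#!/bin/bash
# Hypothetical helper: report which of the inputs read by the training script
# are missing under DIR (default: current directory). The list is an assumption
# derived from the paths used in the script; adjust it to your own layout.
check_inputs() {
  local dir="${1:-.}" missing=0 f
  for f in path wav.scp text waves_yesno utils ../conf/mfcc.conf \
           input/lexicon.txt input/lexicon_nosil.txt input/phones.txt input/task.arpabo; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

Calling `check_inputs .` from inside `easy` prints one `missing:` line per absent file and returns a non-zero status if anything is absent.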
#### First, feature extraction:

```bash
#!/bin/bash
. ./path

# Feature extraction: compute-mfcc-feats, copy-feats
# input: wav.scp   output: mfcc.ark, mfcc.scp
mkdir -p mfcc
compute-mfcc-feats --verbose=2 --config="../conf/mfcc.conf" scp,p:wav.scp ark:- | \
  copy-feats --compress=true ark:- ark,scp:mfcc/mfcc.ark,mfcc/mfcc.scp

# Compute the cepstral mean and variance normalization (CMVN) statistics:
# input: mfcc.ark, mfcc.scp   output: mfcc/cmvn.ark, mfcc/cmvn.scp
compute-cmvn-stats scp:mfcc/mfcc.scp ark,scp:mfcc/cmvn.ark,mfcc/cmvn.scp

# Apply CMVN, then append first- and second-order deltas:
# input: mfcc/cmvn.ark, mfcc/cmvn.scp   output: delta.ark
apply-cmvn scp:mfcc/cmvn.scp scp:mfcc/mfcc.scp ark:- | add-deltas ark:- ark:mfcc/delta.ark
```
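The number `39` that is later passed to `gmm-init-mono` has to match the dimension of the features in `delta.ark`. The sketch below shows where that number comes from, assuming 13 cepstral coefficients per frame and a delta order of 2 (the Kaldi defaults; the contents of `mfcc.conf` are not shown in this post, so the 13 is an assumption). `feat_dim` is a hypothetical helper.

```shell
#!/bin/bash
# feat_dim NUM_CEPS DELTA_ORDER: dimension of the features after add-deltas has
# appended DELTA_ORDER blocks of derivatives to the NUM_CEPS static coefficients.
feat_dim() {
  local num_ceps="$1" delta_order="$2"
  echo $(( num_ceps * (delta_order + 1) ))
}

# Assumed values: 13 cepstra, delta order 2 (static + delta + delta-delta).
feat_dim 13 2
```

With Kaldi on the `PATH`, `feat-to-dim ark:mfcc/delta.ark -` reports the dimension of the features that were actually written, which is the safer check.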
#### Next, prepare the dictionary, phone file, word file and other files needed for training.

yesno already provides these, so they can simply be copied.

```bash
# Prepare dict for lang:
# input  [ lexicon_nosil.txt lexicon.txt phones.txt ]
# output [ lexicon.txt lexicon_words.txt nonsilence_phones.txt optional_silence.txt silence_phones.txt ]
mkdir -p lang/dict
cp input/lexicon_nosil.txt lang/dict/lexicon_words.txt
cp input/lexicon.txt lang/dict/lexicon.txt
cat input/phones.txt | grep -v SIL > lang/dict/nonsilence_phones.txt
echo "SIL" > lang/dict/silence_phones.txt
echo "SIL" > lang/dict/optional_silence.txt
echo "Dictionary preparation succeeded"
```
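One detail of the step above: `grep -v SIL` drops every line that contains `SIL` anywhere, which is fine for a phone set this small but would also drop a phone whose name merely includes those letters. A minimal sketch of a stricter variant follows; `split_phones` is a hypothetical helper and the sample phone list is assumed for illustration, not copied from the recipe.

```shell
#!/bin/bash
# split_phones PHONES_FILE OUT_DIR: write the silence / nonsilence phone lists
# like the step above, but match whole lines (-x) so that only the phone named
# exactly "SIL" is excluded from the nonsilence list.
split_phones() {
  local phones="$1" out="$2"
  mkdir -p "$out"
  grep -vx SIL "$phones" > "$out/nonsilence_phones.txt"
  echo "SIL" > "$out/silence_phones.txt"
  echo "SIL" > "$out/optional_silence.txt"
}

# Demo on an assumed phone list.
demo_dir=$(mktemp -d)
printf 'SIL\nYES\nNO\n' > "$demo_dir/phones.txt"
split_phones "$demo_dir/phones.txt" "$demo_dir/dict"
cat "$demo_dir/dict/nonsilence_phones.txt"
```

The demo prints `YES` and `NO`, one per line, which is what the nonsilence list should contain for this task.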
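#### Generate the acoustic topology.

This step produces `L.fst` and `words.txt`, which are then used to build `G.fst`, `train.fst` (the training graphs, written to `graphs.fsts` in the script) and `HCLG.fst`. The only input `utils/prepare_lang.sh` needs is the `dict` directory generated in the previous step.

```bash
# Generate [ topo ] for the acoustic model
mkdir -p lang/lang   # the redirect below fails if this directory does not exist yet
utils/gen_topo.pl 3 5 2:3 1 > lang/lang/topo
# from [ lexicon phones words ] -> [ L.fst words.txt ] for [ G.fst train.fst HCLG.fst ]
utils/prepare_lang.sh --position-dependent-phones false lang/dict "<SIL>" lang/local lang/lang
```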
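The last two arguments of `gen_topo.pl` (`2:3` and `1` above) are colon-separated integer phone ids, first the nonsilence phones and then the silence phones; the `3` and `5` before them are the number of HMM states for each group. The sketch below shows how such id lists can be derived from a phone symbol table. `phone_ids` is a hypothetical helper, and the sample table is an assumption for illustration; the real table is the `phones.txt` that `prepare_lang.sh` writes into `lang/lang`.

```shell
#!/bin/bash
# phone_ids SYMTAB LIST: print the integer ids (from SYMTAB, one "symbol id"
# pair per line) of the phones named in LIST, joined with colons.
phone_ids() {
  local symtab="$1" list="$2"
  awk 'NR == FNR { id[$1] = $2; next }   # first file: load the symbol table
       ($1 in id) { printf "%s%s", sep, id[$1]; sep = ":" }
       END { print "" }' "$symtab" "$list"
}

# Demo with an assumed symbol table and the phone lists from the dict step.
topo_demo=$(mktemp -d)
printf '<eps> 0\nSIL 1\nYES 2\nNO 3\n' > "$topo_demo/phones.txt"
printf 'YES\nNO\n' > "$topo_demo/nonsilence_phones.txt"
printf 'SIL\n' > "$topo_demo/silence_phones.txt"
phone_ids "$topo_demo/phones.txt" "$topo_demo/nonsilence_phones.txt"
phone_ids "$topo_demo/phones.txt" "$topo_demo/silence_phones.txt"
```

For the assumed table the two calls print `2:3` and `1`, matching the arguments used above.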
#### Train the monophone model

```bash
# Train the monophone acoustic model
mkdir -p mono        # the model files below are written into this directory
# 1. from [ topo 39 ] -> [ 0.mdl tree ]
gmm-init-mono --train-feats=ark:mfcc/delta.ark lang/lang/topo 39 mono/0.mdl mono/tree
# 2. from [ L.fst 0.mdl tree words.txt text ] -> training graphs [ graphs.fsts ]
# compile-train-graphs [options] <tree-in> <model-in> <lexicon-fst-in> <transcriptions-rspecifier> <graphs-wspecifier>
compile-train-graphs mono/tree mono/0.mdl lang/lang/L.fst \
  'ark:sym2int.pl -f 2- lang/lang/words.txt text|' ark:lang/lang/graphs.fsts
# 3. from [ graphs.fsts ] align the training data equally -> [ equal.ali ]
# align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
align-equal-compiled ark:lang/lang/graphs.fsts ark:mfcc/delta.ark ark:mono/equal.ali
# 4. from [ equal.ali delta.ark 0.mdl ] -> [ 0.acc ]
gmm-acc-stats-ali mono/0.mdl ark:mfcc/delta.ark ark:mono/equal.ali mono/0.acc
# 5. from [ 0.mdl 0.acc ] -> [ 1.mdl ] (parameter estimation)
gmm-est mono/0.mdl mono/0.acc mono/1.mdl

x=1
numiter=40
numgauss=11
while [ $x -lt $numiter ]; do
  # 6. from [ x.mdl graphs.fsts ] realign the data with the current model -> [ x.ali ]
  gmm-align-compiled --beam=6 --retry-beam=20 mono/$x.mdl ark:lang/lang/graphs.fsts ark:mfcc/delta.ark ark:mono/$x.ali
  # 4. from [ x.ali delta.ark x.mdl ] -> [ x.acc ]
  #    (uses the new alignment x.ali; accumulating from equal.ali again would ignore step 6)
  gmm-acc-stats-ali mono/$x.mdl ark:mfcc/delta.ark ark:mono/$x.ali mono/$x.acc
  # 5. from [ x.mdl x.acc ] -> [ x+1.mdl ]
  gmm-est --mix-up=$numgauss --power=0.25 mono/$x.mdl mono/$x.acc mono/$((x+1)).mdl
  numgauss=$((numgauss+25))
  x=$((x+1))
done
cp mono/$x.mdl mono/final.mdl
```
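To see what the loop asks of `gmm-est`, it helps to print the `--mix-up` target for each pass. The following sketch replays only the loop's counters (no Kaldi tools are called); `mixup_schedule` is a hypothetical helper, and the numbers passed to it are the ones used in the script above.

```shell
#!/bin/bash
# mixup_schedule START STEP BOUND: print "iteration target" for each pass of the
# training loop, where target is the value handed to gmm-est --mix-up and the
# loop runs while the iteration counter is less than BOUND.
mixup_schedule() {
  local numgauss="$1" step="$2" bound="$3" x=1
  while [ "$x" -lt "$bound" ]; do
    echo "$x $numgauss"
    numgauss=$((numgauss + step))
    x=$((x + 1))
  done
}

# Values from the script: start at 11, add 25 per pass, loop bound 40.
mixup_schedule 11 25 40 | tail -n 1
```

The last line printed is `39 961`: the loop runs 39 passes, the final pass requests 961 Gaussians in total, and the resulting `mono/40.mdl` is what gets copied to `final.mdl`. Lowering the step or the bound is an easy way to experiment with a smaller model.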
#### Finally, compile the language model:

```bash
# Graph compilation
# from [ input/task.arpabo words.txt ] -> G.fst
arpa2fst --disambig-symbol=#0 --read-symbol-table=lang/lang/words.txt input/task.arpabo lang/lang/G.fst
fstisstochastic lang/lang/G.fst
# from [ final.mdl G.fst L.fst tree ] -> HCLG.fst
utils/mkgraph.sh lang/lang mono mono/graph
```
## Writing the decoding script

Decoding is simple; a single command is enough:

```bash
# Usage: gmm-latgen-faster [options] model-in (fst-in|fsts-rspecifier) features-rspecifier lattice-wspecifier [ words-wspecifier [alignments-wspecifier] ]
mkdir -p result   # the gzip redirect below needs this directory to exist
gmm-latgen-faster --max-active=7000 --beam=13 --lattice-beam=6 --acoustic-scale=0.083333 \
  --allow-partial=true --word-symbol-table=lang/lang/words.txt mono/final.mdl \
  mono/graph/HCLG.fst ark:mfcc/delta.ark "ark:|gzip -c > result/lat.gz"
```
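You may find that the recognition results are not very good. That does not matter: the point of this example is to understand how Kaldi's tools are used.

Please credit the source when reposting: https://blog.csdn.net/chinatelecom08/article/details/81392399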
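The lattice archive in `result/lat.gz` is not directly readable. One common route in Kaldi is to extract the best path as integer word ids with `lattice-best-path` and then map the ids back to words with `utils/int2sym.pl -f 2- lang/lang/words.txt`. The sketch below reproduces only that last mapping step in plain `awk`, so that it runs without Kaldi; `ids_to_words` is a hypothetical helper, and both the sample word table and the sample utterance line are assumed for illustration.

```shell
#!/bin/bash
# ids_to_words WORDS_TXT: read "utt-id id id ..." lines on stdin and replace
# each id (fields 2 onward) with its word from WORDS_TXT ("word id" per line).
ids_to_words() {
  awk -v table="$1" '
    BEGIN { while ((getline line < table) > 0) { split(line, p, " "); word[p[2]] = p[1] } }
    { out = $1; for (i = 2; i <= NF; i++) out = out " " word[$i]; print out }'
}

# Demo with an assumed word table and an assumed best-path line.
dec_demo=$(mktemp -d)
printf '<eps> 0\n<SIL> 1\nNO 2\nYES 3\n' > "$dec_demo/words.txt"
echo "utt1 2 2 3 3" | ids_to_words "$dec_demo/words.txt"
```

For the assumed input this prints `utt1 NO NO YES YES`. Comparing such lines against the reference transcripts in `text` gives a quick impression of how well the model recognizes each utterance.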