-------------------------------------------------------------------------------------------------------------------------------------------------------------
Run getdata.sh to download the VoxForge speech corpus.
In cmd.sh, change queue.pl to run.pl.
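For reference, a minimal sketch of what the relevant cmd.sh lines might look like after the change (variable names follow the usual Kaldi egs layout; your copy may differ):

# cmd.sh -- run all jobs locally instead of through a grid engine
export train_cmd=run.pl
export decode_cmd=run.pl
# before the edit these pointed at queue.pl, e.g.: export train_cmd="queue.pl"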
install_srilm.sh
Run this script.
Download srilm.tgz from the indicated URL, then run install_srilm.sh.
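A sketch of the SRILM step, assuming the standard Kaldi tools/ layout; the exact download page is whatever install_srilm.sh points you to and is not reproduced here:

cd kaldi/tools              # path is an assumption; use your own checkout
./install_srilm.sh          # first run tells you where to register and download srilm.tgz
cp ~/Downloads/srilm.tgz .  # hypothetical download location; the tarball must sit in tools/
./install_srilm.sh          # second run unpacks and builds SRILM under tools/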
It prompts you to install:
sudo ./install_sequitur.sh
sudo apt-get install swig
Finally, in run.sh set njobs = 10 (the number of CPU cores).
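A sketch of the njobs edit, assuming run.sh keeps the job count in a variable named njobs near the top of the script (check your copy for the exact spot):

# egs/voxforge/s5/run.sh
njobs=10            # number of parallel jobs, set to the number of CPU cores
# njobs=$(nproc)    # alternative: pick it up from the machine automatically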
It runs successfully.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Default mode, offline decoding of pre-recorded files: online_demo/run.sh
Live online decoding: online_demo/run.sh --test-mode live
Install a sound recorder and check whether the recording device works.
#!/bin/bash
# Copyright 2012 Vassil Panayotov
# Apache 2.0

# Note: you have to do 'make ext' in ../../../src/ before running this.

# Set the paths to the binaries and scripts needed
KALDI_ROOT=`pwd`/../../..
export PATH=$PWD/../s5/utils/:$KALDI_ROOT/src/onlinebin:$KALDI_ROOT/src/bin:$PATH

data_file="online-data"
data_url="http://sourceforge.net/projects/kaldi/files/online-data.tar.bz2"

# Change this to "tri2a" if you like to test using a ML-trained model
ac_model_type=tri2b_mmi

# Alignments and decoding results are saved in this directory (simulated decoding only)
decode_dir="./work"

# Change this to "live" either here or using command line switch like:
# --test-mode live
test_mode="simulated"

. parse_options.sh

ac_model=${data_file}/models/$ac_model_type
trans_matrix=""
audio=${data_file}/audio

if [ ! -s ${data_file}.tar.bz2 ]; then
    # download the speech data used for the simulated test
    echo "Downloading test models and data ..."
    wget -T 10 -t 3 $data_url;
    if [ ! -s ${data_file}.tar.bz2 ]; then
        echo "Download of $data_file has failed!"
        exit 1
    fi
fi

if [ ! -d $ac_model ]; then
    # check whether the model directory is already present
    echo "Extracting the models and data ..."
    tar xf ${data_file}.tar.bz2
fi

if [ -s $ac_model/matrix ]; then
    # set the feature transform (LDA) matrix
    trans_matrix=$ac_model/matrix
fi

case $test_mode in
    live) # real-time live decoding mode
        echo
        echo -e "  LIVE DEMO MODE - you can use a microphone and say something\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  You may want to read some sentences from this book first ..."
        echo
        online-gmm-decode-faster --rt-min=0.5 --rt-max=0.7 --max-active=4000 \
            --beam=12.0 --acoustic-scale=0.0769 $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' $trans_matrix;;

    simulated) # offline recognition of pre-recorded files
        echo
        echo -e "  SIMULATED ONLINE DECODING - pre-recorded audio is used\n"
        echo "  The (bigram) language model used to build the decoding graph was"
        echo "  estimated on an audio book's text. The text in question is"
        echo "  \"King Solomon's Mines\" (http://www.gutenberg.org/ebooks/2166)."
        echo "  The audio chunks to be decoded were taken from the audio book read"
        echo "  by John Nicholson (http://librivox.org/king-solomons-mines-by-haggard/)"
        echo
        echo "  NOTE: Using utterances from the book, on which the LM was estimated"
        echo "  is considered to be \"cheating\" and we are doing this only for"
        echo "  the purposes of the demo."
        echo
        echo "  You can type \"./run.sh --test-mode live\" to try it using your"
        echo "  own voice!"
        echo
        mkdir -p $decode_dir
        # make an input .scp file
        > $decode_dir/input.scp
        for f in $audio/*.wav; do
            bf=`basename $f`
            bf=${bf%.wav}
            echo $bf $f >> $decode_dir/input.scp
        done
        online-wav-gmm-decode-faster --verbose=1 --rt-min=0.8 --rt-max=0.85 \
            --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 \
            scp:$decode_dir/input.scp $ac_model/model $ac_model/HCLG.fst \
            $ac_model/words.txt '1:2:3:4:5' ark,t:$decode_dir/trans.txt \
            ark,t:$decode_dir/ali.txt $trans_matrix;;
        # ali.txt records the frame-to-state alignment; trans.txt records the decoded word IDs

    *)
        echo "Invalid test mode! Should be either \"live\" or \"simulated\"!";
        exit 1;;
esac

# Estimate the error rate for the simulated decoding
if [ $test_mode == "simulated" ]; then
    # Convert the reference transcripts from symbols to word IDs
    sym2int.pl -f 2- $ac_model/words.txt < $audio/trans.txt > $decode_dir/ref.txt
    # using words.txt, map the reference transcript symbols to integer IDs

    # Compact the hypotheses belonging to the same test utterance
    cat $decode_dir/trans.txt |\
        sed -e 's/^\(test[0-9]\+\)\([^ ]\+\)\(.*\)/\1 \3/' |\
        gawk '{key=$1; $1=""; arr[key]=arr[key] " " $0; } END { for (k in arr) { print k " " arr[k]} }' \
        > $decode_dir/hyp.txt
    # reshape trans.txt into the same format as ref.txt so the two can be compared

    # Finally compute WER
    compute-wer --mode=present ark,t:$decode_dir/ref.txt ark,t:$decode_dir/hyp.txt
    # compare ref.txt with hyp.txt and compute the WER
fi
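A usage sketch of the simulated run and the files it leaves behind in ./work (file names taken from the script above; the WER output format is approximate):

cd online_demo
./run.sh                 # simulated mode: fetches online-data and decodes the bundled wav files
cat work/trans.txt       # decoded word IDs, one utterance per line
cat work/ali.txt         # per-frame state alignments
# the final compute-wer call prints a line starting with %WER to stdout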
Usage: online-gmm-decode-faster [options] <model-in> <fst-in> <word-symbol-table> <silence-phones> [<lda-matrix-in>]
# arguments: acoustic model, decoding graph (HCLG FST), word symbol table, silence phones, LDA matrix

Example: online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=4000 --beam=12.0 --acoustic-scale=0.0769 model HCLG.fst words.txt '1:2:3:4:5' lda-matrix

Options:
  --acoustic-scale   : Scaling factor for acoustic likelihoods (float, default = 0.1)
  --batch-size       : Number of feature vectors processed w/o interruption (int, default = 27)
  --beam             : Decoding beam. Larger->slower, more accurate. (float, default = 16)
  --beam-delta       : Increment used in decoder [obscure setting] (float, default = 0.5)
  --beam-update      : Beam update rate (float, default = 0.01)
  --cmn-window       : Number of feat. vectors used in the running average CMN calculation (int, default = 600)
  --delta-order      : Order of delta computation (int, default = 2)
  --delta-window     : Parameter controlling window for delta computation (actual window size for each delta order is 1 + 2*delta-window-size) (int, default = 2)
  --hash-ratio       : Setting used in decoder to control hash behavior (float, default = 2)
  --inter-utt-sil    : Maximum # of silence frames to trigger new utterance (int, default = 50)
  --left-context     : Number of frames of left context (int, default = 4)
  --max-active       : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
  --max-beam-update  : Max beam update rate (float, default = 0.05)
  --max-utt-length   : If the utterance becomes longer than this number of frames, shorter silence is acceptable as an utterance separator (int, default = 1500)
  --min-active       : Decoder min active states (don't prune if #active less than this). (int, default = 20)
  --min-cmn-window   : Minumum CMN window used at start of decoding (adds latency only at start) (int, default = 100)
  --num-tries        : Number of successive repetitions of timeout before we terminate stream (int, default = 5)
  --right-context    : Number of frames of right context (int, default = 4)
  --rt-max           : Approximate maximum decoding run time factor (float, default = 0.75)
  --rt-min           : Approximate minimum decoding run time factor (float, default = 0.7)
  --update-interval  : Beam update interval in frames (int, default = 3)

Standard options:
  --config           : Configuration file to read (this option may be repeated) (string, default = "")
  --help             : Print out usage message (bool, default = false)
  --print-args       : Print the command line arguments (to stderr) (bool, default = true)
  --verbose          : Verbose level (higher->more logging) (int, default = 0)
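As an illustration of the speed/accuracy options above, a variant of the live invocation with a smaller beam and lower real-time factors for a slower machine (paths as in the demo script; the values are illustrative, not tuned):

online-gmm-decode-faster --rt-min=0.3 --rt-max=0.5 --max-active=2000 \
  --beam=10.0 --acoustic-scale=0.0769 \
  online-data/models/tri2b_mmi/model online-data/models/tri2b_mmi/HCLG.fst \
  online-data/models/tri2b_mmi/words.txt '1:2:3:4:5' \
  online-data/models/tri2b_mmi/matrix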
Since this machine has a server motherboard, a USB audio device was plugged in.
However, PortAudio did not detect it.
So a newer version of PortAudio was reinstalled by changing the version in install_portaudio.sh, after which detection succeeded.
Then make ext was run again.
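Roughly the sequence used, as a sketch (paths and the idea of editing the version string by hand are assumptions based on the notes above):

cd kaldi/tools
# first edit install_portaudio.sh and point it at a newer PortAudio release
rm -rf portaudio          # remove the old build so the new version is actually fetched and built
./install_portaudio.sh
cd ../src
make ext                  # rebuild the online (PortAudio-based) binaries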
1. First check that recording works on the Linux system with the arecord command, e.g. arecord -d 10 test.wav; you can also list the current recording devices with arecord -l. Usually a device is present.
2. Check that PortAudio was installed successfully. It can be installed with tools/install_portaudio.sh; if it has been installed before, be sure to cd into tools/portaudio and run make clean first, otherwise the reinstall has no effect. Sometimes the install completes even though some dependencies are missing and the program is unusable; in that case cd into tools/portaudio and run ./configure. Typically ALSA shows "no", which can be fixed with sudo apt-get install libasound-dev.
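Put together, a sketch of that check (standard Kaldi tools/ layout assumed):

cd kaldi/tools/portaudio
make clean                            # required if PortAudio was built here before
./configure | grep -i alsa            # the summary should report ALSA support as "yes"
sudo apt-get install libasound-dev    # if ALSA shows "no", install the headers
cd .. && ./install_portaudio.sh       # then reinstall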