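During my internship I spent some time helping out on a speech recognition research project: identifying a speaker's age group from speech using various classification algorithms. This post summarizes the overall framework. The basic idea is: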
1. Obtain a corpus
   TIMIT
2. Extract features from the data and process them
   MFCC/i-vector
   LDA/PLDA/PCA algorithms
3. Classify the extracted data with classification algorithms
   SVM/SVR/GMM/GBDT...
The tools used were HTK (C, shell), Kaldi (C++, shell), LIBSVM (Python), and scikit-learn (Python).
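To make the feature-processing and classification steps outlined above more concrete, here is a minimal sketch using scikit-learn, assuming one fixed-length feature vector per utterance. The data is synthetic (random clusters standing in for real MFCC or i-vector features), the three class labels are hypothetical age groups, and the function names are illustrative. It is not the pipeline used in the project, only one possible shape for it: standardize, reduce dimensionality with LDA, then classify with an SVM.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_features(n_per_class=100, n_dims=39, n_classes=3, seed=0):
    """Generate fake per-utterance feature vectors, one Gaussian cluster per class."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in range(n_classes):
        center = rng.normal(scale=3.0, size=n_dims)  # hypothetical class mean
        X_parts.append(center + rng.normal(size=(n_per_class, n_dims)))
        y_parts.append(np.full(n_per_class, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def train_age_group_classifier(X, y, seed=0):
    """Fit scaler -> LDA -> SVM and return the model with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = make_pipeline(
        StandardScaler(),
        # LDA can keep at most n_classes - 1 components (here 2).
        LinearDiscriminantAnalysis(n_components=2),
        SVC(kernel="rbf"),
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


X, y = make_synthetic_features()
model, accuracy = train_age_group_classifier(X, y)
print("held-out accuracy:", accuracy)
```

With real data, X would hold the extracted features and y the age-group labels. The synthetic clusters are deliberately easy to separate, so the accuracy printed here says nothing about real speech.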
TIMIT corpus: http://www.cnblogs.com/welen/p/3782804.html
PS:
Kaldi's tools directory includes sph2pipe.exe, a conversion tool for SPHERE files:
cd kaldi/kaldi-trunk/tools/sph2pipe_v2.5/
Usage:
sph2pipe -f wav sourcefile targetfile
The re_sph2pipe.py script generates a file of sph2pipe conversion commands:
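# -*- coding: utf-8 -*-
# re_sph2pipe.py: walk the TIMIT tree on Windows and write one sph2pipe
# command per .WAV file; the commands themselves are meant to run on Linux.
import os

rootdir = "E:/vc/TIMIT"  # TIMIT on the Windows machine
timitpath = "/home/zhangzd/kaldi/kaldi-trunk/TIMIT"  # TIMIT on the Linux machine
sph2pipepath = "/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe"

with open('E:/vc/data/mfcc/make_sph2pipe_file.txt', 'w') as f:
    for root, dirs, files in os.walk(rootdir):
        for fn in files:
            if fn.endswith('WAV'):
                # Map the Windows path onto the corresponding Linux path.
                sourcefile = timitpath + root[len(rootdir):] + "/" + fn
                # Prefix the file name with the last 5 characters of the
                # directory name (the TIMIT speaker ID, e.g. FAKS0).
                targetfile = root[-5:] + "_" + fn
                s = sph2pipepath + " -f wav " + sourcefile + " " + targetfile + "\n"
                # os.walk yields backslashes on Windows; Linux needs '/'.
                f.write(s.replace('\\', '/'))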
The resulting conversion file, make_sph2pipe_file.txt, looks like this:
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/CONVERT/SA1.WAV NVERT_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA1.WAV FAKS0_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA2.WAV FAKS0_SA2.WAV
...
Finally, run the commands on Linux with a shell script:
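#!/bin/sh
# Read make_sph2pipe_file.txt line by line.
while read -r line
do
    # Print the command, then run it (echo alone would only print it).
    echo "$line"
    eval "$line"
done < make_sph2pipe_file.txt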
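The same step can also be driven from Python, which makes it easier to see which conversions failed. The following is a minimal sketch, not part of the original workflow: the function names `load_commands` and `run_commands` are made up for illustration, and it assumes every line of the file is a complete command with no shell-specific syntax.

```python
import shlex
import subprocess


def load_commands(lines):
    """Split each non-empty line into an argument list (no shell involved)."""
    commands = []
    for line in lines:
        line = line.strip()
        if line:
            commands.append(shlex.split(line))
    return commands


def run_commands(commands, dry_run=False):
    """Run the commands in order and return those that exited non-zero."""
    failed = []
    for argv in commands:
        print(" ".join(argv))
        if dry_run:
            continue  # only show what would be executed
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(argv)
    return failed


# Typical use (assumes make_sph2pipe_file.txt is in the current directory):
#
#     with open("make_sph2pipe_file.txt") as f:
#         failed = run_commands(load_commands(f))
#     print(len(failed), "command(s) failed")
```

Passing `dry_run=True` prints the commands without running them, which is a cheap way to check the generated file first.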
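PS: the call
f.write(s.replace('\\','/'))
is needed because paths on Windows use \ as the separator (written '\\' in a Python string literal), while paths on Linux use /.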
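As a small illustration of that separator fix, the sketch below converts a Windows-style path in two ways: with the same `str.replace` call, and with `pathlib.PureWindowsPath`, which works on any platform because it never touches the file system. The sample path is only an example built from the `rootdir` used earlier, and the helper names are hypothetical.

```python
from pathlib import PureWindowsPath


def to_posix_by_replace(path):
    """Swap every backslash for a forward slash, as the script does."""
    return path.replace("\\", "/")


def to_posix_by_pathlib(path):
    """Parse the string as a Windows path and render it with forward slashes."""
    return PureWindowsPath(path).as_posix()


# Mixed separators, as os.walk produces on Windows for a root given with '/'.
windows_style = "E:/vc/TIMIT\\TEST\\DR1\\FAKS0"
print(to_posix_by_replace(windows_style))
print(to_posix_by_pathlib(windows_style))
```

Both calls give the same result for a path like this one. The `replace` version is simpler when the string is a whole command line rather than a single path, which is the case in the script.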
Looking into /home/zhangzd/kaldi/kaldi-trunk/egs/wsj/s5/steps/make_mfcc.sh, the code that extracts the features is:
$cmd JOB=1:$nj $logdir/make_mfcc_${name}.JOB.log \
  compute-mfcc-feats --verbose=2 --config=$mfcc_config \
  scp,p:$logdir/wav_${name}.JOB.scp ark:- \| \
  copy-feats --compress=$compress ark:- \
  ark,scp:$mfccdir/raw_mfcc_$name.JOB.ark,$mfccdir/raw_mfcc_$name.JOB.scp \
  || exit 1;
In other words, the command that generates the MFCCs is:
compute-mfcc-feats --verbose=2 --config=config.txt scp,p:scp.txt ark:-|copy-feats ark:- ark,scp:mfcc.ark,mfcc.scp
The format of config.txt is:
--use-energy=false # only non-default option. ...
The format of scp.txt is:
FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/FAKS0_SA1.WAV
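A file in this format can be generated rather than written by hand. The sketch below is one way to do it, assuming the utterance ID is simply the file name without its extension, as in the example line above. The function name `make_scp_lines` is hypothetical, and the entries are sorted by ID only to keep the output reproducible.

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths):
    """Return '<utterance-id> <wav-path>' lines, sorted by utterance ID."""
    entries = {}
    for path in wav_paths:
        utt_id = PurePosixPath(path).stem  # e.g. FAKS0_SA1 from FAKS0_SA1.WAV
        if utt_id in entries:
            # Two files with the same ID would be indistinguishable downstream.
            raise ValueError(f"duplicate utterance id: {utt_id}")
        entries[utt_id] = path
    return [f"{utt_id} {entries[utt_id]}" for utt_id in sorted(entries)]


wavs = [
    "/home/zhangzd/kaldi/kaldi-trunk/src/test/FAKS0_SA2.WAV",
    "/home/zhangzd/kaldi/kaldi-trunk/src/test/FAKS0_SA1.WAV",
]
for line in make_scp_lines(wavs):
    print(line)
```

Writing the returned lines to scp.txt, one per line, gives a file that can be passed to the command above.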
The format of mfcc.scp is:
FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/mfcc.ark
mfcc.ark is generated automatically.
HTK is simpler:
HCopy -c config.txt -S scp.txt
The format of config.txt is:
SOURCEFORMAT = WAV       # Gives the format of the speech files
TARGETKIND = MFCC_0_D_A  # Identifier of the coefficients to use
# Unit = 0.1 micro-second :
WINDOWSIZE = 250000.0    # = 25 ms = length of a time frame
TARGETRATE = 100000.0    # = 10 ms = frame periodicity
NUMCEPS = 12             # Number of MFCC coeffs (here from c1 to c12)
USEHAMMING = T           # Use of Hamming function for windowing frames
PREEMCOEF = 0.97         # Pre-emphasis coefficient
NUMCHANS = 26            # Number of filterbank channels
CEPLIFTER = 22           # Length of cepstral liftering
ENORMALIZE = T
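Two numbers in this configuration are easy to get wrong: the time values, which are given in units of 0.1 microsecond, and the size of the resulting feature vector. The short sketch below works both out. The helper names are made up for illustration, and the dimension formula assumes the usual reading of MFCC_0_D_A: NUMCEPS cepstral coefficients plus c0, each with delta and acceleration coefficients appended.

```python
def ms_to_htk_units(milliseconds):
    """Convert milliseconds to HTK time units of 0.1 microsecond (100 ns)."""
    # 1 ms = 1000 microseconds = 10000 units of 0.1 microsecond.
    return milliseconds * 10000.0


def mfcc_0_d_a_dimension(num_ceps):
    """Vector size for MFCC_0_D_A: (c1..cN plus c0) x (static, delta, accel)."""
    static = num_ceps + 1  # the _0 qualifier appends c0
    return static * 3      # _D and _A each add one more copy of that size


print(ms_to_htk_units(25))       # WINDOWSIZE: 250000.0
print(ms_to_htk_units(10))       # TARGETRATE: 100000.0
print(mfcc_0_d_a_dimension(12))  # 39
```

So with NUMCEPS = 12, each 10 ms frame yields a 39-dimensional vector under these assumptions.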
The format of scp.txt is:
E:\vc\data\timit\FADG0_SA1.WAV E:\vc\data\mfcc\FADG0_SA1.mfcc
E:\vc\data\timit\FADG0_SA2.WAV E:\vc\data\mfcc\FADG0_SA2.mfcc
E:\vc\data\timit\FADG0_SI1279.WAV E:\vc\data\mfcc\FADG0_SI1279.mfcc
...
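Each line of this script file pairs a source WAV file with the feature file HCopy should write. A list like this can be produced with a few lines of Python. The sketch below is illustrative only: `make_hcopy_script` is a hypothetical helper, and it assumes each target keeps the source file name with the extension changed to .mfcc, as in the listing above. `PureWindowsPath` is used so the output has Windows separators no matter where the code runs.

```python
from pathlib import PureWindowsPath


def make_hcopy_script(wav_paths, target_dir):
    """Return '<source.WAV> <target.mfcc>' lines for an HCopy -S script file."""
    lines = []
    for wav in wav_paths:
        source = PureWindowsPath(wav)
        # Same base name as the source, placed in target_dir with .mfcc extension.
        target = PureWindowsPath(target_dir) / (source.stem + ".mfcc")
        lines.append(f"{source} {target}")
    return lines


wavs = [
    r"E:\vc\data\timit\FADG0_SA1.WAV",
    r"E:\vc\data\timit\FADG0_SA2.WAV",
]
for line in make_hcopy_script(wavs, r"E:\vc\data\mfcc"):
    print(line)
```

The printed lines match the first two entries of the listing above and can be written to scp.txt as they are.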