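Speaker Recognition (Age-Group Recognition) Based on Various Classification Algorithms

Speech Classification (Age-Group Recognition) Based on Various Classification Algorithms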

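Overview

During my internship I spent some time helping out with speech recognition research. The task was age-group recognition from speech using various classification algorithms. This post summarizes the overall framework. The basic idea is: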

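  • Obtain a corpus
    TIMIT

  • Extract features from the data and process them
    MFCC/i-vector
    LDA/PLDA/PCA algorithms

  • Take the extracted data and classify it with a classification algorithm
    SVM/SVR/GMM/GBDT...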

The tools used were HTK (C, shell), Kaldi (C++, shell), LIBSVM (Python), and scikit-learn (Python).
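The three steps above can be sketched end to end with scikit-learn. This is only an illustration under stated assumptions: random vectors stand in for per-utterance features (for example MFCC frames averaged over time), the three age-group labels are synthetic, and PCA followed by an RBF SVM is just one of the combinations listed above, not necessarily the configuration used in the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: 300 utterances x 39 features, 3 age groups.
# Each group's features are shifted by a different offset so they are separable.
labels = rng.integers(0, 3, size=300)
features = rng.normal(size=(300, 39)) + labels[:, None] * 1.5

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels)

# Standardise, reduce to 10 dimensions, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

With real data, `features` and `labels` would be replaced by the extracted feature vectors and the age groups derived from the speaker information.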

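Obtaining the corpus

TIMIT corpus: http://www.cnblogs.com/welen/p/3782804.html

PS:

  • The TIMIT speech files (the WAV files in the subfolders) are actually SPHERE files; they can be converted with Kaldi's tools
  • TIMIT/DOC/SPKRINFO.TXT holds the speaker information, which serves as the basis for the class labels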
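To use the speaker information as class labels, each speaker's age at recording time has to be derived from SPKRINFO.TXT. The following is a minimal sketch, assuming a layout in which comment lines start with `;` and data lines hold whitespace-separated fields ID, Sex, DR, Use, RecDate, BirthDate, ... with dates written MM/DD/YY. The sample rows, the function names, and the age-group boundaries are all made up for illustration.

```python
from datetime import date

def parse_mmddyy(text):
    """Parse 'MM/DD/YY' as a 20th-century date; return None if unparseable."""
    try:
        month, day, year = (int(part) for part in text.split("/"))
        return date(1900 + year, month, day)
    except ValueError:
        return None

def speaker_ages(lines):
    """Map speaker ID -> age in whole years at recording time."""
    ages = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < 6:
            continue
        rec, birth = parse_mmddyy(fields[4]), parse_mmddyy(fields[5])
        if rec is None or birth is None:
            continue  # unknown dates such as ??/??/??
        had_birthday = (rec.month, rec.day) >= (birth.month, birth.day)
        ages[fields[0]] = rec.year - birth.year - (0 if had_birthday else 1)
    return ages

def age_group(age, bounds=(30, 40)):
    """Hypothetical three-way split: 0 = under 30, 1 = 30-39, 2 = 40 and over."""
    return sum(age >= b for b in bounds)

# Invented rows in the assumed layout, not real TIMIT speakers.
sample = [
    "; ID  Sex DR Use  RecDate   BirthDate  Ht    Race Edu",
    "AAA0  M   1  TRN  03/01/86  06/15/60   5'10\"  WHT  BS",
    "BBB0  F   2  TST  02/10/86  01/05/40   5'04\"  WHT  MS",
    "CCC0  M   3  TRN  02/10/86  ??/??/??   6'00\"  WHT  HS",
]
ages = speaker_ages(sample)
groups = {spk: age_group(age) for spk, age in ages.items()}
```

Speakers with an unknown birth date are simply skipped here; whether to drop or impute them is a separate decision.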

Extracting and processing features

Converting SPHERE files to WAV files

Kaldi's tools directory contains the SPHERE conversion tool sph2pipe.exe:

cd kaldi/kaldi-trunk/tools/sph2pipe_v2.5/

Usage:

sph2pipe -f wav sourcefile targetfile

The script re_sph2pipe.py generates the file of sph2pipe conversion commands:

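# -*- coding: utf-8 -*-
# re_sph2pipe.py: scan the local TIMIT copy and write one sph2pipe command per file
import os

rootdir = "E:/vc/TIMIT"  # local (Windows) copy of TIMIT that is scanned
timitpath = "/home/zhangzd/kaldi/kaldi-trunk/TIMIT"  # TIMIT location on the Linux machine
sph2pipepath = "/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe"

with open('E:/vc/data/mfcc/make_sph2pipe_file.txt', 'w') as f:
    for root, dirs, files in os.walk(rootdir):
        for fn in files:
            if fn.endswith('WAV'):
                # Mirror the directory layout found under rootdir
                sourcefile = timitpath + root[len(rootdir):] + "/" + fn
                # Prefix with the last 5 characters of the directory, i.e. the speaker ID (e.g. FAKS0)
                targetfile = root[-5:] + "_" + fn
                s = sph2pipepath + " -f wav " + sourcefile + " " + targetfile + "\n"
                f.write(s.replace('\\', '/'))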

The resulting file make_sph2pipe_file.txt looks like this:

/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/CONVERT/SA1.WAV NVERT_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA1.WAV FAKS0_SA1.WAV
/home/zhangzd/kaldi/kaldi-trunk/tools/sph2pipe_v2.5/sph2pipe -f wav /home/zhangzd/kaldi/kaldi-trunk/TIMIT/TIMIT/TEST/DR1/FAKS0/SA2.WAV FAKS0_SA2.WAV
...

Finally, run the commands on Linux with a shell script:

#!/bin/sh
# Execute each sph2pipe command listed in make_sph2pipe_file.txt
while read -r line
do
  echo "$line"
  eval "$line"
done < make_sph2pipe_file.txt

PS:
f.write(s.replace('\\','/')) is needed because Windows uses \ as the path separator (written \\ in a Python string), whereas Linux uses /.

Generating MFCC features in Kaldi

The feature-extraction code in /home/zhangzd/kaldi/kaldi-trunk/egs/wsj/s5/steps/make_mfcc.sh is:

$cmd JOB=1:$nj $logdir/make_mfcc_${name}.JOB.log \
  compute-mfcc-feats --verbose=2 --config=$mfcc_config \
   scp,p:$logdir/wav_${name}.JOB.scp ark:- \| \
    copy-feats --compress=$compress ark:- \
    ark,scp:$mfccdir/raw_mfcc_$name.JOB.ark,$mfccdir/raw_mfcc_$name.JOB.scp \
    || exit 1;

So the command that generates the MFCCs is:

compute-mfcc-feats --verbose=2 --config=config.txt scp,p:scp.txt ark:-|copy-feats ark:- ark,scp:mfcc.ark,mfcc.scp

config.txt has the following format:

--use-energy=false   # only non-default option.
...

scp.txt has the following format:

FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/FAKS0_SA1.WAV
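The scp.txt list pairs an utterance ID with the path of its WAV file. A minimal sketch for generating it, assuming the converted files all sit in one directory and that the file name without its extension serves as the utterance ID (the helper name `write_wav_scp` is hypothetical):

```python
import os

def write_wav_scp(wav_dir, scp_path):
    """Write one '<utt-id> <absolute wav path>' line per .WAV file, sorted by ID."""
    entries = []
    for name in os.listdir(wav_dir):
        stem, ext = os.path.splitext(name)
        if ext.upper() == ".WAV":
            entries.append((stem, os.path.abspath(os.path.join(wav_dir, name))))
    entries.sort()  # keep the list ordered by utterance ID
    with open(scp_path, "w") as out:
        for utt_id, path in entries:
            out.write(f"{utt_id} {path}\n")
    return [utt_id for utt_id, _ in entries]

# Example call (paths are placeholders):
# write_wav_scp("/path/to/converted/wavs", "scp.txt")
```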

mfcc.scp has the following format:

FAKS0_SA1 /home/zhangzd/kaldi/kaldi-trunk/src/test/mfcc.ark

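mfcc.ark is generated automatically. In the mfcc.scp that copy-feats writes, each ark path is additionally followed by a colon and the byte offset of that utterance's data.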

Generating MFCC features in HTK

HTK is simpler:

HCopy -c config.txt -S scp.txt

config.txt has the following format:

SOURCEFORMAT = WAV               # Gives the format of the speech files
TARGETKIND = MFCC_0_D_A       # Identifier of the coefficients to use

# Unit = 0.1 micro-second :
WINDOWSIZE = 250000.0          # = 25 ms = length of a time frame
TARGETRATE = 100000.0          # = 10 ms = frame periodicity

NUMCEPS = 12               # Number of MFCC coeffs (here from c1 to c12)
USEHAMMING = T           # Use of Hamming function for windowing frames
PREEMCOEF = 0.97                # Pre-emphasis coefficient
NUMCHANS = 26                 # Number of filterbank channels
CEPLIFTER = 22                   # Length of cepstral liftering
ENORMALIZE = T

scp.txt has the following format (source WAV file, then target MFCC file):

E:\vc\data\timit\FADG0_SA1.WAV E:\vc\data\mfcc\FADG0_SA1.mfcc
E:\vc\data\timit\FADG0_SA2.WAV E:\vc\data\mfcc\FADG0_SA2.mfcc
E:\vc\data\timit\FADG0_SI1279.WAV E:\vc\data\mfcc\FADG0_SI1279.mfcc
...
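The .mfcc files that HCopy writes are binary HTK parameter files. As a rough sketch of how they could be loaded in Python for the later classification step, the reader below assumes the default HTK layout: a 12-byte big-endian header (number of frames, sample period in 100 ns units, bytes per frame, parameter-kind code) followed by uncompressed 4-byte floats. The function name `read_htk` is made up for this example, and compressed files are not handled.

```python
import struct

def read_htk(path):
    """Return (frames, sample_period, parm_kind) from an uncompressed HTK file."""
    with open(path, "rb") as f:
        # Header: nSamples and sampPeriod as int32, sampSize and parmKind as int16
        n_samples, samp_period, samp_size, parm_kind = struct.unpack(
            ">iihh", f.read(12))
        dim = samp_size // 4  # each coefficient is a 4-byte float
        frames = [struct.unpack(f">{dim}f", f.read(samp_size))
                  for _ in range(n_samples)]
    return frames, samp_period, parm_kind

# Example call (path is a placeholder):
# frames, period, kind = read_htk("FADG0_SA1.mfcc")
```

With TARGETKIND = MFCC_0_D_A and NUMCEPS = 12, each frame should contain 39 values: 13 static coefficients including c0, plus their deltas and accelerations.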

Other

  • i-vector
  • vad