Machine Learning: Hidden Markov Model (HMM) Theory and a Python Implementation (Full Chapter)

HMM

The hidden Markov model (HMM) is a generative statistical model that can be used for tagging (labeling) problems.

This chapter follows Dr. Hang Li's *Statistical Learning Methods* and adds derivations for several results that the book states without proof.

1. Starting from a Natural Language Processing Example

Suppose we have three sentences:
Sentence 1: 我 (I)/noun 看見 (see)/verb 貓 (cat)/noun
Sentence 2: 貓 (cat)/noun 是 (is)/verb 可愛的 (cute)/adjective
Sentence 3: 我 (I)/noun 是 (is)/verb 可愛的 (cute)/adjective
Normally only the concrete words can be observed, so a word sequence such as "我 看見 貓 ..." is the observation sequence, while the part-of-speech sequence "noun verb adjective ..." is the (hidden) state sequence.

Let $Q$ be the set of all possible states and $V$ the set of all possible observations:

$$Q = \{q_1, q_2, ..., q_N\}, \quad V=\{v_1, v_2, ..., v_M\}$$

where $N$ is the number of possible states and $M$ is the number of possible observations.

For example: $Q=\{\text{noun}, \text{verb}, \text{adjective}\}$, $V=\{我, 看見, 貓, 是, 可愛的\}$, $N=3$, $M=5$.

Let $I$ be a state sequence of length $T$ and $O$ the corresponding observation sequence:

$$I = (i_1, i_2,..., i_T), \quad O=(o_1, o_2,..., o_T)$$

For example: $I=(\text{noun}, \text{verb}, \text{noun})$, $O=(我, 看見, 貓)$.

$A$ is the state transition probability matrix:

$$A=[a_{ij}]_{N \times N} \tag{1}$$

where

$$a_{ij} = p(i_{t+1}=q_j|i_t=q_i), \quad i=1,2,...,N; \; j=1,2,...,N \tag{2}$$

For example:

| transition probability | noun | verb | adjective |
| --- | --- | --- | --- |
| noun | 0 | 1 | 0 |
| verb | 1/3 | 0 | 2/3 |
| adjective | 1/3 | 1/3 | 1/3 |

$B$ is the observation probability matrix, also called the emission matrix:

$$B=[b_j(k)]_{N \times M} \tag{3}$$

where

$$b_j(k) = p(o_t=v_k|i_t=q_j), \quad k=1,2,...,M; \; j=1,2,...,N \tag{4}$$

For example (relative frequencies from the three sentences above):

| observation probability | 我 | 看見 | 貓 | 是 | 可愛的 |
| --- | --- | --- | --- | --- | --- |
| noun | 1/2 | 0 | 1/2 | 0 | 0 |
| verb | 0 | 1/3 | 0 | 2/3 | 0 |
| adjective | 0 | 0 | 0 | 0 | 1 |

$\pi$ is the initial state probability vector:

$$\pi = (\pi_i) \tag{5}$$

where

$$\pi_i = p(i_1 = q_i), \quad i = 1,2,...,N \tag{6}$$

$A$, $B$ and $\pi$ are the parameters of the HMM, collectively denoted $\lambda$:

$$\lambda = (A,B,\pi) \tag{7}$$

For example (every sample sentence starts with a noun):

| noun | verb | adjective |
| --- | --- | --- |
| 1 | 0 | 0 |
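Putting the pieces together, the toy model above can be written down directly. A minimal sketch, assuming numpy arrays with our own choice of index order for states and words (the emission rows follow the corrected relative frequencies from the three sentences):

```python
import numpy as np

# Q: part-of-speech states; V: the five observable words
states = ["noun", "verb", "adjective"]       # N = 3
vocab = ["我", "看見", "貓", "是", "可愛的"]    # M = 5

A = np.array([[0, 1, 0],              # transition probabilities a_ij
              [1/3, 0, 2/3],
              [1/3, 1/3, 1/3]])
B = np.array([[1/2, 0, 1/2, 0, 0],    # emission probabilities b_j(k)
              [0, 1/3, 0, 2/3, 0],
              [0, 0, 0, 0, 1]])
pi = np.array([1.0, 0.0, 0.0])        # every sample sentence starts with a noun

# each row of A and B, and pi itself, must be a probability distribution
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)
```

These three arrays are reused in the sketches for the forward, backward and Viterbi algorithms below.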

The three basic problems of hidden Markov models:
1. Probability computation. Given the model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,...,o_T)$, compute the probability of the observation sequence under the model, i.e. $p(O|\lambda)$.
2. Learning. Given an observation sequence $O=(o_1,o_2,...,o_T)$, estimate the model parameters $\lambda=(A,B,\pi)$ that maximize $p(O|\lambda)$.
3. Prediction, also called decoding. Given the model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,...,o_T)$, find the state sequence $I=(i_1,i_2,...,i_T)$ that maximizes the conditional probability $p(I|O)$.

2. The Probability Computation Problem

Computing the observation probability directly has exponential complexity; dynamic programming in the form of the forward and backward algorithms reduces it to polynomial time.
For notational convenience, write:

$$o_{1:t} = (o_1,o_2,...,o_t); \quad o_{t:T}=(o_t,o_{t+1},...,o_T)$$
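To make the cost of direct computation concrete, the brute-force approach sums over all $N^T$ state sequences. A sketch, assuming `pi`, `A`, `B` are numpy arrays as in the Section 1 example and `obs` is a list of observation indices:

```python
import itertools

import numpy as np

def direct_prob(pi, A, B, obs):
    """p(O|lambda) by explicit summation over all N**T state paths."""
    N, T = len(pi), len(obs)
    total = 0.0
    for path in itertools.product(range(N), repeat=T):
        # p(O, I|lambda) for one concrete state path
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total
```

This is only feasible for tiny $N$ and $T$, which is exactly why the forward and backward algorithms below matter.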

2.1 The Forward Algorithm

We now derive the forward probability $p(i_t,o_{1:t}|\lambda)$:

$$\begin{aligned} p(i_t,o_{1:t}|\lambda) &=\sum_{i_{t-1}} p(i_{t-1},i_t,o_{1:t-1},o_t|\lambda) \\ &=\sum_{i_{t-1}} p(o_t|i_{t-1},i_t,o_{1:t-1},\lambda)p(i_t|i_{t-1},o_{1:t-1},\lambda)p(i_{t-1},o_{1:t-1}|\lambda) \end{aligned}$$

By the conditional independence assumptions of the hidden Markov model:

$$p(o_t|i_{t-1},i_t,o_{1:t-1},\lambda) = p(o_t|i_t,\lambda)$$

$$p(i_t|i_{t-1},o_{1:t-1},\lambda)=p(i_t|i_{t-1},\lambda)$$

so

$$p(i_t,o_{1:t}|\lambda)=\sum_{i_{t-1}} p(o_t|i_t,\lambda) p(i_t|i_{t-1},\lambda)p(i_{t-1},o_{1:t-1}|\lambda)=\Big[\sum_{i_{t-1}} p(i_{t-1},o_{1:t-1}|\lambda) p(i_t|i_{t-1},\lambda)\Big] p(o_t|i_t,\lambda)$$

Define:

$$\alpha_{t+1}(i) = p(o_{1:t+1},i_{t+1}=q_i|\lambda) \tag{8}$$

and note that

$$p(i_{t+1}=q_i|i_t=q_j,\lambda) = a_{ji}$$

$$p(o_{t+1}|i_{t+1}=q_i,\lambda)=b_i(o_{t+1})$$

Then:

$$\alpha_{t+1}(i)=\Big[\sum_{j=1}^N \alpha_t(j)a_{ji}\Big]b_i(o_{t+1}) \tag{9}$$

so the forward probabilities can be computed iteratively.

The forward algorithm:
1. Initialization:

$$\alpha_1(i) = \pi_i b_i(o_1)$$

2. Recursion, for $t=1,2,...,T-1$:

$$\alpha_{t+1}(i)=\Big[\sum_{j=1}^N \alpha_t(j)a_{ji}\Big]b_i(o_{t+1})$$

3. Termination:

$$p(O|\lambda) = \sum_{i=1}^N \alpha_T(i)$$
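The three steps translate almost line for line into numpy. A minimal sketch, vectorized over states so that the bracketed sum in Eq. (9) becomes a matrix-vector product:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm; alpha[t] holds alpha_{t+1}(.) in the 1-based notation."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                         # 1. initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]     # 2. recursion, Eq. (9)
    return alpha, alpha[-1].sum()                        # 3. termination
```

On the toy model of Section 1, `forward(pi, A, B, [0, 1, 2])` (the indices of 我, 看見, 貓) yields $p(O|\lambda) = 1/36$.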

2.2 The Backward Algorithm

The backward algorithm computes the backward probability $p(o_{t+1:T}|i_t, \lambda)$:

$$\begin{aligned} p(o_{t+1:T}|i_t, \lambda) &= \sum_{i_{t+1}} p(i_{t+1},o_{t+1},o_{t+2:T} | i_t, \lambda) \\ &= \sum_{i_{t+1}} p(o_{t+2:T}|i_{t+1}, i_t, o_{t+1}, \lambda) p(o_{t+1}|i_{t+1}, i_t, \lambda) p(i_{t+1}|i_t,\lambda) \end{aligned}$$

By the conditional independence assumptions of the hidden Markov model:

$$p(o_{t+2:T}|i_{t+1}, i_t, o_{t+1}, \lambda)=p(o_{t+2:T}|i_{t+1}, \lambda)$$

$$p(o_{t+1}|i_{t+1}, i_t, \lambda) = p(o_{t+1}|i_{t+1}, \lambda)$$

Define:

$$\beta_t(i) = p(o_{t+1:T}|i_t=q_i, \lambda) \tag{10}$$

and, as before,

$$p(i_{t+1}=q_j|i_t=q_i,\lambda) = a_{ij}$$

$$p(o_{t+1}|i_{t+1}=q_j, \lambda) = b_j(o_{t+1})$$

Then:

$$\beta_t(i) = \sum_{j=1}^N a_{ij} b_j(o_{t+1}) \beta_{t+1}(j) \tag{11}$$

The backward algorithm:
(1) Initialization:

$$\beta_T(i) = 1$$

(2) Recursion, for $t=T-1,T-2,...,1$:

$$\beta_t(i) = \sum_{j=1}^N a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)$$

(3) Termination:

$$p(O|\lambda) = \sum_{i=1}^N \pi_i b_i(o_1) \beta_1(i)$$
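A matching sketch of the backward pass; the sum over the next state $j$ in Eq. (11) again vectorizes as a matrix-vector product:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward algorithm (0-based t): beta[t, i] = p(obs[t+1:] | state i at t)."""
    N, T = len(pi), len(obs)
    beta = np.ones((T, N))                        # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):                # (2) t = T-1, ..., 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, obs[0]] * beta[0])    # (3) termination
    return beta, prob
```

On the toy model it recovers the same $p(O|\lambda) = 1/36$ as the forward pass, a useful sanity check for any implementation.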

2.3 Some Probabilities and Expected Values

Both quantities below appear as intermediate variables in the EM algorithm later.
1. The probability of being in state $q_i$ at time $t$.
The probability computation problem computes $p(O|\lambda)$, and:

$$p(O|\lambda)=\sum_{i_t}p(O,i_t|\lambda)$$

By the independence assumptions of the hidden Markov model:

$$p(o_{t+1:T}|i_t,o_{1:t}, \lambda) = p(o_{t+1:T}|i_t, \lambda)$$

Therefore:

$$\begin{aligned} p(O|\lambda) &=\sum_{i_t}p(O,i_t|\lambda) \\ &=\sum_{i_t} p(o_{t+1:T}|i_t,o_{1:t}, \lambda) p(i_t,o_{1:t}|\lambda) \\ &=\sum_{i_t} p(o_{t+1:T}|i_t, \lambda) p(i_t,o_{1:t}|\lambda) \end{aligned}$$

Recalling the definitions

$$\alpha_t(i) = p(o_{1:t},i_t=q_i|\lambda) \tag{12}$$

$$\beta_t(i) = p(o_{t+1:T}|i_t=q_i, \lambda) \tag{13}$$

we get:

$$p(O,i_t=q_i|\lambda) = p(o_{t+1:T}|i_t=q_i, \lambda) p(i_t=q_i,o_{1:t}|\lambda) = \alpha_t(i) \beta_t(i)$$

$$p(O|\lambda) = \sum_{i=1}^N \alpha_t(i) \beta_t(i)$$

Define:

$$\gamma_t(i) = p(i_t=q_i|O,\lambda)$$

Then we obtain:

$$\gamma_t(i) = p(i_t=q_i|O,\lambda) = \frac {p(i_t=q_i,O|\lambda)}{p(O|\lambda)} = \frac {\alpha_t(i) \beta_t(i)}{\sum_{j=1}^N \alpha_t(j) \beta_t(j)} \tag{14}$$

2. The probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:

$$\begin{aligned} p(O|\lambda) &=\sum_{i_t} \sum_{i_{t+1}} p(O,i_t, i_{t+1}|\lambda) \\ &=\sum_{i_t} \sum_{i_{t+1}} p(o_{1:t},o_{t+1},o_{t+2:T},i_t, i_{t+1}|\lambda) \\ &=\sum_{i_t} \sum_{i_{t+1}} p(o_{t+2:T}|o_{1:t},o_{t+1},i_t, i_{t+1},\lambda)p(o_{t+1}|o_{1:t},i_t,i_{t+1},\lambda) p(i_{t+1}|i_t,o_{1:t},\lambda) p(i_t,o_{1:t}|\lambda) \end{aligned}$$

By the independence assumptions of the hidden Markov model:

$$p(O|\lambda) = \sum_{i_t} \sum_{i_{t+1}} p(o_{t+2:T}| i_{t+1},\lambda)p(o_{t+1}|i_{t+1},\lambda) p(i_{t+1}|i_t,\lambda) p(i_t,o_{1:t}|\lambda)$$

Define:

$$\xi_t(i,j)=p(i_t=q_i,i_{t+1}=q_j|O,\lambda)$$

Then, using equations (2), (4), (12) and (13), we get:

$$\xi_t(i,j) = \frac {p(i_t=q_i,i_{t+1}=q_j,O|\lambda)}{p(O|\lambda)} =\frac {\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)} {\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)} \tag{15}$$
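Equations (14) and (15) combine the two passes. A self-contained sketch that recomputes $\alpha$ and $\beta$ internally (the function name `posteriors` is our own choice):

```python
import numpy as np

def posteriors(pi, A, B, obs):
    """Return gamma_t(i) per Eq. (14) and xi_t(i,j) per Eq. (15)."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                          # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):                 # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    prob = alpha[-1].sum()                         # p(O|lambda)
    gamma = alpha * beta / prob                    # Eq. (14), shape (T, N)
    xi = np.array([alpha[t][:, None] * A *
                   (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                   for t in range(T - 1)]) / prob  # Eq. (15), shape (T-1, N, N)
    return gamma, xi
```

Two invariants worth checking in tests: each `gamma[t]` and each `xi[t]` sums to one, and marginalizing `xi[t]` over $j$ recovers `gamma[t]`.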

3. The Learning Problem
3.1 Supervised Learning

If we have samples whose state sequences are annotated, the problem is easy: fix the dimensions of each matrix and estimate the entries by counting relative frequencies, taking care that each probability distribution sums to one.
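A sketch of that counting procedure. The `tagged_sentences` format, a list of `(word, tag)` pairs per sentence, is a hypothetical choice of ours; rows with no counts are left at zero here, whereas in practice one would apply smoothing:

```python
import numpy as np

def supervised_estimate(tagged_sentences, states, vocab):
    """Maximum-likelihood (A, B, pi) by counting, per Section 3.1."""
    N, M = len(states), len(vocab)
    s_idx = {s: i for i, s in enumerate(states)}
    v_idx = {v: k for k, v in enumerate(vocab)}
    A = np.zeros((N, N)); B = np.zeros((N, M)); pi = np.zeros(N)
    for sent in tagged_sentences:
        tags = [s_idx[tag] for _, tag in sent]
        pi[tags[0]] += 1                          # initial-state counts
        for word, tag in sent:
            B[s_idx[tag], v_idx[word]] += 1       # emission counts
        for prev, nxt in zip(tags, tags[1:]):
            A[prev, nxt] += 1                     # transition counts
    # normalize each row to a probability distribution; all-zero rows stay zero
    pi /= pi.sum()
    A = np.divide(A, A.sum(1, keepdims=True), out=np.zeros_like(A),
                  where=A.sum(1, keepdims=True) > 0)
    B = np.divide(B, B.sum(1, keepdims=True), out=np.zeros_like(B),
                  where=B.sum(1, keepdims=True) > 0)
    return A, B, pi
```

Running this on the three sentences of Section 1 reproduces the transition and emission tables there, except that the adjective row of $A$, which has no observed outgoing transitions, stays at zero.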

3.2 Unsupervised Learning

Without annotated state sequences, the parameters can be learned with the Baum-Welch algorithm, an instance of the EM algorithm.

Given: a set of $S$ observation sequences of length $T$, $\{O_1,O_2,...,O_S\}$.
Goal: learn the parameters $\lambda=(A,B,\pi)$ of the hidden Markov model.

Writing the observed data as $O$ and the hidden data as $I$, the hidden Markov model can be expressed as:

$$p(O|\lambda) = \sum_I p(O|I,\lambda) p(I|\lambda)$$

E step:

Since $1/p(O|\overline \lambda)$ is a constant with respect to $\lambda$, it can be dropped from the Q function:

$$\begin{aligned} Q(\lambda,\overline \lambda) &= E_I[\log p(O,I|\lambda)|O, \overline \lambda] \\ &= \sum_I \log p(O,I|\lambda) \, p(I|O,\overline \lambda) \\ &= \sum_I \log p(O,I|\lambda) \frac {p(I,O|\overline \lambda)}{p(O| \overline \lambda)} \\ &\propto \sum_I \log p(O,I|\lambda) \, p(I,O|\overline \lambda) \end{aligned}$$

Expanding the recursion of the forward algorithm from Section 2.1 gives:

$$p(O,I|\lambda) = \pi_{i_1} b_{i_1}(o_1) a_{i_1 i_2} b_{i_2}(o_2) \cdots a_{i_{T-1} i_T} b_{i_T}(o_T) = \pi_{i_1} \Big[\prod_{t=1}^{T-1} a_{i_t i_{t+1}}\Big]\Big[\prod_{t=1}^T b_{i_t}(o_t)\Big]$$

Therefore:

$$Q(\lambda, \overline \lambda) = \sum_I \log \pi_{i_1} \, p(O, I| \overline \lambda) + \sum_I \Big(\sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}\Big) p(O, I| \overline \lambda) + \sum_I \Big(\sum_{t=1}^T \log b_{i_t}(o_t)\Big) p(O, I| \overline \lambda) \tag{16}$$

A note on the hidden variables:
The hidden variable of a hidden Markov model is the state sequence underlying the observation sequence, so it can be represented by the quantity in equation (14).
The M step below also uses equation (15), but this does not mean there are two hidden variables. In the E step, $\gamma$ and $\xi$ are both used to represent the hidden variable purely for notational and programming convenience; the information they carry in the E step is redundant.

M step:

1. Solving for $\pi_i$:
From equation (16):

$$L(\pi_{i_1}) = \sum_I \log \pi_{i_1} \, p(O, I| \overline \lambda) = \sum_{i=1}^N \log \pi_{i} \, p(O, i_1=i| \overline \lambda)$$

Since $\pi_i$ satisfies the constraint $\sum_{i=1}^N \pi_{i}=1$, we use a Lagrange multiplier $\gamma$ (unrelated to $\gamma_t(i)$) and write the Lagrangian:

$$\sum_{i=1}^N \log \pi_{i} \, p(O, i_1=i| \overline \lambda) + \gamma\Big(\sum_{i=1}^N \pi_{i} - 1\Big)$$

Taking the partial derivative and setting it to zero:

$$\frac {\partial} {\partial \pi_i} \Big[\sum_{i=1}^N \log \pi_{i} \, p(O, i_1=i| \overline \lambda) + \gamma\Big(\sum_{i=1}^N \pi_{i} - 1\Big)\Big]=0 \tag{17}$$

gives:

$$p(O, i_1=i| \overline \lambda) + \gamma \pi_i=0$$

so:

$$\pi_i = \frac {p(O, i_1=i| \overline \lambda)} {-\gamma}$$

Substituting into $\sum_{i=1}^N \pi_{i}=1$:

$$-\gamma = \sum_{i=1}^N p(O, i_1=i| \overline \lambda) = p(O|\overline \lambda)$$

and hence, using equation (14):

$$\pi_i = \frac {p(O, i_1=i| \overline \lambda)} {p(O|\overline \lambda)} = \gamma_1(i) \tag{18}$$

2. Solving for $a_{ij}$:

$$L(a_{ij})=\sum_I \Big(\sum_{t=1}^{T-1} \log a_{i_t i_{t+1}}\Big) p(O, I| \overline \lambda) = \sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} \log a_{ij} \, p(O, i_t=i, i_{t+1}=j| \overline \lambda)$$

Applying the constraint $\sum_{j=1}^N a_{ij} = 1$ with a Lagrange multiplier $\lambda$ (here denoting the multiplier, not the model parameters) gives the Lagrangian:

$$\sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} \log a_{ij} \, p(O, i_t=i, i_{t+1}=j| \overline \lambda) + \lambda\Big(\sum_{j=1}^N a_{ij} - 1\Big)$$

Taking the partial derivative and setting it to zero:

$$\frac {\partial}{\partial a_{ij}} \Big[\sum_{i=1}^N \sum_{j=1}^N \sum_{t=1}^{T-1} \log a_{ij} \, p(O, i_t=i, i_{t+1}=j| \overline \lambda) + \lambda\Big(\sum_{j=1}^N a_{ij} - 1\Big)\Big] = 0$$

gives:

$$\sum_{t=1}^{T-1} p(O, i_t=i, i_{t+1}=j| \overline \lambda) + \lambda a_{ij} = 0$$

so:

$$a_{ij} = \frac {\sum_{t=1}^{T-1} p(O, i_t=i, i_{t+1}=j| \overline \lambda)}{- \lambda}$$

Substituting into $\sum_{j=1}^N a_{ij} = 1$:

$$- \lambda = \sum_{j=1}^N \sum_{t=1}^{T-1} p(O, i_t=i, i_{t+1}=j| \overline \lambda) = \sum_{t=1}^{T-1} p(O, i_t=i| \overline \lambda)$$

Hence:

$$a_{ij} = \frac {\sum_{t=1}^{T-1} p(O, i_t=i, i_{t+1}=j| \overline \lambda)}{\sum_{t=1}^{T-1} p(O, i_t=i| \overline \lambda) } = \frac {\sum_{t=1}^{T-1} p(O, i_t=i, i_{t+1}=j| \overline \lambda) / p(O|\overline \lambda)} {\sum_{t=1}^{T-1} p(O, i_t=i| \overline \lambda) / p(O|\overline \lambda) }$$

Substituting equations (14) and (15):

$$a_{ij} = \frac {\sum_{t=1}^{T-1} \xi_t(i,j)} {\sum_{t=1}^{T-1} \gamma_t(i) } \tag{19}$$

3. Solving for $b_j(k)$:

$$L(b_j(k)) = \sum_I \Big(\sum_{t=1}^T \log b_{i_t}(o_t)\Big) p(O, I| \overline \lambda) = \sum_{j=1}^N \sum_{t=1}^T \log b_{j}(o_t) \, p(O, i_t=j| \overline \lambda)$$

Applying the constraint $\sum_{k=1}^M b_j(k) = 1$ with a Lagrange multiplier gives the Lagrangian:

$$\sum_{j=1}^N \sum_{t=1}^T \log b_{j}(o_t) \, p(O, i_t=j| \overline \lambda) + \lambda\Big(\sum_{k=1}^M b_j(k) - 1\Big)$$

Taking the partial derivative and setting it to zero:

$$\frac {\partial}{\partial b_j(k)} \Big[\sum_{j=1}^N \sum_{t=1}^T \log b_{j}(o_t) \, p(O, i_t=j| \overline \lambda) + \lambda\Big(\sum_{k=1}^M b_j(k) - 1\Big)\Big] = 0$$

The partial derivative with respect to $b_j(k)$ is nonzero only when $o_t=v_k$, which we indicate with $I(o_t=v_k)$, so:

$$\sum_{t=1}^T p(O, i_t=j| \overline \lambda) I(o_t=v_k) + \lambda b_{j}(o_t)I(o_t=v_k) = 0$$

Since $b_{j}(o_t)I(o_t=v_k)$ can be written as $b_{j}(k)$:

$$b_{j}(k) = \frac {\sum_{t=1}^T p(O, i_t=j| \overline \lambda) I(o_t=v_k)} {- \lambda}$$

Substituting into $\sum_{k=1}^M b_j(k) = 1$:

$$- \lambda = \sum_{k=1}^M \sum_{t=1}^T p(O, i_t=j| \overline \lambda) I(o_t=v_k) = \sum_{t=1}^T p(O, i_t=j| \overline \lambda)$$

which gives:

$$b_{j}(k) = \frac {\sum_{t=1}^T p(O, i_t=j| \overline \lambda) I(o_t=v_k)} {\sum_{t=1}^T p(O, i_t=j| \overline \lambda)}$$

Using equation (14), this becomes:

$$b_{j}(k) = \frac {\sum_{t=1,\, o_t=v_k}^T \gamma_t(j)} {\sum_{t=1}^T \gamma_t(j)} \tag{20}$$

Summary of the EM algorithm:
E step:

$$\gamma_t(i) = p(i_t=q_i|O,\lambda) = \frac {p(i_t=q_i,O|\lambda)}{p(O|\lambda)} = \frac {\alpha_t(i) \beta_t(i)}{\sum_{j=1}^N \alpha_t(j) \beta_t(j)}$$

$$\xi_t(i,j) = \frac {p(i_t=q_i,i_{t+1}=q_j,O|\lambda)}{p(O|\lambda)} =\frac {\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)} {\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}$$

M step:

$$\pi_i = \frac {p(O, i_1=i| \overline \lambda)} {p(O|\overline \lambda)} = \gamma_1(i)$$

$$a_{ij} = \frac {\sum_{t=1}^{T-1} \xi_t(i,j)} {\sum_{t=1}^{T-1} \gamma_t(i) }$$

$$b_{j}(k) = \frac {\sum_{t=1,\, o_t=v_k}^T \gamma_t(j)} {\sum_{t=1}^T \gamma_t(j)}$$
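The whole loop can now be sketched for a single observation sequence. The random initialization, seed and iteration count are our own choices; for real data one would work in log space (or rescale per step) and pool the counts over all $S$ sequences before normalizing:

```python
import numpy as np

def baum_welch(obs, N, M, n_iter=50, seed=0):
    """Baum-Welch sketch for one sequence of observation indices."""
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)
    # random row-stochastic initialization; EM is sensitive to this choice
    A = rng.random((N, N)); A /= A.sum(1, keepdims=True)
    B = rng.random((N, M)); B /= B.sum(1, keepdims=True)
    pi = rng.random(N); pi /= pi.sum()
    for _ in range(n_iter):
        # E step: forward-backward, then gamma (Eq. 14) and xi (Eq. 15)
        alpha = np.zeros((T, N)); beta = np.ones((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        prob = alpha[-1].sum()
        gamma = alpha * beta / prob
        xi = np.array([alpha[t][:, None] * A *
                       (B[:, obs[t + 1]] * beta[t + 1])[None, :]
                       for t in range(T - 1)]) / prob
        # M step: Eqs. (18), (19), (20)
        pi = gamma[0]
        A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
        B = np.array([[gamma[obs == k, j].sum() for k in range(M)]
                      for j in range(N)]) / gamma.sum(0)[:, None]
    return pi, A, B
```

After every iteration $\pi$, each row of $A$ and each row of $B$ remain probability distributions, because $\sum_j \xi_t(i,j) = \gamma_t(i)$.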

4. The Prediction Problem (Decoding)

We solve it with the Viterbi algorithm.
Given: the model $\lambda=(A,B,\pi)$ and an observation sequence $O=(o_1,o_2,...,o_T)$.
Find: the state sequence $I=(i_1,i_2,...,i_T)$ that maximizes the conditional probability $p(I|O,\lambda)$.
Since $p(O|\lambda)$ is a fixed value, we have:

$$\max_I p(I|O,\lambda) = \max_I p(I, O|\lambda) / p(O|\lambda) = \max_I p(I, O|\lambda)$$

Define the maximum probability over all single paths $(i_1,i_2,...,i_t)$ ending in state $i$ at time $t$ as:

$$\delta_t(i) = \max_{i_1,i_2,...,i_{t-1}} p(i_t=i, i_{t-1:1},o_{t:1}|\lambda)$$

Derivation of the recursion:

$$\begin{aligned} &p(i_{t+1}=i,i_{t:1},o_{t+1:1}| \lambda) \\ &=p(i_{t+1}=i,i_t,i_{t-1:1},o_{t+1},o_{t:1}| \lambda) \\ &= p(o_{t+1}|i_{t+1}=i,i_t,o_{t:1},\lambda) p(i_{t+1}=i|i_t,i_{t-1:1},o_{t:1}, \lambda) p(i_t,i_{t-1:1},o_{t:1}|\lambda) \\ &= p(o_{t+1}|i_{t+1}=i,\lambda) p(i_{t+1}=i|i_t,\lambda) p(i_t,i_{t-1:1},o_{t:1}|\lambda) \end{aligned}$$

Hence:

$$\delta_{t+1}(i) = \max_{i_1,i_2,...,i_{t}} p(i_{t+1}=i,i_{t:1},o_{t+1:1}| \lambda) = \max_{1 \le j \le N} [\delta_t(j) a_{ji}] b_i(o_{t+1}) \tag{21}$$

Define $\psi_t(i)$ as the node at time $t-1$ of the single path $(i_1,i_2,...,i_{t-1},i)$ with the highest probability among all paths ending in state $i$ at time $t$, i.e. $\psi_t(i) = \arg\max_{1 \le j \le N} [\delta_{t-1}(j) a_{ji}]$; backtracking through $\psi$ from the most probable final state recovers the optimal state sequence.
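With $\delta$ and the backpointers in hand, the full algorithm is initialization, recursion per Eq. (21), and backtracking. A minimal sketch returning 0-based state indices:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state path and its joint probability, per Eq. (21)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)       # psi_t(i): best predecessor of i
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        cand = delta[t - 1][:, None] * A    # cand[j, i] = delta_{t-1}(j) a_{ji}
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]        # best final state
    for t in range(T - 1, 0, -1):           # backtrack through psi
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```

On the toy model of Section 1 with observations 我 看見 貓 (indices `[0, 1, 2]`), the decoded path is noun, verb, noun, matching the annotation of Sentence 1.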
