Understanding the M-Step of Expectation Maximization in Gaussian Mixture Models

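This article explains the math behind a key part of Gaussian Mixture Models (GMM), namely Expectation Maximization (EM), and how to translate those concepts into Python. The focus of this story is the M-Step of EM.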

Note: this is not a comprehensive, end-to-end explanation of the GMM algorithm. For a deeper dive, see the article we translated previously.

Expectation Maximization

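A GMM involves a series of steps usually referred to as "Expectation Maximization", or "EM" for short. To make sense of the EM math, first consider the kind of model you might be dealing with.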

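Samples are represented as points on a graph. The points form several distinct blobs. Each blob has a center, and every point lies some distance from each blob's center. Given data for a GMM, the goal is usually to label each sample point according to the closest center. Some points are almost equally distant from more than one center, so we want to label points based on some kind of probability.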

Symbols Used in EM

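To learn machine learning algorithms, you will need a little Greek in your life, because most of the symbols in these algorithms are Greek letters. Although it is tempting to gloss over the basics, a simple grasp of individual Greek letters can help you understand the important concepts in an algorithm.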

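Algorithms can be daunting and confusing. At first glance, for example, a dense cluster of Greek symbols can be enough to make you choke. But there is no need to spend time on all of them: here we only consider the symbols used below.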

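Besides these, some English letters also carry meaning for GMM in EM. Usually, the English letters surround the Greek letters like little pilot fish swimming around a big shark. Like the little fish, the English letters play an important role: they guide how to interpret the algorithm.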

The Math of the M-Step Explained

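Now that we have isolated each component of the equations, let's examine the M-Step and combine those components into some common mathematical phrases that matter for conversing in the language of EM.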

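Cluster, Gaussian, the letter J or K, and sometimes C: these usually all mean the same thing. If we have 3 clusters, you might hear "each Gaussian", "each j", "each Gaussian j", or "for each of the K components"; these are all different ways of talking about the same 3 clusters. In terms of data, we can plot an array of (x, y) samples/points and see how they form clusters.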

 # a 2D array of samples [features and targets] 
 # the last column, targets [0,1,2], represent three clusters
 # the first two columns are the points that make up our features
 # each feature is just a set of points (x,y) in 2D space
 # each row is a sample and cluster label
 
 [[-7.72642091 -8.39495682 2. ]
  [ 5.45339605 0.74230537 1. ]
  [-2.97867201 9.55684617 0. ]
  [ 6.04267315 0.57131862 1. ] ...]

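Soft assignment, probability, responsibility: a main idea in clustering is that we want to find, for each sample, a number that tells us which cluster the sample belongs to. In GMM, for each sample we evaluate, we may return values that represent "the responsibility of each Gaussian j", each "soft assignment", or each "probability".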

These phrases usually refer to the same thing, but there is a key distinction between responsibility and probability.

 # an array of assignment data about the 2D array of samples
 # each column represents a cluster
 # each row represents data about each sample
 # in each row, we have the probability that a sample belongs to one of three clusters - it adds up to 1 (as it should)
 # but the sum of each column is a big number (not 1)
 
 print(assignments)
 # sample output: an array of assignment data
 [[1.00000000e+000 2.82033618e-118 1.13001412e-070]
  [9.21706438e-074 1.00000000e+000 3.98146031e-029]
  [4.40884339e-099 5.66602768e-053 1.00000000e+000]...]
 
 print(np.sum(assignments[0]))
 # sample output: the sum across each row is 1
 1.0
 
 print(np.sum(assignments[:, 0]))
 # sample output: the sum in each col is a big number that varies
 # Little Gamma: the really small numbers in each column
 # Big Gamma: the sum of each column, or 33.0 in this sample
 33.0

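Big Gamma, little gamma, J, N, x, and i: the core task in EM is to optimize three sets of parameters for each cluster, or "for each j, optimize the weight (𝓌), the mew (𝜇), and the variance (𝜎²)". In other words, what are the cluster's weight (𝓌), the cluster's center point (𝜇), and the cluster's variance (𝜎²)?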

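For the weight (𝓌), we divide "Big Gamma" by the total number of samples. From earlier, we know that Big Gamma for each cluster j is just the result of adding up the assignment values of every sample for that cluster (a sum that does not equal 1). In symbols, 𝓌_j = Γ_j / N, where Γ_j is the sum of γ_ij over all N samples i.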

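For the weight parameter of a Gaussian during EM, think of something as simple as adding up a list of numbers and then dividing by the total number of samples.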
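The weight update described above can be sketched in a few lines of NumPy. The `assignments` array here is a small hypothetical example (4 samples, 3 clusters), not the article's data; it only illustrates that the column sums (Big Gamma) divided by the number of samples give weights that add up to 1.

```python
import numpy as np

# Hypothetical responsibilities for 4 samples and 3 clusters.
# Each row sums to 1; each column holds the little gammas of one cluster.
assignments = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
    [0.1, 0.1, 0.8],
])

n_samples = assignments.shape[0]

# Big Gamma for each cluster j: the sum of that cluster's column.
big_gammas = assignments.sum(axis=0)

# Weight of each cluster: Big Gamma divided by the number of samples.
weights = big_gammas / n_samples

print(big_gammas)
print(weights)
print(weights.sum())  # the weights of all clusters add up to 1
```

Because every row of `assignments` sums to 1, the Big Gammas sum to the number of samples, which is why dividing by N yields a valid set of mixture weights.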

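For mew (𝜇), instead of adding all the little gammas into a single Big Gamma as we did before, we matrix-multiply the little gammas with the features x for each cluster j and each sample i, and then divide by Big Gamma. In symbols, 𝜇_j = (sum over i of γ_ij · x_i) / Γ_j.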

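Remember that mew is just the center point of each cluster: if we have 3 clusters and our samples are all (x, y) coordinates, then mew is an array of 3 (x, y) coordinates, one per cluster.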

 # for figure 4 - mew (mu)
 # same array of assignment data as before
 # each column is a cluster of little gammas
 
 print(assignments)
 [[1.00000000e+000 2.82033618e-118 1.13001412e-070]
  [9.21706438e-074 1.00000000e+000 3.98146031e-029]
  [4.40884339e-099 5.66602768e-053 1.00000000e+000]...]
  
  # the little gammas of cluster 0 is just column 0
 [[1.00000000e+000 ]
  [9.21706438e-074 ]
  [4.40884339e-099 ]...]
  
  # same array of sample data as before
 # the first two columns are the x,y coordinates
 # the last column is the cluster label of the sample
 
 print(features)
 [[-7.72642091 -8.39495682 2. ]
  [ 5.45339605 0.74230537 1. ]
  [-2.97867201 9.55684617 0. ]
  [ 6.04267315 0.57131862 1. ] ...]
  
  # for features, we just need its points
 [[-7.72642091 -8.39495682 ]
  [ 5.45339605 0.74230537 ]
  [-2.97867201 9.55684617 ]
  [ 6.04267315 0.57131862 ] ...]
  
  # if using numpy (np) for matrix multiplication 
 # for cluster 0 ...
 
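 points = features[:, :2]  # keep only the (x, y) points, drop the label column
 big_gamma = np.sum(assignments[:, 0])
 mew = np.matmul(assignments[:, 0], points) / big_gamma
 
 # repeating this for every cluster returns an array of mew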
 [[-2.66780392 8.93576069]
  [-6.95170962 -6.67621669]
  [ 4.49951001 1.93892013]]
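The per-cluster calculation above can also be done for all clusters in one matrix product. This is a minimal sketch on hypothetical toy data (the `points` and `assignments` arrays are made up for illustration); with hard 0/1 assignments the weighted mean reduces to the ordinary average of each cluster's points, which makes the result easy to check by hand.

```python
import numpy as np

# Hypothetical toy data: 4 samples with (x, y) coordinates.
points = np.array([
    [0.0, 0.0],
    [2.0, 0.0],
    [10.0, 10.0],
    [12.0, 10.0],
])

# Hypothetical responsibilities for 2 clusters (each row sums to 1).
assignments = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

# Big Gamma per cluster: column sums, shape (n_clusters,).
big_gammas = assignments.sum(axis=0)

# (clusters x samples) @ (samples x 2) gives one weighted sum of points
# per cluster; dividing by Big Gamma turns it into a weighted mean.
mew = (assignments.T @ points) / big_gammas[:, np.newaxis]

print(mew)  # one (x, y) center per cluster
```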

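For the variance (𝜎²), consider that we now have points and center points. With the variance, we are essentially evaluating the distance from each sample's point (x for each i) to each cluster's center point (mew for each j). In EM terms, some might say "little gamma times x_i minus mew_j squared, over Big Gamma j". In symbols, 𝜎²_j = (sum over i of γ_ij · (x_i − 𝜇_j)²) / Γ_j.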

 # for figure 5 - variance
 # a sampling of variance for cluster 0 of n clusters
 # given arrays for features and assignments...
 
 x_i = features[:, :2]  # keep only the (x, y) points, drop the label column
 big_gamma = np.sum(assignments[:, 0])
 mew = np.matmul(assignments[:, 0], x_i) / big_gamma
 
 numerator = np.matmul(assignments[:, 0], (x_i - mew) ** 2)
 
 variance = numerator / big_gamma
 
 # repeating this for every cluster returns an array of variance
 [[0.6422345 1.06006186]
  [0.65254746 0.9274831 ]
  [0.95031461 0.92519751]]
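To get the variance of every cluster rather than only cluster 0, the same calculation can be repeated in a loop. The sketch below uses hypothetical toy data and assumes, as the arrays above suggest, one variance per feature per cluster (an axis-aligned Gaussian) rather than a full covariance matrix.

```python
import numpy as np

# Hypothetical toy data: 4 samples with (x, y) coordinates.
points = np.array([
    [0.0, 0.0],
    [2.0, 0.0],
    [10.0, 10.0],
    [12.0, 10.0],
])

# Hypothetical responsibilities for 2 clusters (each row sums to 1).
assignments = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

n_clusters = assignments.shape[1]
big_gammas = assignments.sum(axis=0)
mew = (assignments.T @ points) / big_gammas[:, np.newaxis]

variance = np.empty_like(mew)
for j in range(n_clusters):
    # Squared distance of every point to the center of cluster j, per feature.
    squared_diff = (points - mew[j]) ** 2
    # Weight by the little gammas of cluster j, then divide by Big Gamma j.
    variance[j] = (assignments[:, j] @ squared_diff) / big_gammas[j]

print(variance)  # one (x, y) variance pair per cluster
```

In this toy case the x coordinates of each cluster sit one unit either side of the center and the y coordinates are identical, so the variance is 1 along x and 0 along y for both clusters.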

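All of the steps above concern the M-Step, or Maximization, of EM: everything about the weights, mew, and variance is about optimization. But what about the initial assignments array? How do we get the array of probabilities for each sample? That is the E-Step of EM, the Expectation.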

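In the E-Step, we try to guess the assignment of each point using Bayes' rule, which produces a set of values indicating the responsibility, or probability, of each Gaussian for each point. At first, the guesses (the posteriors) are far off, but after cycling through the E-Step and M-Step they get better and move closer to the objective ground truth.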
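As a rough illustration of the Bayes' rule step, the sketch below computes responsibilities from a current guess of the parameters. The function name `e_step` and the toy parameter values are hypothetical, and it assumes axis-aligned Gaussians (one variance per feature), so each density is a product of one-dimensional normal densities; it is not the original author's implementation.

```python
import numpy as np

def e_step(points, weights, mew, variance):
    """Return responsibilities with shape (n_samples, n_clusters)."""
    n_samples, n_clusters = points.shape[0], weights.shape[0]
    likelihood = np.empty((n_samples, n_clusters))
    for j in range(n_clusters):
        # Density of an axis-aligned Gaussian: product of 1D normal densities.
        norm = np.sqrt(2.0 * np.pi * variance[j])
        density = np.exp(-0.5 * (points - mew[j]) ** 2 / variance[j]) / norm
        # Prior (cluster weight) times likelihood of each point under cluster j.
        likelihood[:, j] = weights[j] * density.prod(axis=1)
    # Bayes' rule: normalise across clusters so each row sums to 1.
    return likelihood / likelihood.sum(axis=1, keepdims=True)

# Hypothetical current guess for 2 clusters in 2D.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
weights = np.array([0.5, 0.5])
mew = np.array([[1.0, 0.0], [11.0, 10.0]])
variance = np.ones((2, 2))

assignments = e_step(points, weights, mew, variance)
print(assignments.sum(axis=1))     # every row sums to 1
print(assignments.argmax(axis=1))  # most responsible cluster per sample
```

For data with very distant points, the densities can underflow to zero; production code usually works with log densities to avoid that, which this sketch leaves out for readability.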

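The GMM algorithm repeats the M-Step and E-Step until convergence. For example, the stopping condition might be a maximum number of iterations, or the point at which the difference between successive rounds of guesses becomes very small. Hopefully, the end result is that every sample in the data has a soft-assigned label.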
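Putting the pieces together, the loop below alternates the two steps and stops on either of the two conditions mentioned above. Everything here is an illustrative assumption: the synthetic blobs, the helper names `e_step` and `m_step`, the axis-aligned variances, the small variance floor, and the choice to start each center from one sample of each generated blob (done only to keep the toy run stable; real code would more commonly use random or k-means initialisation).

```python
import numpy as np

def e_step(points, weights, mew, variance):
    # Broadcast to shape (n_samples, n_clusters, n_features).
    density = np.exp(-0.5 * (points[:, None, :] - mew) ** 2 / variance)
    density /= np.sqrt(2.0 * np.pi * variance)
    likelihood = weights * density.prod(axis=2)
    return likelihood / likelihood.sum(axis=1, keepdims=True)

def m_step(points, assignments):
    big_gammas = assignments.sum(axis=0)
    weights = big_gammas / points.shape[0]
    mew = (assignments.T @ points) / big_gammas[:, None]
    squared_diff = (points[:, None, :] - mew) ** 2
    variance = (assignments[:, :, None] * squared_diff).sum(axis=0)
    variance = variance / big_gammas[:, None] + 1e-6  # floor avoids zero variance
    return weights, mew, variance

# Synthetic data: three blobs of 50 points each around hypothetical centers.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [0.0, 5.0], [5.0, 0.0]])
points = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

# Initial guess: equal weights, unit variances, one sample per blob as center.
n_clusters = 3
weights = np.full(n_clusters, 1.0 / n_clusters)
mew = points[[0, 50, 100]].copy()
variance = np.ones((n_clusters, 2))

max_iter, tol = 100, 1e-6
for iteration in range(max_iter):
    assignments = e_step(points, weights, mew, variance)
    weights, new_mew, variance = m_step(points, assignments)
    shift = np.abs(new_mew - mew).max()  # how far the centers moved this round
    mew = new_mew
    if shift < tol:                      # guesses barely changed: converged
        break

print(iteration + 1, "iterations")
print(mew)
```

Measuring the change in the centers is only one possible convergence test; tracking the change in the log-likelihood between rounds is another common choice.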

Summary

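In this article, I presented a guide to understanding the M-Step of the Expectation Maximization phase of the Gaussian Mixture Model algorithm. Although the math may seem too complex to handle on the surface, we can manage its complexity by understanding its individual parts. For example, a few key pieces, such as how to pronounce the Greek symbols and how to apply their operations with NumPy, are important for grasping the overall concept.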

Author: Justin Chae

Original article: https://towardsdatascience.com/unlock-m-step-from-em-in-gmm-dd9a32a0aa6f

Translated by the deephub translation team

This article is shared from the WeChat official account DeepHub IMBA (deephub-imba).