Section 4 – The Naive Bayes Method

Naive Bayes (NB) is a classification method based on Bayes' theorem and the assumption of conditional independence among features. Given a training data set, it first learns the joint probability distribution of input and output under the feature conditional independence assumption; then, using this model, for a given input $x$ it applies Bayes' theorem to find the output $y$ with the largest posterior probability.

NB includes the following algorithms:

  • Gaussian Naive Bayes – suited to normally distributed (continuous) features
  • Bernoulli Naive Bayes – suited to binary features (Bernoulli distributions)
  • Multinomial Naive Bayes – suited to count features (multinomial distributions)
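
Each variant corresponds to a scikit-learn class of the same name. The sketch below only illustrates which data type each variant is meant for; the toy arrays and their values are my own, not from the original text:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

X_cont = np.array([[1.0], [1.1], [5.0], [5.2]])      # continuous feature -> Gaussian
X_bin  = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])  # binary features    -> Bernoulli
X_cnt  = np.array([[3, 0], [4, 1], [0, 5], [1, 4]])  # count features     -> Multinomial
y = np.array([0, 0, 1, 1])

print(GaussianNB().fit(X_cont, y).predict([[5.1]]))     # [1]
print(BernoulliNB().fit(X_bin, y).predict([[0, 1]]))
print(MultinomialNB().fit(X_cnt, y).predict([[0, 6]]))  # [1]
```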

Strengths and weaknesses of naive Bayes:

  • Strengths: learning and prediction are efficient and easy to implement; the method remains effective even with little data and handles classification problems well
  • Weaknesses: classification accuracy is not necessarily high; the feature independence assumption makes naive Bayes simple, but can sacrifice some accuracy

I. Learning and Classification with Naive Bayes

1. Basic Method

Let the input space $\mathcal{X} \subseteq \mathbf{R}^{n}$ be a set of $n$-dimensional vectors and the output space be the set of class labels $\mathcal{Y}=\{c_{1}, c_{2}, \cdots, c_{K}\}$. The input is a feature vector $x \in \mathcal{X}$ and the output is a class label $y \in \mathcal{Y}$. $X$ is a random vector defined on the input space $\mathcal{X}$, $Y$ is a random variable defined on the output space $\mathcal{Y}$, and $P(X, Y)$ is the joint probability distribution of $X$ and $Y$. The training data set is:
$$T=\{(x_{1}, y_{1}),(x_{2}, y_{2}), \cdots,(x_{N}, y_{N})\}$$

which is assumed to be generated independently and identically distributed according to $P(X, Y)$.

Naive Bayes learns the joint probability distribution $P(X, Y)$ from the training data set. Specifically, it learns the following prior and conditional probability distributions. Prior probability distribution:
$$P(Y=c_{k}), \quad k=1,2, \cdots, K$$

Conditional probability distribution:
$$P(X=x \mid Y=c_{k})=P(X^{(1)}=x^{(1)}, \cdots, X^{(n)}=x^{(n)} \mid Y=c_{k}), \quad k=1,2, \cdots, K$$

From these the joint probability distribution $P(X, Y)$ is obtained.

The conditional probability distribution $P(X=x \mid Y=c_{k})$ has an exponential number of parameters, so estimating it directly is infeasible. Indeed, if $x^{(j)}$ can take $S_{j}$ values, $j=1,2, \cdots, n$, and $Y$ can take $K$ values, then the number of parameters is $K \prod_{j=1}^{n} S_{j}$.

Naive Bayes imposes a conditional independence assumption on the conditional probability distribution. Since this is a strong assumption, the method takes its name ("naive") from it. Concretely, the conditional independence assumption is:
$$\begin{aligned} P(X=x \mid Y=c_{k}) &=P(X^{(1)}=x^{(1)}, \cdots, X^{(n)}=x^{(n)} \mid Y=c_{k}) \\ &=\prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_{k}) \end{aligned}$$

Naive Bayes actually learns the mechanism by which the data are generated, so it is a generative model. The conditional independence assumption says that, given the class, the features used for classification are all conditionally independent. This assumption makes naive Bayes simple, but sometimes sacrifices some classification accuracy.

At classification time, for a given input $x$, naive Bayes uses the learned model to compute the posterior probability distribution $P(Y=c_{k} \mid X=x)$ and outputs the class with the largest posterior probability. The posterior is computed by Bayes' theorem:
$$P(Y=c_{k} \mid X=x)=\frac{P(X=x \mid Y=c_{k}) P(Y=c_{k})}{\sum_{k} P(X=x \mid Y=c_{k}) P(Y=c_{k})}$$

Combining the two formulas above:
$$P(Y=c_{k} \mid X=x)=\frac{P(Y=c_{k}) \prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_{k})}{\sum_{k} P(Y=c_{k}) \prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_{k})}, \quad k=1,2, \cdots, K$$

This is the basic formula of naive Bayes classification, so the naive Bayes classifier can be written as:
$$y=f(x)=\arg \max_{c_{k}} \frac{P(Y=c_{k}) \prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_{k})}{\sum_{k} P(Y=c_{k}) \prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_{k})}$$

Note that the denominator above is the same for every $c_{k}$, so:
$$y=\arg \max_{c_{k}} P(Y=c_{k}) \prod_{j} P(X^{(j)}=x^{(j)} \mid Y=c_{k})$$
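
In floating-point arithmetic, the product of many small per-feature probabilities can underflow to zero, so implementations typically maximize the logarithm of this expression (a sum of logs) instead of the product itself. A minimal illustration of why:

```python
import math

# multiplying many tiny likelihoods underflows to exactly 0.0 ...
likelihoods = [1e-200] * 3
product = 1.0
for p in likelihoods:
    product *= p
print(product)        # 0.0  (underflow)

# ... while the equivalent sum of logs stays representable
log_score = sum(math.log(p) for p in likelihoods)
print(log_score)      # ~ -1381.55
```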

2. The Meaning of Posterior Probability Maximization

Naive Bayes assigns an instance to the class with the largest posterior probability, which is equivalent to minimizing the expected risk. Suppose the 0-1 loss function is chosen:
$$L(Y, f(X))=\begin{cases}1, & Y \neq f(X) \\ 0, & Y=f(X)\end{cases}$$

where $f(X)$ is the classification decision function. The expected risk is then:
$$R_{\mathrm{exp}}(f)=E[L(Y, f(X))]$$

The expectation is taken over the joint distribution $P(X, Y)$. Taking the conditional expectation:
$$R_{\mathrm{exp}}(f)=E_{X} \sum_{k=1}^{K} [L(c_{k}, f(X))] P(c_{k} \mid X)$$

To minimize the expected risk it suffices to minimize pointwise for each $X=x$, which gives:
$$\begin{aligned} f(x) &=\arg \min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L(c_{k}, y) P(c_{k} \mid X=x) \\ &=\arg \min_{y \in \mathcal{Y}} \sum_{k=1}^{K} P(y \neq c_{k} \mid X=x) \\ &=\arg \min_{y \in \mathcal{Y}} (1-P(y=c_{k} \mid X=x)) \\ &=\arg \max_{y \in \mathcal{Y}} P(y=c_{k} \mid X=x) \end{aligned}$$

Thus the expected risk minimization criterion yields the posterior probability maximization criterion:
$$f(x)=\arg \max_{c_{k}} P(c_{k} \mid X=x)$$

which is exactly the principle that naive Bayes adopts.

II. Parameter Estimation for Naive Bayes

1. Maximum Likelihood Estimation

In naive Bayes, learning means estimating $P(Y=c_{k})$ and $P(X^{(j)}=x^{(j)} \mid Y=c_{k})$. Maximum likelihood estimation can be applied to these probabilities. The maximum likelihood estimate of the prior $P(Y=c_{k})$ is:
$$P(Y=c_{k})=\frac{\sum_{i=1}^{N} I(y_{i}=c_{k})}{N}, \quad k=1,2, \cdots, K$$

Let the set of possible values of the $j$-th feature $x^{(j)}$ be $\{a_{j1}, a_{j2}, \cdots, a_{jS_{j}}\}$. The maximum likelihood estimate of the conditional probability $P(X^{(j)}=a_{jl} \mid Y=c_{k})$ is:

$$P(X^{(j)}=a_{jl} \mid Y=c_{k})=\frac{\sum_{i=1}^{N} I(x_{i}^{(j)}=a_{jl}, y_{i}=c_{k})}{\sum_{i=1}^{N} I(y_{i}=c_{k})}$$

$$j=1,2, \cdots, n; \quad l=1,2, \cdots, S_{j}; \quad k=1,2, \cdots, K$$

where $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, and $I$ is the indicator function.

2. Learning and Classification Algorithm

Input: training data $T=\{(x_{1}, y_{1}),(x_{2}, y_{2}), \cdots,(x_{N}, y_{N})\}$, where $x_{i}=(x_{i}^{(1)}, x_{i}^{(2)}, \cdots, x_{i}^{(n)})^{\mathrm{T}}$, $x_{i}^{(j)}$ is the $j$-th feature of the $i$-th sample, $x_{i}^{(j)} \in \{a_{j1}, a_{j2}, \cdots, a_{jS_{j}}\}$, $a_{jl}$ is the $l$-th possible value of the $j$-th feature, $j=1,2, \cdots, n$, $l=1,2, \cdots, S_{j}$, $y_{i} \in \{c_{1}, c_{2}, \cdots, c_{K}\}$; an instance $x$;

Output: the class of instance $x$.

  1. Compute the prior and conditional probabilities:
    $$\begin{array}{l}{P(Y=c_{k})=\frac{\sum_{i=1}^{N} I(y_{i}=c_{k})}{N}, \quad k=1,2, \cdots, K} \\ {P(X^{(j)}=a_{jl} \mid Y=c_{k})=\frac{\sum_{i=1}^{N} I(x_{i}^{(j)}=a_{jl}, y_{i}=c_{k})}{\sum_{i=1}^{N} I(y_{i}=c_{k})}} \\ {j=1,2, \cdots, n; \quad l=1,2, \cdots, S_{j}; \quad k=1,2, \cdots, K}\end{array}$$

  2. For the given instance $x=(x^{(1)}, x^{(2)}, \cdots, x^{(n)})^{\mathrm{T}}$, compute:
    $$P(Y=c_{k}) \prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_{k}), \quad k=1,2, \cdots, K$$

  3. Determine the class of instance $x$:
    $$y=\arg \max_{c_{k}} P(Y=c_{k}) \prod_{j=1}^{n} P(X^{(j)}=x^{(j)} \mid Y=c_{k})$$

Example 1: from the training data in the table below, learn a naive Bayes classifier and determine the class label $y$ of $x=(2, S)^{T}$. In the table, $X^{(1)}$ and $X^{(2)}$ are features taking values in $A_{1}=\{1,2,3\}$ and $A_{2}=\{S, M, L\}$ respectively, and $Y$ is the class label, $Y \in C=\{1,-1\}$.

|           | 1  | 2  | 3 | 4 | 5  | 6  | 7  | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|-----------|----|----|---|---|----|----|----|---|---|----|----|----|----|----|----|
| $X^{(1)}$ | 1  | 1  | 1 | 1 | 1  | 2  | 2  | 2 | 2 | 2  | 3  | 3  | 3  | 3  | 3  |
| $X^{(2)}$ | S  | M  | M | S | S  | S  | M  | M | L | L  | L  | M  | M  | L  | L  |
| $Y$       | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1  | 1  | 1  | 1  | 1  | -1 |
from IPython.display import Image
Image(filename="./data/4_2.png",width=500)

(output: image with the worked solution of Example 1)
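
Example 1 can also be checked numerically. The sketch below (variable and function names are my own) applies the maximum likelihood estimates and the classification rule from the algorithm above to the table data:

```python
# training data from the table in Example 1
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = list("SMMSSSMMLLLMMLL")
Y  = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def posterior_scores(x1, x2):
    """P(Y=c) * P(X1=x1|Y=c) * P(X2=x2|Y=c) for each class c (MLE)."""
    scores = {}
    for c in set(Y):
        n_c = Y.count(c)
        prior = n_c / len(Y)                                   # P(Y=c)
        p1 = sum(a == x1 and y == c for a, y in zip(X1, Y)) / n_c
        p2 = sum(b == x2 and y == c for b, y in zip(X2, Y)) / n_c
        scores[c] = prior * p1 * p2
    return scores

scores = posterior_scores(2, "S")
print(scores[1], scores[-1])        # 1/45 ~ 0.0222 vs 1/15 ~ 0.0667
print(max(scores, key=scores.get))  # -1
```

The class $y=-1$ wins because $P(Y=-1)\,P(X^{(1)}=2 \mid Y=-1)\,P(X^{(2)}=S \mid Y=-1) = \frac{6}{15}\cdot\frac{2}{6}\cdot\frac{3}{6} = \frac{1}{15}$ exceeds the corresponding score $\frac{9}{15}\cdot\frac{3}{9}\cdot\frac{1}{9} = \frac{1}{45}$ for $y=1$.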

3. Bayesian Estimation

With maximum likelihood estimation, an estimated probability may turn out to be zero. This distorts the computation of the posterior probabilities and biases the classification. The remedy is Bayesian estimation. Concretely, the Bayesian estimate of the conditional probability is:
$$P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_{k})=\frac{\sum_{i=1}^{N} I(x_{i}^{(j)}=a_{jl}, y_{i}=c_{k})+\lambda}{\sum_{i=1}^{N} I(y_{i}=c_{k})+S_{j} \lambda}$$

where $\lambda \geqslant 0$. This is equivalent to adding a positive count $\lambda>0$ to the frequency of each value of the random variable. When $\lambda=0$ it reduces to maximum likelihood estimation. A common choice is $\lambda=1$, in which case it is called Laplace smoothing. Clearly, for any $l=1,2, \cdots, S_{j}$ and $k=1,2, \cdots, K$:
$$\begin{array}{l}{P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_{k})>0} \\ {\sum_{l=1}^{S_{j}} P_{\lambda}(X^{(j)}=a_{jl} \mid Y=c_{k})=1}\end{array}$$

Similarly, the Bayesian estimate of the prior probability is:
$$P_{\lambda}(Y=c_{k})=\frac{\sum_{i=1}^{N} I(y_{i}=c_{k})+\lambda}{N+K \lambda}$$

Example 2: for Example 1, estimate the probabilities with Laplace smoothing, i.e. take $\lambda=1$.

Image(filename="./data/4_1.png",width=500)

(output: image with the worked solution of Example 2)
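
Example 2 can likewise be checked with a short script (again a sketch with my own names; the data are the table from Example 1):

```python
# table data from Example 1, now estimated with Laplace smoothing (lambda = 1)
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = list("SMMSSSMMLLLMMLL")
Y  = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

lam = 1            # smoothing parameter lambda
K  = len(set(Y))   # number of classes
S1 = len(set(X1))  # number of values of X^(1)
S2 = len(set(X2))  # number of values of X^(2)

def smoothed_scores(x1, x2):
    scores = {}
    for c in set(Y):
        n_c = Y.count(c)
        prior = (n_c + lam) / (len(Y) + K * lam)
        p1 = (sum(a == x1 and y == c for a, y in zip(X1, Y)) + lam) / (n_c + S1 * lam)
        p2 = (sum(b == x2 and y == c for b, y in zip(X2, Y)) + lam) / (n_c + S2 * lam)
        scores[c] = prior * p1 * p2
    return scores

scores = smoothed_scores(2, "S")
print(round(scores[1], 4), round(scores[-1], 4))  # 0.0327 0.061
print(max(scores, key=scores.get))                # -1
```

The smoothed scores are $\frac{10}{17}\cdot\frac{4}{12}\cdot\frac{2}{12} \approx 0.0327$ for $y=1$ and $\frac{7}{17}\cdot\frac{3}{9}\cdot\frac{4}{9} \approx 0.0610$ for $y=-1$, so the predicted label is still $y=-1$.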

III. Code Implementation

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from collections import Counter
import math
def load_data():
    iris=load_iris()
    df=pd.DataFrame(iris.data,columns=iris.feature_names)
    df["label"]=iris.target
    df.columns=["sepal length","sepal width","petal length","petal width","label"]
    data=np.array(df.iloc[:100,:])
    return data[:,:-1],data[:,-1]
X,y=load_data()
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)
X_test[0],y_test[0]
(array([4.5, 2.3, 1.3, 0.3]), 0.0)

1. Custom GaussianNB

The feature likelihoods are assumed to be Gaussian.

Probability density function:
$$P(x_{i} \mid y_{k})=\frac{1}{\sqrt{2 \pi \sigma_{y_{k}}^{2}}} \exp \left(-\frac{(x_{i}-\mu_{y_{k}})^{2}}{2 \sigma_{y_{k}}^{2}}\right)$$

Mean: $\mu$; variance: $\sigma^{2}=\frac{\sum(X-\mu)^{2}}{N}$

class NaiveBayes(object):
    def __init__(self):
        self.model=None
        
    # mean
    @staticmethod
    def mean(X):
        return sum(X)/float(len(X))
    
    # standard deviation
    def stdev(self,X):
        avg=self.mean(X)
        return math.sqrt(sum([pow(x-avg,2) for x in X])/float(len(X)))
    
    # Gaussian probability density function
    # (fixed: the normalizing constant is sqrt(2*pi), not sqrt(x*pi))
    def gaussian_probability(self,x,mean,stdev):
        exponent=math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))
        return (1/(math.sqrt(2*math.pi)*stdev))*exponent
    
    # summarize X_train: per-feature (mean, stdev)
    def summarize(self,train_data):
        summaries=[(self.mean(i),self.stdev(i)) for i in zip(*train_data)]
        return summaries
    
    # compute mean and standard deviation for each class
    def fit(self,X,y):
        labels=list(set(y))
        data={label:[] for label in labels}
        for f,label in zip(X,y):
            data[label].append(f)
        self.model={label:self.summarize(value) for label,value in data.items()}
        return "GaussianNB train done"
    
    # compute the (unnormalized) posterior of each class
    def calculate_probabilities(self,input_data):
        probabilities={}
        for label,value in self.model.items():
            probabilities[label]=1
            for i in range(len(value)):
                mean,stdev=value[i]
                probabilities[label]*=self.gaussian_probability(input_data[i],mean,stdev)
        return probabilities
    
    # predicted class
    def predict(self,X_test):
        label=sorted(self.calculate_probabilities(X_test).items(),key=lambda x:x[-1])[-1][0]
        return label
    
    def score(self,X_test,y_test):
        right=0
        for X,y in zip(X_test,y_test):
            label=self.predict(X)
            if label==y:
                right+=1
        return right/float(len(X_test))
model=NaiveBayes()
model.fit(X_train,y_train)
'GaussianNB train done'
print(model.predict([4.4,3.2,1.3,0.2]))
0.0
model.score(X_test,y_test)
1.0

2. sklearn naive_bayes

from sklearn.naive_bayes import GaussianNB
clf=GaussianNB()
clf.fit(X_train,y_train)
GaussianNB(priors=None, var_smoothing=1e-09)
clf.score(X_test,y_test)
1.0
clf.predict([[4.4,3.2,1.3,0.2]])
array([0.])