Machine Learning: Model Evaluation Metrics

Date: 2019-11-17


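A binary classification problem has only two kinds of samples: positive and negative. Among the test samples, the number of positive samples that the classifier labels as positive is denoted TP (true positive), and the number it labels as negative is denoted FN (false negative). The number of negative samples that the classifier labels as negative is denoted TN (true negative), and the number it labels as positive is denoted FP (false positive). As shown in the figure, groups A and B each contain 100 samples in total.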
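The four counts can be tallied directly from the true labels and the classifier's predictions. The following is a minimal sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the toy labels are hypothetical and used only for illustration.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels where 1 = positive, 0 = negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives predicted positive
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives predicted negative
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives predicted positive
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives predicted negative
    return tp, fn, fp, tn

# Hypothetical toy labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)
```

TP + FN always equals the number of actual positives, and FP + TN the number of actual negatives, which is a quick sanity check on any confusion matrix.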

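Precision: TP / (TP + FP)

Recall: TP / (TP + FN)

False alarm rate: 1 - TP / (TP + FP), that is, one minus the precision, which equals FP / (TP + FP)

True positive rate: TPR = TP / (TP + FN)

False positive rate: FPR = FP / (FP + TN)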
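The definitions above translate into a few lines of arithmetic. This is a sketch only: the function name `binary_metrics` and the counts passed to it are hypothetical, and the code assumes none of the denominators is zero.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the metrics defined above from the four confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm_rate": 1 - precision,  # same as fp / (tp + fp)
        "tpr": recall,                      # TPR and recall share one formula
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts: 100 actual positives and 100 actual negatives
m = binary_metrics(tp=80, fn=20, fp=10, tn=90)
for name, value in m.items():
    print(name, round(value, 4))
```

Note that recall and the true positive rate are the same quantity under two names, while the false alarm rate as defined here is measured against predicted positives and the false positive rate against actual negatives.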

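The horizontal axis of the ROC curve is the false positive rate, and the vertical axis is the true positive rate.

A good classifier keeps the false positive rate low and the true positive rate high. Ideally the ROC curve rises straight to TPR = 1 and then follows the line y = 1, so that the area under the curve (AUC) is as large as possible, with a maximum of 1.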
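The area under an ROC curve can be approximated with the trapezoidal rule once the curve is available as a list of (FPR, TPR) points. The sketch below uses a hypothetical helper `auc_trapezoid` and two hand-made curves, a perfect classifier and a chance-level diagonal, to show the two reference values.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Approximate the area under an ROC curve with the trapezoidal rule."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    # Sort by FPR so the curve is integrated from left to right;
    # a stable sort keeps the input order of points that share an FPR.
    order = np.argsort(fpr, kind="stable")
    fpr, tpr = fpr[order], tpr[order]
    widths = np.diff(fpr)
    mean_heights = (tpr[1:] + tpr[:-1]) / 2.0
    return float(np.sum(widths * mean_heights))

# Perfect classifier: jumps to TPR = 1 at FPR = 0, then follows y = 1
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
# Chance-level classifier: the diagonal y = x
chance = auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
print(perfect, chance)
```

Real classifiers normally fall between these two cases, so an AUC close to 1 indicates good separation and an AUC close to 0.5 indicates little better than guessing.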

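Example:

Generate two groups of normally distributed samples whose labels mark them as positive and negative samples respectively. The data file can be downloaded here:

Link: https://pan.baidu.com/s/1X4hHygzSQHB3f8_kepxE8A
Extraction code: 6uvg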
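If the download is unavailable, a stand-in data set with the same two-column layout (sample value, label) can be generated locally. This is a sketch under stated assumptions: the means, standard deviations, class sizes, random seed, and the file name `data_synthetic.txt` are all hypothetical and are not the parameters of the original file.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_per_class = 100  # hypothetical class size

# Hypothetical distribution parameters; the positive class is centered higher
positive = rng.normal(loc=16.0, scale=2.5, size=n_per_class)
negative = rng.normal(loc=10.0, scale=2.5, size=n_per_class)

# Column 0: sample value, column 1: label (1 = positive, 0 = negative)
values = np.concatenate([positive, negative])
labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
data = np.column_stack([values, labels])

# Hypothetical file name; rename to data.txt to use it with the script below
np.savetxt("data_synthetic.txt", data, fmt="%.6f")
print(data.shape)
```

Because the script below classifies samples above a threshold as positive, the positive class should have the larger mean for its results to be meaningful.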

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

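# Return `steps` evenly spaced floats from start to stop, inclusive.
def floatrange(start,stop,steps):
    return [start+float(i)*(stop-start)/(float(steps)-1) for i in range(steps)]

# Load the data: column 0 holds the sample value, column 1 the label (1 = positive, 0 = negative).
data = np.loadtxt('data.txt')

# Estimate the normal-distribution parameters of each class, starting with the class sizes.
totalCount = len(data[:,0])
positiveCount = np.sum(data[:,1])
negativeCount = totalCount - positiveCount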

# Positive samples: mean and standard deviation
positiveIndex = np.where(data[:,1] == 1)[0]
positiveSum = np.sum(data[positiveIndex,0])
positive_u = positiveSum / positiveCount
positive_derta = np.sqrt(np.sum(np.square(data[positiveIndex,0] - positive_u)) / positiveCount)

# Negative samples: mean and standard deviation
negativeIndex = np.where(data[:,1] == 0)[0]
negativeSum = np.sum(data[negativeIndex,0])
negative_u = negativeSum / negativeCount
negative_derta = np.sqrt(np.sum(np.square(data[negativeIndex,0] - negative_u)) / negativeCount)

# Plot the fitted probability density curves (red = positive, blue = negative)
x = np.array(floatrange(2,25,1000))
print(positive_u,positive_derta)
pd = np.exp(-1.0*np.square(x-positive_u) / (2*np.square(positive_derta))) /(positive_derta*np.sqrt(2*np.pi))
nd = np.exp(-1.0*np.square(x-negative_u) / (2*np.square(negative_derta))) /(negative_derta*np.sqrt(2*np.pi))
plt.figure(1)
plt.plot(x,pd,'r')
plt.plot(x,nd,'b')

# Build the fitted normal distributions
positiveFun = stats.norm(positive_u,positive_derta)
negativeFun = stats.norm(negative_u,negative_derta)

positiveValue = positiveFun.cdf(x)
negativeValue = negativeFun.cdf(x)


# True positive rate and false positive rate when samples above the threshold x are classified as positive
positiveRate = 1 -positiveFun.cdf(x)
negativeRate = 1 -negativeFun.cdf(x)

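# Threshold: choose the x that minimizes the false negative rate plus the false positive rate
disvalue = positiveFun.cdf(x) + 1 - negativeFun.cdf(x)
# np.argmin returns a single index even if several points share the minimum
indexvalue = int(np.argmin(disvalue))

xvalue = x[indexvalue]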

# Confusion matrix: rows are actual positive / negative, columns are predicted positive / negative
positivevalue = 1 -positiveFun.cdf(xvalue)
negativevalue = 1 -negativeFun.cdf(xvalue)
v00= int(positivevalue * positiveCount)
v01= positiveCount -v00
v10 =int(negativevalue* negativeCount)
v11 =negativeCount -v10
print("disvalue:",xvalue)
print("positiverate:",positivevalue,"negativerate:",negativevalue)
print(v00,",",v01)
print(v10,",",v11)


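# Mark the chosen threshold on the density plot with a green vertical line
xdis = [xvalue,xvalue]
ydis = [0,0.2]
plt.plot(xdis,ydis,'g')
# ROC curve: false positive rate on the horizontal axis, true positive rate on the vertical axis
plt.figure(2)
plt.plot(negativeRate,positiveRate,'r')
plt.show()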
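
The script above draws the ROC curve from the fitted normal distributions. As a cross-check, an empirical ROC curve can be built from the raw samples by sweeping the threshold over the observed values. This is a minimal sketch, not part of the original script: the function name `empirical_roc` and the toy values are hypothetical, and it assumes a sample is classified as positive when its value is at or above the threshold.

```python
import numpy as np

def empirical_roc(values, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold over the data.

    A sample is predicted positive when its value is >= the threshold.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    # Start above every sample (nothing predicted positive), then lower the threshold
    thresholds = np.sort(np.unique(values))[::-1]
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        predicted_positive = values >= t
        tpr.append(np.sum(predicted_positive & (labels == 1)) / n_pos)
        fpr.append(np.sum(predicted_positive & (labels == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

# Hypothetical toy data, for illustration only
values = [9.0, 11.0, 12.0, 14.0, 15.0, 17.0]
labels = [0, 0, 1, 0, 1, 1]
fpr, tpr = empirical_roc(values, labels)
print(fpr)
print(tpr)
```

With real data, `data[:,0]` and `data[:,1]` from the script would take the place of the toy `values` and `labels`, and the resulting step-shaped curve should roughly follow the smooth curve from the fitted distributions if the normal assumption holds.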