Machine Learning: Model Evaluation Metrics

 

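In a binary classification problem, every sample is either positive or negative. Among the test samples, the number of positive samples that the classifier labels as positive is denoted TP (true positive), and the number it labels as negative is denoted FN (false negative). The number of negative samples that the classifier labels as negative is denoted TN (true negative), and the number it labels as positive is denoted FP (false positive). As shown in the figure, groups A and B each contain 100 samples.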
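The four counts defined above can be tallied directly from label arrays. The following is a minimal sketch, assuming labels are encoded as 1 for positive and 0 for negative; the function name `confusion_counts` and the sample arrays are illustrative and do not come from the original script.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels, 1 = positive, 0 = negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives predicted positive
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives predicted negative
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives predicted positive
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives predicted negative
    return tp, fn, fp, tn

# Hypothetical labels and predictions, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 3)
```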

Precision: TP/(TP+FP)

Recall: TP/(TP+FN)

False alarm rate (1 - precision): 1 - TP/(TP+FP) = FP/(TP+FP)

True positive rate: TPR = TP/(TP+FN)

False positive rate: FPR = FP/(FP+TN)
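As a quick numerical check of these ratios, the sketch below computes them from the four confusion counts. The function name `rates` and the example counts are assumptions made for illustration. It does not guard against zero denominators, which a real implementation would need to handle.

```python
def rates(tp, fn, fp, tn):
    """Compute the ratios defined above from the four confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to the true positive rate
    false_alarm = 1.0 - precision    # equals fp / (tp + fp)
    fpr = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "false_alarm": false_alarm,
        "tpr": recall,
        "fpr": fpr,
    }

# Hypothetical counts: 100 positives and 100 negatives
m = rates(tp=80, fn=20, fp=10, tn=90)
for name, value in m.items():
    print(name, round(value, 4))
```

With these counts, recall is 0.8 and the false positive rate is 0.1, while precision is 80/90, so the false alarm rate is 10/90.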

The horizontal axis of the ROC curve is the false positive rate, and the vertical axis is the true positive rate.

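A good classifier should have a low false positive rate and a high true positive rate. Ideally the ROC curve lies close to the horizontal line y = 1, which makes the area under the curve (AUC) as large as possible.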
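To make the area under the curve concrete, here is a minimal sketch that builds ROC points empirically from scores and labels, then integrates them with the trapezoid rule. It assumes that a higher score means "more likely positive". The helper names `roc_points` and `auc_trapezoid` and the sample scores are hypothetical. The script later in this post instead derives the curve from fitted normal distributions.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays, sweeping the threshold from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Start above every score so the curve begins at (0, 0)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    tpr = np.array([np.sum((scores >= t) & (labels == 1)) / n_pos for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (labels == 0)) / n_neg for t in thresholds])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve using the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical scores: one negative (0.7) outranks one positive (0.6)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_points(scores, labels)
auc = auc_trapezoid(fpr, tpr)
print(auc)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9. A classifier that ranks every positive above every negative would give an area of 1.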

Example:

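Generate two groups of normally distributed samples, whose labels mark them as positive and negative samples respectively. The data file is available at the link below: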

Link: https://pan.baidu.com/s/1X4hHygzSQHB3f8_kepxE8A
Extraction code: 6uvg
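If the linked file is unavailable, a comparable dataset can be produced locally. The sketch below is an assumption-laden stand-in and not the author's actual data: the means, standard deviations, seed and per-class count are made up. Only the two-column layout (value, then a 0/1 label) is taken from how the script below reads `data.txt`.

```python
import numpy as np

# All distribution parameters here are illustrative assumptions
rng = np.random.default_rng(0)
n_per_class = 100
positive = rng.normal(loc=16.0, scale=3.0, size=n_per_class)  # label 1
negative = rng.normal(loc=10.0, scale=3.0, size=n_per_class)  # label 0

values = np.concatenate([positive, negative])
labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Column 0: sample value, column 1: label
data = np.column_stack([values, labels])
print(data.shape)

# Uncomment to write the file in the format the script expects:
# np.savetxt('data.txt', data)
```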

# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

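def floatrange(start,stop,steps):
    # Return `steps` evenly spaced values from start to stop, both inclusive
    return [start+float(i)*(stop-start)/(float(steps)-1) for i in range(steps)]

"""Load the data: column 0 is the sample value, column 1 is the label (1 = positive, 0 = negative)"""
data = np.loadtxt('data.txt')

"""Estimate the normal-distribution parameters of each class"""
totalCount = len(data[:,0])
positiveCount =np.sum(data[:,1])
negativeCount = totalCount - positiveCount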

# Positive samples: mean and standard deviation
positiveIndex= np.where(data[:,1] ==1)
positiveSum = np.sum(data[positiveIndex,0])
positive_u =positiveSum / positiveCount
positive_derta =np.sqrt(np.sum(np.square(data[positiveIndex,0] - positive_u )) / positiveCount)

# Negative samples: mean and standard deviation
negativeIndex= np.where(data[:,1] ==0)
negativeSum = np.sum(data[negativeIndex,0])
negative_u =negativeSum / negativeCount
negative_derta =np.sqrt(np.sum(np.square(data[negativeIndex,0] - negative_u )) / negativeCount)

# Plot the probability density curves (red: positive class, blue: negative class)
x = np.array(floatrange(2,25,1000))
print(positive_u,positive_derta)
pd = np.exp(-1.0*np.square(x-positive_u) / (2*np.square(positive_derta))) /(positive_derta*np.sqrt(2*np.pi))
nd = np.exp(-1.0*np.square(x-negative_u) / (2*np.square(negative_derta))) /(negative_derta*np.sqrt(2*np.pi))
plt.figure(1)
plt.plot(x,pd,'r')
plt.plot(x,nd,'b')
    

# Build the fitted normal distributions
positiveFun = stats.norm(positive_u,positive_derta)
negativeFun = stats.norm(negative_u,negative_derta)

positiveValue = positiveFun.cdf(x)
negativeValue = negativeFun.cdf(x)


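# True positive rate and false positive rate for every candidate threshold in x
# (a sample is predicted positive when its value exceeds the threshold)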
positiveRate = 1 -positiveFun.cdf(x)
negativeRate = 1 -negativeFun.cdf(x)

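# Threshold: pick the x that minimizes false negative rate + false positive rate
disvalue =positiveFun.cdf(x) +1 -negativeFun.cdf(x)
minvalue = np.min(disvalue)
# argmin returns the first minimizer, so this also works if the minimum is not unique
indexvalue = int(np.argmin(disvalue))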

xvalue = x[indexvalue]

# Confusion matrix at the chosen threshold (row 1: TP, FN; row 2: FP, TN)
positivevalue = 1 -positiveFun.cdf(xvalue)
negativevalue = 1 -negativeFun.cdf(xvalue)
v00= int(positivevalue * positiveCount)
v01= positiveCount -v00
v10 =int(negativevalue* negativeCount)
v11 =negativeCount -v10
print("disvalue:",xvalue)
print("positiverate:",positivevalue,"negativerate:",negativevalue)
print(v00,",",v01)
print(v10,",",v11)


# Mark the chosen threshold on the density plot
xdis = [xvalue,xvalue]
ydis = [0,0.2]
plt.plot(xdis,ydis,'g')
"""ROC curve"""
plt.figure(2)
plt.plot(negativeRate,positiveRate,'r')
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.show()

The output is shown below:

 
