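For a binary classification problem, there are only two kinds of samples: positive and negative. Among the test samples, the number of positive samples that the classifier judges positive is denoted TP (true positive), and the number judged negative is denoted FN (false negative). The number of negative samples judged negative is denoted TN (true negative), and the number judged positive is denoted FP (false positive). As shown in the figure, groups A and B each contain 100 samples.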
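The four counts can be tallied directly from the true labels and the predicted labels. This is a minimal sketch that assumes labels are coded 1 = positive and 0 = negative (the same coding the script later reads from data.txt); the function name `confusion_counts` and the toy labels are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels where 1 = positive and 0 = negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives judged positive
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives judged negative
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives judged negative
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives judged positive
    return tp, fn, tn, fp


# Toy labels, made up for illustration
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # → (2, 1, 3, 1)
```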
Precision: TP/(TP+FP)
Recall: TP/(TP+FN)
False alarm rate: 1 - TP/(TP+FP) = FP/(TP+FP)
True positive rate: TPR = TP/(TP+FN)
False positive rate: FPR = FP/(FP+TN)
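Plugging counts into the definitions above gives the following minimal sketch. The helper name `rates` and the counts are hypothetical, chosen so that each class has 100 samples.

```python
def rates(tp, fn, tn, fp):
    """Compute the metrics defined above from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # same quantity as the true positive rate
    false_alarm = 1 - precision      # equals fp / (tp + fp), as defined above
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return {"precision": precision, "recall": recall,
            "false_alarm": false_alarm, "tpr": tpr, "fpr": fpr}


# Hypothetical counts: 100 positives (80 + 20) and 100 negatives (90 + 10)
print(rates(tp=80, fn=20, tn=90, fp=10))
```

With these counts, precision is 80/90 ≈ 0.889, recall and TPR are both 0.8, the false alarm rate is 10/90 ≈ 0.111, and FPR is 10/100 = 0.1. Recall and TPR are the same quantity under two names.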
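The ROC curve plots the false positive rate on the horizontal axis and the true positive rate on the vertical axis.
A good classifier keeps the false positive rate low and the true positive rate high. Ideally its curve hugs the top-left corner, running close to the line y = 1, which makes the area under the curve (AUC) as large as possible.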
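An ROC curve can be traced empirically by sweeping a decision threshold over the classifier's scores and recording (FPR, TPR) at each step; the area under it then follows from the trapezoid rule. This is a sketch, assuming a higher score means "more likely positive" and that a score >= threshold is classified positive; the function names, scores and labels are made up for illustration.

```python
import numpy as np


def roc_points(scores, labels):
    """Use every distinct score as a threshold (score >= t means positive) and
    return the (FPR, TPR) points of the ROC curve, from (0, 0) to (1, 1)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    P = np.sum(labels == 1)                 # number of positive samples
    N = np.sum(labels == 0)                 # number of negative samples
    thresholds = np.unique(scores)[::-1]    # from highest to lowest
    fpr, tpr = [0.0], [0.0]                 # start at the origin
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & (labels == 1)) / P)
        fpr.append(np.sum(pred & (labels == 0)) / N)
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoid rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
fpr, tpr = roc_points(scores, labels)
print(auc(fpr, tpr))  # → 0.75
```

Here 12 of the 16 positive/negative pairs are ranked correctly (the positive sample has the higher score), and the trapezoid area matches that fraction, 0.75. A classifier that guesses at random traces the diagonal y = x, with an AUC of about 0.5.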
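Example:
Generate two groups of normally distributed samples; the labels of the two groups mark them as positive and negative samples respectively. The data can be downloaded here: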
Link: https://pan.baidu.com/s/1X4hHygzSQHB3f8_kepxE8A
Extraction code: 6uvg
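If the download link is unavailable, a file with the same layout can be generated locally. This is a sketch under assumptions: the two-column layout (value, then label with 1 = positive and 0 = negative) is inferred from how the script below indexes `data`, and the helper name `make_samples`, the means, standard deviations and seed are all hypothetical, picked so the positive class has the larger values and both classes fall inside the 2 to 25 range the script plots.

```python
import numpy as np


def make_samples(n_pos=100, n_neg=100, pos_mu=16.0, pos_sigma=2.0,
                 neg_mu=10.0, neg_sigma=2.0, seed=0):
    """Draw two normal groups and return an (n, 2) array:
    column 0 = value, column 1 = label (1 = positive, 0 = negative).
    All parameter defaults are hypothetical, not the article's data."""
    rng = np.random.default_rng(seed)
    pos = rng.normal(pos_mu, pos_sigma, n_pos)
    neg = rng.normal(neg_mu, neg_sigma, n_neg)
    values = np.concatenate([pos, neg])
    labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    return np.column_stack([values, labels])


data = make_samples()
# Whitespace-separated two-column text file, readable by np.loadtxt('data.txt')
np.savetxt("data.txt", data)
print(data.shape)  # → (200, 2)
```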
```python
# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def floatrange(start, stop, steps):
    """Return `steps` evenly spaced floats from start to stop (inclusive)."""
    return [start + float(i) * (stop - start) / (float(steps) - 1) for i in range(steps)]


# Read the data: column 0 is the sample value, column 1 the label (1 = positive, 0 = negative)
data = np.loadtxt('data.txt')

# Estimate the normal-distribution parameters of each class
totalCount = len(data[:, 0])
positiveCount = np.sum(data[:, 1])
negativeCount = totalCount - positiveCount

# Positive samples: mean and standard deviation
positiveMask = data[:, 1] == 1
positive_u = np.sum(data[positiveMask, 0]) / positiveCount
positive_derta = np.sqrt(np.sum(np.square(data[positiveMask, 0] - positive_u)) / positiveCount)

# Negative samples: mean and standard deviation
negativeMask = data[:, 1] == 0
negative_u = np.sum(data[negativeMask, 0]) / negativeCount
negative_derta = np.sqrt(np.sum(np.square(data[negativeMask, 0] - negative_u)) / negativeCount)

# Plot the probability density curves
# (floatrange returns a list, so convert it to an array before doing arithmetic on it)
x = np.array(floatrange(2, 25, 1000))
print(positive_u, positive_derta)
pd = np.exp(-1.0 * np.square(x - positive_u) / (2 * np.square(positive_derta))) / (positive_derta * np.sqrt(2 * np.pi))
nd = np.exp(-1.0 * np.square(x - negative_u) / (2 * np.square(negative_derta))) / (negative_derta * np.sqrt(2 * np.pi))
plt.figure(1)
plt.plot(x, pd, 'r')
plt.plot(x, nd, 'b')

# Build the fitted distributions
positiveFun = stats.norm(positive_u, positive_derta)
negativeFun = stats.norm(negative_u, negative_derta)

# True positive rate and false positive rate at each threshold x
# (assumes positive samples have the larger values, so value > x is classified positive)
positiveRate = 1 - positiveFun.cdf(x)
negativeRate = 1 - negativeFun.cdf(x)

# Threshold: minimize false negative rate + false positive rate
disvalue = positiveFun.cdf(x) + 1 - negativeFun.cdf(x)
indexvalue = int(np.argmin(disvalue))
xvalue = x[indexvalue]

# Confusion matrix at that threshold
positivevalue = 1 - positiveFun.cdf(xvalue)
negativevalue = 1 - negativeFun.cdf(xvalue)
v00 = int(positivevalue * positiveCount)   # TP
v01 = positiveCount - v00                  # FN
v10 = int(negativevalue * negativeCount)   # FP
v11 = negativeCount - v10                  # TN
print("disvalue:", xvalue)
print("positiverate:", positivevalue, "negativerate:", negativevalue)
print(v00, ",", v01)
print(v10, ",", v11)

# Mark the threshold on the density plot
xdis = [xvalue, xvalue]
ydis = [0, 0.2]
plt.plot(xdis, ydis, 'g')

# ROC curve
plt.figure(2)
plt.plot(negativeRate, positiveRate, 'r')
plt.show()
```
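Because the script models both classes as normal distributions, its ROC curve has a known closed-form area: for independent X_pos ~ N(μp, σp²) and X_neg ~ N(μn, σn²), AUC = P(X_pos > X_neg) = Φ((μp − μn)/√(σp² + σn²)), where Φ is the standard normal CDF. The sketch below uses this as a sanity check on a curve traced the same way the script traces it; the function names and parameter values are hypothetical, and it assumes, as the script does, that larger values are classified positive.

```python
import numpy as np
from scipy import stats


def binormal_auc(pos_mu, pos_sigma, neg_mu, neg_sigma):
    """Closed-form AUC for independent normal classes, larger value = positive."""
    z = (pos_mu - neg_mu) / np.sqrt(pos_sigma**2 + neg_sigma**2)
    return float(stats.norm.cdf(z))


def grid_auc(pos_mu, pos_sigma, neg_mu, neg_sigma, n=20001):
    """Trace the ROC as the script does (TPR = 1 - F_pos(x), FPR = 1 - F_neg(x))
    on a wide threshold grid and integrate it with the trapezoid rule."""
    lo = min(pos_mu - 10 * pos_sigma, neg_mu - 10 * neg_sigma)
    hi = max(pos_mu + 10 * pos_sigma, neg_mu + 10 * neg_sigma)
    x = np.linspace(hi, lo, n)                       # high threshold first, so FPR increases
    tpr = stats.norm(pos_mu, pos_sigma).sf(x)        # sf(x) = 1 - cdf(x)
    fpr = stats.norm(neg_mu, neg_sigma).sf(x)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Hypothetical fitted parameters (not the values from the article's data)
params = (16.0, 2.0, 10.0, 2.0)
print(binormal_auc(*params), grid_auc(*params))
```

With these hypothetical parameters both numbers come out at roughly 0.983. The grid here deliberately spans 10 standard deviations on each side so the traced curve reaches (0, 0) and (1, 1); a narrower fixed grid such as 2 to 25 can cut the curve short for other parameters.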
The results of running the script are shown below: