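ROC (Receiver Operating Characteristic) curves and AUC are commonly used to evaluate how good a binary classifier is; a brief introduction to both can be found here. This post briefly covers the properties of ROC and AUC and then, in more depth, discusses how to plot a ROC curve and how to compute the AUC.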
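As the example ROC plot shows, the horizontal axis of a ROC curve is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). The figure below shows in detail how FPR and TPR are defined.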
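In terms of confusion-matrix counts, the standard definitions are TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch of that computation follows; the helper name `tpr_fpr` and the sample labels are made up for illustration, and labels are assumed to be 1 for positive and 0 for negative.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 is positive and 0 is negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly rejected
    tpr = tp / (tp + fn)  # share of actual positives predicted positive
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    return tpr, fpr


# Illustrative data: 3 positives and 5 negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```

Here TP = 2, FN = 1, FP = 1 and TN = 4, so TPR is 2/3 and FPR is 1/5.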
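Next, consider four points and one line in the ROC plot. The first point, (0,1), means FPR=0 and TPR=1, which implies FN (false negative) = 0 and FP (false positive) = 0. This is a perfect classifier: it classifies every sample correctly. The second point, (1,0), means FPR=1 and TPR=0; the same reasoning shows that this is the worst possible classifier, because it manages to avoid every correct answer. The third point, (0,0), means FPR=TPR=0, i.e. FP (false positive) = TP (true positive) = 0, so the classifier predicts every sample as negative. Similarly, at the fourth point, (1,1), the classifier predicts every sample as positive. From this analysis we can conclude that the closer a ROC curve lies to the top-left corner, the better the classifier performs.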
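The four corner points can be reproduced with four trivial predictors. This is only a sketch: `roc_point` is a hypothetical helper and the five labels are invented for the demonstration.

```python
import numpy as np


def roc_point(y_true, y_pred):
    """Return the (FPR, TPR) point of one set of hard 0/1 predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tpr = np.sum((y_true == 1) & (y_pred == 1)) / np.sum(y_true == 1)
    fpr = np.sum((y_true == 0) & (y_pred == 1)) / np.sum(y_true == 0)
    return float(fpr), float(tpr)


y_true = np.array([1, 1, 0, 0, 0])
corners = {
    "perfect": roc_point(y_true, y_true),                      # every label right
    "inverted": roc_point(y_true, 1 - y_true),                 # every label wrong
    "all negative": roc_point(y_true, np.zeros_like(y_true)),  # never predicts positive
    "all positive": roc_point(y_true, np.ones_like(y_true)),   # always predicts positive
}
for name, point in corners.items():
    print(name, point)
```

The four results are (0, 1), (1, 0), (0, 0) and (1, 1), matching the four points discussed above.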
In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied.
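The key phrase is "as its discrimination threshold is varied". How should we understand "discrimination threshold" here? So far we have ignored an important capability of a classifier: its probability output, which expresses how likely the classifier considers a sample to be positive (or negative). By looking more closely at how each classifier works internally, we can always find some way to obtain a probability output. Typically this is done by mapping a range of real numbers onto the interval (0,1) through some transformation [3].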
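One common transformation of this kind is the logistic sigmoid. The post does not say which mapping it has in mind, so treat the choice of sigmoid and the raw scores below as assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical raw classifier scores, e.g. signed distances to a decision boundary
raw_scores = [-3.0, -0.5, 0.0, 0.5, 3.0]
probs = [sigmoid(z) for z in raw_scores]
print(probs)
```

The mapping is monotonic, so ranking samples by the transformed value gives the same order as ranking them by the raw score.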
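Suppose we already have the probability output (the probability of being positive) for every sample. The question now is how to vary the "discrimination threshold". We sort the test samples by their probability of being positive, from largest to smallest. The figure below is an example with 20 test samples: the "Class" column gives each sample's true label (p for positive, n for negative), and "Score" gives each sample's probability of being positive [4].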
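A minimal sketch of the threshold sweep: sort by score in descending order, then lower the threshold one sample at a time and record the resulting (FPR, TPR) point. The function name `roc_points` and the six samples are invented for illustration, and the sketch assumes that no two samples share the same score.

```python
import numpy as np


def roc_points(labels, scores):
    """Return (fpr, tpr) arrays obtained by sweeping the threshold over the scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # indices from highest to lowest score
    labels = labels[order]
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    # With the threshold at the k-th highest score, the top k samples are predicted positive
    tp = np.cumsum(labels == 1)
    fp = np.cumsum(labels == 0)
    # Prepend (0, 0): a threshold above every score predicts nothing as positive
    tpr = np.concatenate(([0.0], tp / n_pos))
    fpr = np.concatenate(([0.0], fp / n_neg))
    return fpr, tpr


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
for x, y in zip(fpr, tpr):
    print(round(float(x), 3), round(float(y), 3))
```

Each positive sample moves the curve one step up and each negative sample moves it one step to the right, so the curve runs from (0, 0) to (1, 1).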
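AUC (Area Under Curve) is defined as the area under the ROC curve, so its value cannot exceed 1. Because a ROC curve usually lies above the line y=x, AUC typically falls between 0.5 and 1; a value below 0.5 means the classifier ranks samples worse than random guessing. AUC is used as an evaluation metric because a ROC plot often does not show clearly which classifier is better, whereas AUC is a single number, and the classifier with the larger AUC is the better one.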
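Given the points of a ROC curve, the area can be approximated with the trapezoidal rule. A small sketch, where `auc_trapezoid` is a hypothetical helper and the inputs are assumed to be sorted by increasing FPR:

```python
import numpy as np


def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve given points ordered by increasing FPR."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    widths = np.diff(fpr)                      # horizontal extent of each segment
    mean_heights = (tpr[1:] + tpr[:-1]) / 2.0  # average height of each segment
    return float(np.sum(widths * mean_heights))


diagonal = auc_trapezoid([0.0, 1.0], [0.0, 1.0])           # the line y = x
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])  # curve through (0, 1)
print(diagonal, perfect)
```

The diagonal gives 0.5 and the curve through the top-left corner gives 1.0, the two ends of the usual AUC range.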
So what does the AUC value mean? According to (Fawcett, 2006), it can be interpreted as follows:
The AUC value is equivalent to the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example.
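This interpretation can be checked directly by counting, over all positive/negative pairs, how often the positive sample gets the higher score. The sketch below uses a hypothetical helper `auc_pairwise` and invented data, and counts a tie as half a win, which is a common convention rather than something stated in the quote.

```python
from itertools import product


def auc_pairwise(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # assumption: a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_pairwise(labels, scores)
print(auc)
```

Of the 9 pairs, 8 are ranked correctly (only the positive with score 0.6 falls below the negative with score 0.7), so the result is 8/9. The double loop is quadratic in the number of samples, so it is suited to checking small examples rather than large test sets.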
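With so many evaluation metrics already available, why use ROC and AUC at all? Because the ROC curve has a very useful property: it stays the same when the distribution of positive and negative samples in the test set changes. Real datasets often suffer from class imbalance, i.e. there are far more negative samples than positive ones (or the reverse), and the class distribution of the test data may also change over time. The figure below compares ROC curves with Precision-Recall curves [5]: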
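The effect can be illustrated at a single threshold by repeating every negative sample ten times. This is an idealized sketch with invented data and a hypothetical helper `rates`: with real resampling the rates would fluctuate slightly instead of matching exactly.

```python
import numpy as np


def rates(labels, scores, threshold):
    """Return (TPR, FPR, precision) when predicting positive for score >= threshold."""
    labels = np.asarray(labels)
    pred = np.asarray(scores) >= threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    tpr = tp / np.sum(labels == 1)
    fpr = fp / np.sum(labels == 0)
    precision = tp / (tp + fp)
    return float(tpr), float(fpr), float(precision)


labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2])

# Build an imbalanced copy: 9 extra copies of each negative, so 30 negatives vs 3 positives
neg = labels == 0
labels_imb = np.concatenate([labels, np.tile(labels[neg], 9)])
scores_imb = np.concatenate([scores, np.tile(scores[neg], 9)])

balanced = rates(labels, scores, 0.5)
imbalanced = rates(labels_imb, scores_imb, 0.5)
print("balanced   (TPR, FPR, precision):", balanced)
print("imbalanced (TPR, FPR, precision):", imbalanced)
```

TPR and FPR are each computed within a single class, so they are unchanged, while precision mixes both classes and drops from 2/3 to 1/6. That is why the Precision-Recall curve moves under imbalance and the ROC curve does not.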
Note: apart from the first figure, which comes from Wikipedia, all figures in this post are screenshots from the paper (Fawcett, 2006) [6].
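import numpy as np
import matplotlib.pyplot as plt


def cal_rate(result, thres):
    """Compute accuracy, precision, TPR, TNR, FNR and FPR at one threshold.

    result is a (prob, label) pair; a sample counts as predicted positive
    when its probability is >= thres.
    """
    all_number = len(result[0])
    TP = 0
    FP = 0
    FN = 0
    TN = 0
    for item in range(all_number):
        # binarize the predicted probability at the threshold
        disease = 1 if result[0][item] >= thres else 0
        if disease == 1:
            if result[1][item] == 1:
                TP += 1
            else:
                FP += 1
        else:
            if result[1][item] == 0:
                TN += 1
            else:
                FN += 1
    # accuracy is the share of correct predictions, i.e. TP + TN (not TP + FP)
    accracy = float(TP + TN) / float(all_number)
    # precision is undefined when nothing is predicted positive; report 0 in that case
    if TP + FP == 0:
        precision = 0
    else:
        precision = float(TP) / float(TP + FP)
    # the rates below assume at least one positive and one negative sample
    TPR = float(TP) / float(TP + FN)
    TNR = float(TN) / float(FP + TN)
    FNR = float(FN) / float(TP + FN)
    FPR = float(FP) / float(FP + TN)
    return accracy, precision, TPR, TNR, FNR, FPR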
# prob is the array of predicted positive-class probabilities,
# label is the array of true labels (1 = positive, 0 = negative)
# every observed score serves as a threshold; np.inf adds the (0, 0) end of the curve
threshold_vaule = sorted(prob) + [np.inf]
threshold_num = len(threshold_vaule)
accracy_array = np.zeros(threshold_num)
precision_array = np.zeros(threshold_num)
TPR_array = np.zeros(threshold_num)
TNR_array = np.zeros(threshold_num)
FNR_array = np.zeros(threshold_num)
FPR_array = np.zeros(threshold_num)
# calculate all the rates
for thres in range(threshold_num):
    accracy, precision, TPR, TNR, FNR, FPR = cal_rate((prob, label), threshold_vaule[thres])
    accracy_array[thres] = accracy
    precision_array[thres] = precision
    TPR_array[thres] = TPR
    TNR_array[thres] = TNR
    FNR_array[thres] = FNR
    FPR_array[thres] = FPR
# thresholds ascend, so FPR decreases along the arrays and np.trapz
# (named np.trapezoid in NumPy 2.0+) returns a negative area; negate it
AUC = -np.trapz(TPR_array, FPR_array)
# equal error rate: the operating point where FNR and FPR are closest
threshold = np.argmin(abs(FNR_array - FPR_array))
EER = (FNR_array[threshold] + FPR_array[threshold]) / 2
plt.plot(FPR_array, TPR_array)
plt.title('roc')
plt.xlabel('FPR_array')
plt.ylabel('TPR_array')
plt.show()

# Multi-class version: one ROC curve per disease class
import numpy as np
import matplotlib.pyplot as plt

# calculate each rate
def cal_rate(result, num, thres):
    """Compute accuracy, precision, TPR, TNR, FNR and FPR for class column num.

    result is a (prob, label) pair of 2-D arrays (samples x classes); a sample
    counts as predicted positive when its probability is >= thres.
    """
    all_number = len(result[0])
    TP = 0
    FP = 0
    FN = 0
    TN = 0
    for item in range(all_number):
        # binarize the predicted probability at the threshold
        disease = 1 if result[0][item, num] >= thres else 0
        if disease == 1:
            if result[1][item, num] == 1:
                TP += 1
            else:
                FP += 1
        else:
            if result[1][item, num] == 0:
                TN += 1
            else:
                FN += 1
    # accuracy is the share of correct predictions, i.e. TP + TN (not TP + FP)
    accracy = float(TP + TN) / float(all_number)
    # precision is undefined when nothing is predicted positive; report 0 in that case
    if TP + FP == 0:
        precision = 0
    else:
        precision = float(TP) / float(TP + FP)
    # the rates below assume at least one positive and one negative sample
    TPR = float(TP) / float(TP + FN)
    TNR = float(TN) / float(FP + TN)
    FNR = float(FN) / float(TP + FN)
    FPR = float(FP) / float(FP + TN)
    return accracy, precision, TPR, TNR, FNR, FPR
disease_class = ['Atelectasis', 'Cardiomegaly', 'Effusion', 'Infiltration', 'Mass', 'Nodule', 'Pneumonia', 'Pneumothorax']
style = ['r-', 'g-', 'b-', 'y-', 'r--', 'g--', 'b--', 'y--']

# plot ROC and calculate AUC/EER, result: (prob, label)

# synthetic demo data: random scores for 100 samples and 8 classes
prob = np.random.rand(100, 8)
# number of scores that are >= 0.5
count = np.count_nonzero(prob >= 0.5)
# samples 1..19 are labelled positive for every class, the rest negative
label = np.zeros((100, 8))
label[1:20, :] = 1
print(label)
print(prob)
print(count)

for clss in range(len(disease_class)):
    # every observed score serves as a threshold; np.inf adds the (0, 0) end of the curve
    threshold_vaule = sorted(prob[:, clss]) + [np.inf]
    threshold_num = len(threshold_vaule)
    accracy_array = np.zeros(threshold_num)
    precision_array = np.zeros(threshold_num)
    TPR_array = np.zeros(threshold_num)
    TNR_array = np.zeros(threshold_num)
    FNR_array = np.zeros(threshold_num)
    FPR_array = np.zeros(threshold_num)
    # calculate all the rates
    for thres in range(threshold_num):
        accracy, precision, TPR, TNR, FNR, FPR = cal_rate((prob, label), clss, threshold_vaule[thres])
        accracy_array[thres] = accracy
        precision_array[thres] = precision
        TPR_array[thres] = TPR
        TNR_array[thres] = TNR
        FNR_array[thres] = FNR
        FPR_array[thres] = FPR
    # thresholds ascend, so FPR decreases along the arrays and np.trapz
    # (named np.trapezoid in NumPy 2.0+) returns a negative area; negate it
    AUC = -np.trapz(TPR_array, FPR_array)
    # equal error rate: the operating point where FNR and FPR are closest
    threshold = np.argmin(abs(FNR_array - FPR_array))
    EER = (FNR_array[threshold] + FPR_array[threshold]) / 2
    # report the threshold value itself rather than its index
    print('disease %10s threshold : %f' % (disease_class[clss], threshold_vaule[threshold]))
    print('disease %10s accuracy : %f' % (disease_class[clss], accracy_array[threshold]))
    print('disease %10s EER : %f AUC : %f' % (disease_class[clss], EER, AUC))
    plt.plot(FPR_array, TPR_array, style[clss], label=disease_class[clss])
plt.title('roc')
plt.xlabel('FPR_array')
plt.ylabel('TPR_array')
plt.legend()
plt.show()