Machine Learning in Action - Study Notes - Chapter 2

Section 2.1: a first kNN classifier

1. Switch to the working directory.

2. In the working directory, create a Python script file kNN.py with the following content:

from numpy import *
import operator

def createDataSet():
    # four 2-D training points and their class labels
    group = array([[1.0,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels = ['A','A','B','B']
    return group, labels



def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    # indices of the training samples, sorted by increasing distance
    sortedDistIndicies = distances.argsort()
    classCount={}
    # majority vote among the k nearest neighbours
    for i in range(k):
        voteILabel = labels[sortedDistIndicies[i]]
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    # return the label with the most votes (iteritems is Python 2; use items() on Python 3)
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
    

Then start a Python REPL:

F:\studio\MachineLearningInAction\ch02>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> from numpy import *
>>> import operator
>>> import kNN
>>> group, labels = kNN.createDataSet()
>>> group
array([[ 1. ,  1.1],
       [ 1. ,  1. ],
       [ 0. ,  0. ],
       [ 0. ,  0.1]])
>>> labels
['A', 'A', 'B', 'B']
>>> kNN.classify0([0,0], group, labels, 3)
'B'
>>>
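To check the result by hand, the same distance computation that classify0 performs can be done directly. A small sketch (using the group and labels returned by createDataSet above):

from numpy import *
import kNN

group, labels = kNN.createDataSet()
# Euclidean distance from the query point [0, 0] to each training point
diff = tile([0, 0], (group.shape[0], 1)) - group
distances = (diff**2).sum(axis=1)**0.5
print distances   # roughly [1.49, 1.41, 0.0, 0.1]
print labels      # ['A', 'A', 'B', 'B']

The three nearest points are the two 'B' samples and one 'A' sample, so with k=3 the majority vote is 'B', which matches the REPL output.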

 

Section 2.2: the dating-site example

Modify kNN.py and add the following:

def getLabelID(labelName):
    # map a label name from the data file to a numeric class ID
    if (labelName == "largeDoses"): return 3
    elif (labelName == "smallDoses"): return 2
    else: return 1


def file2matrix(filename):
    # parse the tab-separated data file into a feature matrix and a label vector
    fr = open(filename)
    arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]                 # first three fields are the features
        classLabelVector.append(getLabelID(listFromLine[-1]))  # last field is the label name
        index += 1
    return returnMat,classLabelVector

Note that this differs from the book's code: I defined a function that maps a labelName to a labelID, because the labels in this test data are names rather than numbers.
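For reference, each line of datingTestSet.txt is tab-separated: three numeric features followed by a label name. The first few lines look roughly like this (the values are only illustrative, reconstructed to be consistent with the output shown further below; the label names follow the book's data file):

40920	8.326976	0.953952	largeDoses
14488	7.153469	1.673904	smallDoses
26052	1.441871	0.805124	didntLike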

 

Now enter the following in the REPL:

>>> from numpy import *
>>> import operator
>>> import kNN
>>> datingDataMat,datingLabels = kNN.file2matrix('datingTestSet.txt')
>>> datingLabels[0:20]
[3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]
>>> import matplotlib
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> ax.scatter(datingDataMat[:,1],datingDataMat[:,2])
<matplotlib.collections.PathCollection object at 0x0000000002E19278>
>>> plt.show()

Note the final plt.show(); without it, nothing is displayed.

>>> ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15*array(datingLabels), 15*array(datingLabels))
<matplotlib.collections.PathCollection object at 0x00000000035A9908>
>>> plt.show()
>>>

The same plot can also be produced as a standalone script; note that this version plots columns 0 and 1 instead of 1 and 2:

from numpy import *
import operator
import kNN
datingDataMat,datingLabels = kNN.file2matrix('datingTestSet.txt')
import matplotlib
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(datingDataMat[:,0],datingDataMat[:,1],15*array(datingLabels), 15*array(datingLabels))
plt.show()
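To make the figure easier to read, axis labels can be added before plt.show(). A small sketch (the column meanings follow the book's description of the dating data and are assumptions, not something shown in the output above):

# continuing the script above, just before plt.show()
ax.set_xlabel('Frequent flyer miles earned per year')          # column 0
ax.set_ylabel('Percentage of time spent playing video games')  # column 1
plt.show()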

Next, normalize the data. The autoNorm function used below is not listed above; it has to be added to kNN.py first.
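Here is a minimal sketch of autoNorm, following the book's column-wise min-max scaling, newValue = (oldValue - min) / (max - min); it returns the normalized matrix along with the per-column ranges and minimum values:

def autoNorm(dataSet):
    # scale every column into the [0, 1] range
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals

With autoNorm added to kNN.py, reload the module and run: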

reload(kNN)
normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
normMat
ranges
minVals

The result is as follows:

F:\studio\MachineLearningInAction\ch02>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> from numpy import *
>>> import operator
>>> import kNN
>>> datingDataMat,datingLabels = kNN.file2matrix('datingTestSet.txt')
>>> normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
>>> normMat
array([[ 0.44832535,  0.39805139,  0.56233353],
       [ 0.15873259,  0.34195467,  0.98724416],
       [ 0.28542943,  0.06892523,  0.47449629],
       ...,
       [ 0.29115949,  0.50910294,  0.51079493],
       [ 0.52711097,  0.43665451,  0.4290048 ],
       [ 0.47940793,  0.3768091 ,  0.78571804]])
>>> ranges
array([  9.12730000e+04,   2.09193490e+01,   1.69436100e+00])
>>> minVals
array([ 0.      ,  0.      ,  0.001156])
>>>
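A quick sanity check on the normalization: every column of normMat should now span exactly [0, 1].

# continuing in the same session
normMat.min(axis=0)   # expected: an array of zeros
normMat.max(axis=0)   # expected: an array of ones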

Testing the algorithm: add the following to kNN.py:

def datingClassTest():
    # hold out the first hoRatio fraction of the data as the test set,
    # and use the rest as the training set
    hoRatio = 0.10
    datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        # classify each test vector against the training portion with k=3
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:],datingLabels[numTestVecs:m],3)
        print "the classifier came back with: %d,  the real answer is: %d" %(classifierResult, datingLabels[i])
        if (classifierResult != datingLabels[i]): 
            errorCount += 1.0
            print "ERROR: the classifier came back with: %d,  the real answer is: %d" %(classifierResult, datingLabels[i])
    print "the total error rate is: %f" %(errorCount/float(numTestVecs))
    

Run it in the Python REPL:

>>> reload(kNN)
<module 'kNN' from 'kNN.py'>
>>> kNN.datingClassTest()
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 2
ERROR: the classifier came back with: 1,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 3
ERROR: the classifier came back with: 2,  the real answer is: 3
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 3,  the real answer is: 3
the classifier came back with: 2,  the real answer is: 2
the classifier came back with: 1,  the real answer is: 1
the classifier came back with: 3,  the real answer is: 1
ERROR: the classifier came back with: 3,  the real answer is: 1
the total error rate is: 0.050000