KNN: the idea behind the k-nearest-neighbors classification algorithm
Suppose we have training data <x, y>, where x is an already-classified sample and y is the class it belongs to; we now want to classify a new sample z.
The steps are as follows:
(1) Compute the distance D between z and every x. Several distance formulas can be used, e.g. the Euclidean distance.
(2) Sort the already-classified samples by their distance D.
(3) Take the first K samples and count the classes they belong to; the class that occurs most often is the class assigned to z (see the sketch below).
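As a minimal sketch of these three steps (the toy data points, the labels, and K = 3 below are made up purely for illustration, and numpy is assumed to be available):

import numpy as np

x = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])   # already-classified samples
y = ['A', 'A', 'B', 'B']                                         # their classes
z = np.array([0.1, 0.1])                                         # sample to classify
K = 3

D = np.sqrt(((x - z) ** 2).sum(axis=1))    # step (1): Euclidean distance from z to every x
order = D.argsort()                        # step (2): sort samples by distance
votes = [y[i] for i in order[:K]]          # step (3): classes of the K nearest samples
print(max(set(votes), key=votes.count))    # majority vote -> 'B'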
Main functions:
from numpy import zeros, tile   # array helpers used by the functions below
import operator                 # itemgetter, used to sort the vote counts

def file2matrix(fileName):
    # Parse a tab-separated file into a feature matrix and a label vector.
    fr = open(fileName)
    lines = fr.readlines()
    numofLines = len(lines)
    returnMatrix = zeros((numofLines, 3))               # one row of 3 features per sample
    classLabelVector = []
    index = 0
    for line in lines:
        line = line.strip()                             # strip the trailing newline
        listFromLine = line.split('\t')
        returnMatrix[index, :] = listFromLine[0:3]      # first 3 columns are the features
        classLabelVector.append(int(listFromLine[-1]))  # last column is the class label
        index = index + 1
    return returnMatrix, classLabelVector
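Assuming an input file with three tab-separated numeric feature columns followed by an integer class label (the file name and values below are hypothetical), the function would be called like this:

# samples.txt (made-up name), one sample per line, columns separated by tabs, e.g.:
# 1.2    0.5    3.0    1
featureMat, labelVector = file2matrix('samples.txt')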
def classify(inX, dataSet, labels, k):
    # Classify inX against dataSet with a k-nearest-neighbors majority vote.
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet     # difference to every training sample
    sqDiffMat = diffMat ** 2
    sqDistance = sqDiffMat.sum(axis=1)
    distances = sqDistance ** 0.5                       # Euclidean distances
    sortedDistances = distances.argsort()               # sample indices, nearest first
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistances[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]                       # label with the most votes
def autoNorm(dataset):
    # Scale every feature column to the [0, 1] range: (x - min) / (max - min).
    minvalue = dataset.min(0)       # column-wise minimum
    maxvalue = dataset.max(0)       # column-wise maximum
    ranges = maxvalue - minvalue
    m = dataset.shape[0]
    normdataset = dataset - tile(minvalue, (m, 1))
    normdataset = normdataset / tile(ranges, (m, 1))
    return normdataset, ranges, minvalue
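Putting the three functions together, one possible end-to-end sketch looks like the following (the file name, the 10% hold-out split, and k = 3 are assumptions for illustration, not something specified above):

featureMat, labelVector = file2matrix('samples.txt')   # 'samples.txt' is a made-up file name
normMat, ranges, minvalue = autoNorm(featureMat)
m = normMat.shape[0]
numTest = int(m * 0.10)                                # hold out the first 10% for testing (assumed split)
errorCount = 0
for i in range(numTest):
    result = classify(normMat[i, :], normMat[numTest:m, :], labelVector[numTest:m], 3)
    if result != labelVector[i]:
        errorCount += 1
print('error rate: %f' % (errorCount / float(numTest)))

Note that a brand-new sample must be scaled with the same ranges and minvalue returned by autoNorm before it is passed to classify.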