PCA is mainly used for dimensionality reduction: it maps high-dimensional features onto a lower-dimensional space. For the underlying details, see linear algebra (eigendecomposition of the covariance matrix).

Here we use the PCA implementation from sklearn.
```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1,  1, -3],
              [-2, -1,  1, -3],
              [-3, -2,  1, -3],
              [ 1,  1,  1, -3],
              [ 2,  1,  1, -3],
              [ 3,  2, -1, -3]])

pca = PCA(n_components=4)
pca.fit(X)
print(pca.explained_variance_ratio_)  # variance ratio of each component
print(pca.explained_variance_)        # variance of each component

pca = PCA(n_components=1)  # originally 4 dimensions, now reduced to 1
XX = pca.fit_transform(X)
print(XX)
```
Result:
```
[0.94789175 0.04522847 0.00687978 0.        ]
[8.21506183 0.39198011 0.05962472 0.        ]
[[-1.42149543]
 [-2.2448796 ]
 [-3.60382274]
 [ 1.29639085]
 [ 2.11977502]
 [ 3.85403189]]
```
In fact, you can see this just by looking at the data: the last column never varies, so its ratio is 0, and the third column barely varies, so its ratio is also very small. Both have little influence on the result and can be dropped when reducing dimensionality.
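Instead of hard-coding `n_components`, sklearn's `PCA` also accepts a float between 0 and 1, in which case it keeps the smallest number of components whose cumulative explained variance reaches that fraction. A minimal sketch on the same `X` as above:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1, 1, -3], [-2, -1, 1, -3], [-3, -2, 1, -3],
              [1, 1, 1, -3], [2, 1, 1, -3], [3, 2, -1, -3]])

# A float in (0, 1) asks PCA to keep just enough components
# to explain that fraction of the total variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(pca.n_components_)                          # components actually kept
print(np.cumsum(pca.explained_variance_ratio_))   # cumulative variance curve
```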
```python
classifier = KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                                  metric_params=None, n_jobs=1, n_neighbors=10, p=2,
                                  weights='uniform')
```
The hyperparameters have to be found by trial and error (for the Minkowski metric, p=1 gives Manhattan distance and p=2 gives Euclidean distance).
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

classifier = LogisticRegression(random_state=0)  # 0.78
#classifier = KNeighborsClassifier(algorithm='kd_tree', n_neighbors=5, metric='minkowski', p=2, weights='uniform')  # 0.839
#classifier = SVC(kernel='linear', random_state=0)  # 0.81
#classifier = SVC(kernel='rbf', random_state=0)  # 0.77
#classifier = GaussianNB()  # 0.77
#classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)  # 0.64
#classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)  # 0.83

classifier.fit(X_stard, Y_stard)
YY_pred = classifier.predict(X_pred)

result_NMI = metrics.normalized_mutual_info_score(YY_pred, Y_pred)
print("result_NMI:", result_NMI)
# 3,1,minkowski  3,1,manhattan
```
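Instead of commenting classifiers in and out by hand, a small cross-validation loop makes the comparison reproducible. A minimal sketch, assuming `X_stard` and `Y_stard` are the standardized training arrays used above:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

models = {
    'LogisticRegression': LogisticRegression(random_state=0),
    'KNN': KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2),
    'SVC-linear': SVC(kernel='linear', random_state=0),
    'GaussianNB': GaussianNB(),
    'RandomForest': RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0),
}

# Score each model with 5-fold cross-validation on the same data
for name, model in models.items():
    scores = cross_val_score(model, X_stard, Y_stard, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```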
sklearn ships a tuning tool, GridSearchCV, which exists precisely to automate this: feed it the candidate parameters and it searches the grid for the algorithm and finds a suitable combination.
### KNN

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

clf = KNeighborsClassifier()

n_neighbors = list(range(1, 10))
weights = ['uniform', 'distance']
algorithm_options = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_range = list(range(1, 10))
p = list(range(1, 10))
param_grid = [{'n_neighbors': n_neighbors, 'weights': weights,
               'algorithm': algorithm_options, 'leaf_size': leaf_range, 'p': p}]

grid_search = GridSearchCV(clf, param_grid=param_grid, cv=10)
grid_search.fit(X_pred, Y_pred)

grid_search.best_score_, grid_search.best_estimator_, grid_search.best_params_
```
Result:
```
(0.9675572519083969,
 KNeighborsClassifier(algorithm='auto', leaf_size=1, metric='minkowski',
                      metric_params=None, n_jobs=None, n_neighbors=7, p=2,
                      weights='uniform'),
 {'algorithm': 'auto', 'leaf_size': 1, 'n_neighbors': 7, 'p': 2, 'weights': 'uniform'})
```
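Once the search finishes, GridSearchCV refits the best model on the full data (refit=True by default), so it can predict directly; `cv_results_` records every tried combination. A minimal sketch, assuming the `X_pred`/`Y_pred` arrays from the snippet above and pandas for inspection:

```python
import pandas as pd

# Predict with the refit best estimator
Y_hat = grid_search.predict(X_pred)

# Rank all tried parameter combinations by mean cross-validated score
results = pd.DataFrame(grid_search.cv_results_)
print(results.sort_values('mean_test_score', ascending=False)
             [['params', 'mean_test_score']].head())
```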
References:
1. http://www.javashuo.com/article/p-dfvngklx-np.html
2. https://www.makcyun.top/2019/06/15/Machine_learning08.html