sklearn study notes

Date: 2019-11-13
Tags: sklearn, study notes
1. Supervised learning
1.1. Generalized linear models
1.1.1. Ordinary least squares
class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
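1.1.1.1. Ordinary least squares complexity: O(n p^2)
1.1.2. Ridge regression: linear_model.Ridge
1.1.2.1. Ridge complexity: O(n p^2)
1.1.2.2. Setting the regularization parameter: generalized cross-validation, linear_model.RidgeCV
1.1.3. Lasso: linear_model.Lasso
1.1.3.1. Setting the regularization parameter
1.1.3.1.1. Using cross-validation
1.1.3.1.2. Information-criteria based model selection
1.1.3.1.3. Comparison with the regularization parameter of SVM
1.1.4. Multi-task Lasso
1.1.5. Elastic Net
1.1.6. Multi-task Elastic Net
1.1.7. Least angle regression
1.1.8. LARS Lasso
1.1.8.1. Mathematical formulation
1.1.9. Orthogonal matching pursuit (OMP)
1.1.10. Bayesian regression
1.1.10.1. Bayesian ridge regression
1.1.10.2. Automatic relevance determination (ARD)
1.1.11. Logistic regression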
class sklearn.linear_model.LogisticRegression(penalty=’l2’, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver=’liblinear’, max_iter=100, multi_class=’ovr’, verbose=0, warm_start=False, n_jobs=1)
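penalty: If the main goal of tuning is only to curb overfitting, L2 regularization is usually enough.
    If the model still overfits with L2, that is, it predicts poorly on unseen data, consider L1 regularization.
    L1 is also useful when the model has very many features and you want the coefficients of unimportant ones driven to zero, giving a sparse model.
    The choice of penalty constrains the choice of optimizer, i.e. the solver parameter. With L2, all four solvers {'newton-cg', 'lbfgs', 'liblinear', 'sag'} are available. With L1, only 'liblinear' can be used, because the L1-regularized loss is not continuously differentiable, while 'newton-cg', 'lbfgs' and 'sag' all need continuous first or second derivatives of the loss. 'liblinear' has no such requirement.
solver: chooses the algorithm that optimizes the logistic regression loss. There are four options:
    a) liblinear: uses the open-source liblinear library, which optimizes the loss iteratively by coordinate descent.
    b) lbfgs: a quasi-Newton method that optimizes the loss iteratively using an approximation of the matrix of second derivatives (the Hessian).
    c) newton-cg: also from the Newton family; it uses the Hessian of the loss to optimize iteratively.
    d) sag: stochastic average gradient descent.
    liblinear supports L1 and L2 but only one-vs-rest (OvR) for multiclass problems; 'lbfgs', 'sag' and 'newton-cg' support only L2 and handle multiclass problems with either OvR or the multinomial option.
multi_class: chooses the multiclass strategy, either ovr or multinomial; the default is ovr.
    ovr is the one-vs-rest (OvR) scheme mentioned above. multinomial fits a single model with the multinomial (softmax) loss over all classes; it is not a literal many-vs-many (MvM) scheme, although the trade-offs against OvR are similar. For binary logistic regression the two make no difference; they differ only for multiclass problems.
    The idea of OvR is simple: whatever the number of classes, treat the problem as a set of binary ones. For class K, take all samples of class K as positives and all other samples as negatives, fit a binary logistic regression, and obtain the model for class K. The models for the other classes are obtained the same way.
    MvM is more involved; take its special case one-vs-one (OvO) as an illustration. With T classes, pick two classes at a time, call them T1 and T2, pool all samples labelled T1 or T2, treat T1 as positive and T2 as negative, and fit a binary logistic regression to obtain the model parameters. In total T(T-1)/2 classifiers are needed.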
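    From the above, OvR is simpler but usually classifies slightly worse (this holds for most sample distributions; under some distributions OvR may do better). The multinomial and MvM approaches are usually more accurate, but slower than OvR.
1.1.12. Stochastic gradient descent (SGD)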
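class_weight: either 'balanced', letting the library compute the class weights itself, or a dict giving the weight of each class.
    With class_weight='balanced', the class weights are computed as n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes the number of classes, and np.bincount(y) the number of samples in each class. For example, y=[1,0,0,1,1] gives np.bincount(y)=[2,3].
sample_weight: class_weight reweights whole classes. For per-sample control on imbalanced data, pass sample_weight when fitting, fit(X, y[, sample_weight]). If both are used in scikit-learn's logistic regression, the effective weight of a sample is class_weight * sample_weight.
C: inverse of the regularization strength; smaller values mean stronger regularization.
max_iter: maximum number of iterations.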
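The 'balanced' formula and the class_weight * sample_weight rule can be checked by hand with NumPy alone. This is a minimal sketch using the same y as in the note; the sample_weight values are made up for illustration.

```python
import numpy as np

# Same labels as in the note: two samples of class 0, three of class 1.
y = np.array([1, 0, 0, 1, 1])

n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]
counts = np.bincount(y)            # samples per class

# 'balanced' formula: rarer classes receive larger weights.
class_weights = n_samples / (n_classes * counts)

# Hypothetical per-sample weights, e.g. to down-weight one doubtful label.
sample_weight = np.array([1.0, 1.0, 0.5, 1.0, 1.0])

# Effective weight of each sample when both mechanisms are combined.
effective = class_weights[y] * sample_weight

print(class_weights)
print(effective)
```

The minority class (label 0) gets the larger class weight, and the third sample ends up with half the weight of the other class-0 sample.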
1.1.13. Perceptron
class sklearn.linear_model.Perceptron(penalty=None, alpha=0.0001, fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0, eta0=1.0, n_jobs=1, random_state=0, class_weight=None, warm_start=False, n_iter=None)
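1.1.14. Passive aggressive algorithms
1.1.15. Robustness regression: outliers and modeling errors
1.1.15.1. Different scenarios and useful concepts
1.1.15.2. RANSAC: RANdom SAmple Consensus
1.1.15.2.1. Details of the algorithm
1.1.15.3. Theil-Sen estimator: generalized-median-based estimator
1.1.15.3.1. Theoretical considerations
1.1.15.4. Huber regression
1.1.15.5. Notes
1.1.16. Polynomial regression: extending linear models with basis functions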



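1.2. Linear and quadratic discriminant analysis
1.2.1. Dimensionality reduction using linear discriminant analysis
1.2.2. Mathematical formulation of the LDA and QDA classifiers
1.2.3. Mathematical formulation of LDA dimensionality reduction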
class sklearn.discriminant_analysis.LinearDiscriminantAnalysis(solver=’svd’, shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001)
1.2.4. Shrinkage
1.2.5. Estimation algorithms



1.3. Kernel ridge regression



1.4. Support vector machines
1.4.1. Classification
1.4.1.1. Multi-class classification
class sklearn.svm.LinearSVC(penalty=’l2’, loss=’squared_hinge’, dual=True, tol=0.0001, C=1.0, multi_class=’ovr’, fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)
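Only LinearSVC offers multi_class="crammer_singer", which optimizes a single joint objective over all classes instead of fitting one-vs-rest classifiers.
1.4.1.2. Scores and probabilities
1.4.1.3. Unbalanced problems
1.4.2. Regression
1.4.3. Density estimation, novelty detection (covered in detail in 2.7)
1.4.4. Complexity
1.4.5. Tips on practical use
1.4.6. Kernel functions
1.4.6.1. Custom kernels
1.4.6.1.1. Using Python functions as kernels
1.4.6.1.2. Using the Gram matrix
Set kernel='precomputed' and pass the Gram matrix instead of X to the fit method. In this mode the kernel values between all training vectors and the test vectors must be provided at prediction time.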
import numpy as np
from sklearn import svm
clf = svm.SVC(kernel='precomputed')
gram = np.dot(X, X.T)  # linear-kernel Gram matrix of the training data X
clf.fit(gram, y)
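The precomputed-kernel workflow is easiest to see end to end, including the prediction step. The sketch below assumes the iris data and a plain linear kernel (both chosen only for illustration) and compares the result with SVC(kernel='linear').

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit on the Gram matrix of the training set: shape (n_train, n_train).
clf = svm.SVC(kernel='precomputed')
gram_train = np.dot(X_train, X_train.T)
clf.fit(gram_train, y_train)

# Predict from kernel values between test and training vectors:
# shape (n_test, n_train).
gram_test = np.dot(X_test, X_train.T)
pred_precomputed = clf.predict(gram_test)

# Reference model that computes the same linear kernel internally.
reference = svm.SVC(kernel='linear').fit(X_train, y_train)
pred_linear = reference.predict(X_test)

print((pred_precomputed == pred_linear).mean())
```

The matrix passed to predict has one row per test sample and one column per training sample, so a kernel matrix computed only among the test samples would not work.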
1.4.6.1.3. Parameters of the RBF kernel
Use GridSearchCV from the model selection module to choose the C and gamma parameters.
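A grid search over C and gamma for an RBF SVC might look as follows. The dataset and the candidate values are assumptions made for the sake of a small, runnable example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values on a logarithmic scale (illustrative choices).
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

A logarithmic grid is the usual starting point; a finer grid around the best pair can follow.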
1.4.7. Mathematical formulation
1.4.7.1. SVC
class sklearn.svm.SVC(C=1.0, kernel=’rbf’, degree=3, gamma=’auto’, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=’ovr’, random_state=None)
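SVC parameters explained
(1) C: penalty coefficient of the objective, balancing the margin width against misclassified samples; default C = 1.0.
(2) kernel: one of 'rbf', 'linear', 'poly', 'sigmoid'; the default is 'rbf'.
(3) degree: only used when kernel='poly'; the degree of the polynomial.
(4) gamma: kernel coefficient for 'poly', 'rbf' and 'sigmoid'; with gamma='auto' it is 1 / n_features.
(5) coef0: independent term of the kernel function; only meaningful for 'poly' and 'sigmoid'.
(6) probability: whether to enable probability estimates (True or False).
(7) shrinking: whether to use the shrinking heuristic.
(8) tol (default = 1e-3): tolerance of the stopping criterion.
(9) cache_size: size of the kernel cache, in MB.
(10) class_weight: weight of each class, which sets the penalty of class i to class_weight[i] * C; if omitted, every class gets weight 1.
(11) verbose: enables verbose output from the underlying libsvm; it may not work properly in a multithreaded context.
(12) max_iter: hard limit on the number of solver iterations; the default is -1, meaning no limit.
(13) decision_function_shape: 'ovo' for one-vs-one or 'ovr' for one-vs-rest; in the signature above the default is 'ovr'.
(14) random_state: seed of the pseudo-random number generator used to shuffle the data for probability estimates.
(15) decision_function gives each sample's score for the different classes.
 P.S. Items 7, 8 and 9 can usually be left alone.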
decision_function(X)    Distance of the samples X to the separating hyperplane.
fit(X, y[, sample_weight])    Fit the SVM model according to the given training data.
get_params([deep])    Get parameters for this estimator.
predict(X)    Perform classification on samples in X.
score(X, y[, sample_weight])    Returns the mean accuracy on the given test data and labels.
set_params(**params)    Set the parameters of this estimator. 
1.4.7.2. NuSVC
class sklearn.svm.NuSVC(nu=0.5, kernel=’rbf’, degree=3, gamma=’auto’, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=’ovr’, random_state=None)
1.4.7.3. SVR
class sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
There is no class_weight parameter; fit accepts sample_weight, which rescales C per sample.
1.4.8. Implementation details



1.5. Stochastic gradient descent
1.5.1. Classification
class sklearn.linear_model.SGDClassifier(loss=’hinge’, penalty=’l2’, alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0, epsilon=0.1, n_jobs=1, random_state=None, learning_rate=’optimal’, eta0=0.0, power_t=0.5, class_weight=None, warm_start=False, average=False, n_iter=None)
1.5.2. Regression
class sklearn.linear_model.SGDRegressor(loss=’squared_loss’, penalty=’l2’, alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=None, tol=None, shuffle=True, verbose=0, epsilon=0.1, random_state=None, learning_rate=’invscaling’, eta0=0.01, power_t=0.25, warm_start=False, average=False, n_iter=None)
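1.5.3. Stochastic gradient descent for sparse data
1.5.4. Complexity
1.5.5. Tips on practical use
1.5.6. Mathematical formulation
1.5.6.1. SGD
1.5.7. Implementation details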



1.6. Nearest neighbors
1.6.1. Unsupervised nearest neighbors
1.6.1.1. Finding the nearest neighbors
from sklearn.neighbors import NearestNeighbors
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
distances, indices = nbrs.kneighbors(X)
1.6.1.2. KDTree and BallTree classes
1.6.2. Nearest neighbors classification
class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights=’uniform’, algorithm=’auto’, leaf_size=30, p=2, metric=’minkowski’, metric_params=None, n_jobs=1, **kwargs)
KNeighborsClassifier.fit(X,y)
1.6.3. Nearest neighbors regression
class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, weights=’uniform’, algorithm=’auto’, leaf_size=30, p=2, metric=’minkowski’, metric_params=None, n_jobs=1, **kwargs)
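Nearest neighbors regression is used when the labels are continuous rather than discrete. The value predicted for a query point is the mean of the labels of its nearest neighbors.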
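A tiny example makes the averaging visible. This sketch uses made-up one-dimensional data and the default weights='uniform', so the prediction is the plain mean of the two nearest labels.

```python
from sklearn.neighbors import KNeighborsRegressor

# Four training points on a line with their continuous labels.
X = [[0], [1], [2], [3]]
y = [0.0, 0.0, 1.0, 1.0]

neigh = KNeighborsRegressor(n_neighbors=2, weights='uniform')
neigh.fit(X, y)

# The two nearest neighbors of 1.5 are x=1 (label 0.0) and x=2 (label 1.0).
pred = neigh.predict([[1.5]])
print(pred)  # [0.5]
```

With weights='distance' the neighbors would instead be weighted by the inverse of their distance to the query point.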
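1.6.4. Nearest neighbor algorithms
1.6.4.1. Brute force
1.6.4.2. K-D tree
 algorithm = 'kd_tree'
1.6.4.3. Ball tree
1.6.4.4. Choice of nearest neighbors algorithm
1.6.4.5. Effect of leaf_size
1.6.5. Nearest centroid classifier
1.6.5.1. Nearest shrunken centroid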



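1.7. Gaussian processes
1.7.1. Gaussian process regression (GPR)
1.7.2. GPR examples
1.7.2.1. GPR with noise-level estimation
1.7.2.2. Comparison of GPR and kernel ridge regression
1.7.2.3. GPR on Mauna Loa CO2 data
1.7.3. Gaussian process classification (GPC)
1.7.4. GPC examples
1.7.4.1. Probabilistic predictions with GPC
1.7.4.2. Illustration of GPC on the XOR dataset
1.7.4.3. Gaussian process classification on the iris dataset
1.7.5. Kernels for Gaussian processes
1.7.5.1. Gaussian process kernel API
1.7.5.2. Basic kernels
1.7.5.3. Kernel operators
1.7.5.4. Radial basis function (RBF) kernel
1.7.5.5. Matérn kernel
1.7.5.6. Rational quadratic kernel
1.7.5.7. Exp-Sine-Squared kernel
1.7.5.8. Dot-Product kernel
1.7.5.9. References
1.7.6. Legacy Gaussian processes
1.7.6.1. An introductory regression example
1.7.6.2. Data fitting
1.7.6.3. Mathematical formulation
1.7.6.3.1. The initial assumption
1.7.6.3.2. The best linear unbiased prediction (BLUP)
1.7.6.3.3. The empirical best linear unbiased predictor (EBLUP)
1.7.6.4. Correlation models
1.7.6.5. Regression models
1.7.6.6. Implementation details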



1.8. Cross decomposition



1.9. Naive Bayes
1.9.1. Gaussian naive Bayes
class sklearn.naive_bayes.GaussianNB(priors=None)
1.9.2. Multinomial naive Bayes
class sklearn.naive_bayes.MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)
1.9.3. Bernoulli naive Bayes
class sklearn.naive_bayes.BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)
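binarize=0.0: threshold for binarizing the sample features; values above it map to 1 and the rest to 0. If None, the input is assumed to consist of binary vectors already.
fit_prior=True: whether to learn the class prior probabilities from the data; if False, a uniform prior is used (I am not sure when that is useful).
class_prior=None: prior probabilities of the classes, specified explicitly.
In the multinomial model:
Let a document be d=(t1,t2,…,tk), where the tk are the words occurring in the document, repetitions allowed. Then
Prior probability P(c) = total number of words in class c / total number of words in the whole training set
Class-conditional probability P(tk|c) = (number of occurrences of word tk across the documents of class c + 1) / (total number of words in class c + |V|)
V is the vocabulary of the training set (each distinct word counted once, however often it occurs), and |V| is the number of distinct words. P(tk|c) can be read as how much evidence the word tk provides that d belongs to class c, and P(c) as the overall share (prior likelihood) of class c.
In the Bernoulli model:
P(c) = number of documents in class c / total number of documents in the whole training set
P(tk|c) = (number of documents in class c containing word tk + 1) / (number of documents in class c + 2)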
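The add-one (Laplace) smoothed conditional probability of the multinomial model can be reproduced by hand and compared with MultinomialNB(alpha=1.0). The toy count matrix below is hypothetical. Only the conditional probabilities are compared: scikit-learn estimates the class prior from document counts, not word counts.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: rows are documents, columns are counts
# of the 4 words in the vocabulary.
X = np.array([[2, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 2, 3],
              [0, 1, 1, 2]])
y = np.array([0, 0, 1, 1])

vocab_size = X.shape[1]  # |V|

# Manual estimate of P(t|c):
# (count of word t in class c + 1) / (total words in class c + |V|)
manual = np.zeros((2, vocab_size))
for c in (0, 1):
    word_counts = X[y == c].sum(axis=0)
    manual[c] = (word_counts + 1) / (word_counts.sum() + vocab_size)

clf = MultinomialNB(alpha=1.0).fit(X, y)
sklearn_probs = np.exp(clf.feature_log_prob_)

print(manual)
print(sklearn_probs)
```

Both matrices should agree row by row, and each row sums to 1 because the smoothing adds exactly |V| pseudo-counts to the denominator.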
1.9.4. Out-of-core naive Bayes model fitting


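1.10. Decision trees
Some advantages of decision trees:
Simple to understand and to interpret. Trees can be visualised.
Requires little data preparation. Other techniques often require data normalisation, creation of dummy variables and removal of blank values. Note however that this module does not support missing values.
The cost of using the tree (i.e. predicting data) is logarithmic in the number of data points used to train the tree.
Able to handle both numerical and categorical data. Other techniques are usually specialised in datasets that have only one type of variable. See the algorithms section for more information.
Able to handle multi-output problems.
Uses a white box model. If a given situation is observable in a model, the explanation for the condition is easily expressed in boolean logic. By contrast, in a black box model (e.g. an artificial neural network), results may be harder to interpret.
The model can be validated with statistical tests, which makes it possible to account for its reliability.
Performs well even if its assumptions are somewhat violated by the true model that generated the data.
Disadvantages of decision trees:
Decision-tree learners can create over-complex trees that do not generalise well. This is called overfitting. Mechanisms such as pruning (not supported at the time of writing), setting the minimum number of samples required at a leaf node, or setting the maximum depth of the tree are needed to avoid it.
Decision trees can be unstable, because small variations in the data may produce a completely different tree. Using decision trees within an ensemble mitigates this.
Learning an optimal decision tree is known to be NP-complete under several aspects of optimality, even for simple concepts. Practical algorithms therefore rely on heuristics such as the greedy algorithm, which makes a locally optimal decision at each node. Such algorithms cannot guarantee a globally optimal tree. This can be mitigated by training multiple trees in an ensemble learner, where the features and samples are randomly sampled with replacement.
Some concepts are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems.
Decision-tree learners create biased trees if some classes dominate. It is therefore recommended to balance the dataset before fitting a decision tree.
1.10.1. Classification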
class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
>>> import graphviz
>>> from sklearn import tree
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)
>>> # Export the fitted tree in Graphviz DOT format, returned as a string
>>> dot_data = tree.export_graphviz(clf, out_file=None,
...                          feature_names=iris.feature_names,
...                          class_names=iris.target_names,
...                          filled=True, rounded=True,
...                          special_characters=True)
>>> graph = graphviz.Source(dot_data)
>>> graph.render("iris")
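predict returns the predicted values for the n_outputs targets.
predict_proba returns a list of n_outputs arrays of class probabilities.
* max_features, maximum number of features considered at a split: accepts several kinds of values. The default None considers all features; "log2" considers at most log2(N) features; "sqrt" or "auto" considers at most sqrt(N) features. An integer is the absolute number of features; a float is a fraction, i.e. int(fraction x N) features are considered. N is the total number of features. In general, if there are few features, say fewer than 50, the default None is fine; with very many features, the other values can cap the features considered per split and so control the time needed to build the tree.
* max_depth, maximum depth of the tree: by default it is not set, and the tree is then grown without a depth limit. With little data or few features this can usually be left alone. With many samples and many features, limiting the depth is recommended; the right value depends on the data distribution. Values between 10 and 100 are common.
* min_samples_split, minimum number of samples required to split an internal node: this limits when a subtree may keep splitting. A node with fewer than min_samples_split samples is not split any further. The default is 2. With a small sample it can be left alone; with a very large sample, increasing it is recommended. In one of my earlier projects, with about 100,000 samples, I built the tree with min_samples_split=10, which may serve as a reference.
* min_samples_leaf, minimum number of samples at a leaf: if a leaf would hold fewer samples than this value, it is pruned together with its sibling. The default is 1. It can be an integer number of samples or a fraction of the total number of samples. With a small sample it can be left alone; with a very large sample, increasing it is recommended. The 100,000-sample project mentioned above used min_samples_leaf=5, for reference only.
* criterion, measure of split quality: for classification, "gini" or "entropy"; the former is the Gini impurity, the latter information gain. The default "gini", i.e. the CART algorithm, is usually fine, unless you prefer the feature selection of ID3 or C4.5. For regression, "mse" or "mae"; the former is the mean squared error, the latter the mean absolute error. The default "mse" is recommended; in general "mse" is more accurate than "mae", unless you want to compare the effect of the two.
splitter, how the split point is chosen: "best" or "random". The former finds the best split among all candidate split points of the features; the latter finds a locally best split among a random subset of split points. The default "best" suits moderate sample sizes; for very large datasets "random" is recommended.
min_weight_fraction_leaf, minimum weighted fraction at a leaf: limits the minimum total sample weight in a leaf; a leaf below this value is pruned together with its sibling. The default 0 ignores weights. In general, sample weights come into play when many samples have missing values or when the class distribution of a classification tree is highly skewed, and then this value deserves attention.
max_leaf_nodes, maximum number of leaf nodes: limiting it helps prevent overfitting. The default None means no limit. With a limit, the algorithm builds the best tree within that number of leaves. With few features it can be ignored; with many features a limit helps, and a concrete value can be found by cross-validation.
class_weight: specifies the weight of each class, mainly to keep the tree from leaning towards classes that have too many training samples. You can give the weights yourself or use "balanced", in which case the algorithm computes them and classes with few samples receive higher weights. If the class distribution is not noticeably skewed, leave the default None. Not applicable to regression trees.
min_impurity_split, impurity threshold for splitting: limits the growth of the tree. If the impurity of a node (Gini impurity, information gain, mean squared error or absolute error) is below this threshold, the node produces no children and becomes a leaf. Changing the default of 1e-7 is generally not recommended.
presort, whether to presort the data: a boolean, False by default. In general, with a small sample or a tree limited to a small depth, True can speed up the search for split points and thus the build; with a very large sample it brings no benefit. But with a small sample the build is fast anyway, so this value can usually be ignored.
    Besides these parameters, other points to keep in mind when tuning:
    1) When there are few samples but very many features, decision trees overfit easily. In general, a robust model is easier to build when there are more samples than features.
    2) If there are few samples but very many features, dimensionality reduction is recommended before fitting the tree, e.g. principal component analysis (PCA), feature selection (Lasso) or independent component analysis (ICA). The feature dimension shrinks considerably and the tree fitted afterwards works better.
    3) Make heavy use of tree visualisation (covered in the next section), and limit the depth at first (e.g. at most 3 levels) to inspect how the tree initially fits the data before deciding whether to increase the depth.
    4) Before training, check the class distribution of the samples (mainly for classification trees). If it is very uneven, consider class_weight to keep the model from leaning towards the majority classes.
    5) Decision trees use numpy float32 arrays internally; if the training data is not in that format, the algorithm makes a copy first.
    6) If the input sample matrix is sparse, converting it to csc_matrix before fitting and to csr_matrix before predicting is recommended.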
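Tip 3 (start with a shallow tree and inspect it before going deeper) can be sketched as follows. The iris data and the particular parameter values are assumptions for illustration; tree_.max_depth and tree_.node_count report the size of the fitted tree.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# A deliberately constrained tree: at most 3 levels, at least 5 samples
# per leaf, and class weights balanced automatically.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                 class_weight='balanced', random_state=0)
shallow.fit(iris.data, iris.target)

# An unconstrained tree for comparison.
deep = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

print(shallow.tree_.max_depth, shallow.tree_.node_count)
print(deep.tree_.max_depth, deep.tree_.node_count)
print(shallow.score(iris.data, iris.target))
```

If the shallow tree already fits the data reasonably well, the extra depth of the unconstrained tree mostly buys training-set accuracy, which is where overfitting starts.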
1.10.2. Regression
class sklearn.tree.DecisionTreeRegressor(criterion=’mse’, splitter=’best’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, presort=False)
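1.10.3. Multi-output problems
Multi-output means there is more than one target value Y, for example: given one input X, output both cos and sin.
The cos/sin example is a regression task. In scikit-learn both DecisionTreeRegressor and DecisionTreeClassifier accept a y of shape (n_samples, n_outputs).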
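The sin/cos case can be sketched with a single regression tree fitted on a two-column target. The synthetic data and max_depth=5 are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)

# One input feature, two targets: sin(x) and cos(x).
X = np.sort(rng.uniform(-np.pi, np.pi, size=(200, 1)), axis=0)
Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])

reg = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, Y)

# Each prediction row holds [sin estimate, cos estimate].
pred = reg.predict(np.array([[0.0], [np.pi / 2]]))
print(pred)
```

A single tree shares its split points across both outputs, which is usually cheaper than fitting one tree per target.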
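1.10.4. Complexity
1.10.5. Tips on practical use
1.10.6. Tree algorithms: ID3, C4.5, C5.0 and CART
Pseudocode, differences and pruning of ID3, C4.5 and CART. sklearn uses CART by default, because CART can also be applied to regression while the other two cannot.
1.10.7. Mathematical formulation
1.10.7.1. Classification criteria
1.10.7.2. Regression criteria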



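1.11. Ensemble methods
1.11.1. Bagging
There are two families of ensemble methods. One is boosting, whose weak learners depend on one another. The other is bagging, whose weak learners are independent and can be fitted in parallel. GBDT subsamples without replacement, whereas bagging samples with replacement; random forests use bagging-style sampling.
1.11.2. Forests of randomized trees
Main advantages of RF:
1) Training parallelises very well, which is a speed advantage for the large samples of the big-data era. Personally I consider this the main advantage.
2) Because the features considered at each node split are chosen at random, the model still trains efficiently when the feature dimension is very high.
3) After training, it can report the importance of each feature for the output.
4) Thanks to random sampling, the trained model has low variance and generalises well.
5) Compared with the boosting methods AdaBoost and GBDT, RF is simpler to implement.
6) It is insensitive to some features being missing.
Main disadvantages of RF:
1) On some noisy datasets, RF tends to overfit.
2) Features with many distinct split values tend to have a larger influence on RF's decisions, which can degrade the fitted model.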
class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion=’gini’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=’auto’, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
class sklearn.ensemble.RandomForestRegressor(n_estimators=10, criterion=’mse’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=’auto’, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False)
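1.11.2.1. Random forests
1.11.2.2. Extremely randomized trees
1.11.2.3. Parameters
Main parameter differences between RF and DT
* n_estimators: the number of weak learners (trees) in the forest. Too small a value tends to underfit; too large a value costs a lot of computation, and beyond a certain number the gain from adding trees becomes small, so a moderate value is usually chosen. The default is 100 from scikit-learn 0.22 on (10 in the older signature shown above). Unlike the boosting models, a random forest has no learning_rate to tune alongside it.
oob_score: whether to use out-of-bag samples to assess the model. The default is False. Personally I recommend True, because the out-of-bag score reflects how well the fitted model generalises.
criterion: the measure CART uses to evaluate features when splitting. Classification and regression use different losses. For classification RF the CART trees default to the Gini impurity "gini", with information gain as the alternative. For regression RF the CART trees default to the mean squared error "mse", with the mean absolute error "mae" as the alternative. The defaults are usually good enough.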
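The out-of-bag estimate needs no separate validation split. A minimal sketch, assuming the iris data and 100 trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each sample using only the trees whose bootstrap
# sample did not contain it (requires bootstrap=True, the default).
rf = RandomForestClassifier(n_estimators=100, criterion='gini',
                            oob_score=True, random_state=0)
rf.fit(X, y)

print(rf.oob_score_)            # out-of-bag accuracy
print(rf.feature_importances_)  # one importance per feature, summing to 1
```

With very few trees some samples may never be out of bag, so the estimate is only meaningful once n_estimators is reasonably large.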
1.11.2.4. Parallelization
1.11.2.5. Feature importance evaluation
1.11.2.6. Totally random trees embedding

1.11.3。AdaBoost
class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm=’SAMME.R’, random_state=None)
class sklearn.ensemble.AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss=’linear’, random_state=None)
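base_estimator:
n_estimators: number of boosting iterations, i.e. the number of weak learners.
learning_rate: step size; it shrinks the contribution of each weak learner. Scaling it within (0.0, 1.0] is a regularization device against overfitting.
algorithm: selects the boosting algorithm.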
estimators_ : list of classifiers：The collection of fitted sub-estimators.
estimator_weights_ : array of floats：Weights for each estimator in the boosted ensemble.
estimator_errors_ : array of floats：Regression error for each estimator in the boosted ensemble.
feature_importances_ : array of shape = [n_features]：The feature importances if supported by the base_estimator.
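1.11.3.1. Usage
1.11.4. Gradient tree boosting
Advantages of GBRT:
Natural handling of data of mixed type (= heterogeneous features)
Predictive power
Robustness to outliers in output space (via robust loss functions)
Disadvantages of GBRT:
Scalability: due to the sequential nature of boosting it can hardly be parallelized.
1.11.4.1. Classification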
class sklearn.ensemble.GradientBoostingClassifier(loss=’deviance’, learning_rate=0.1, n_estimators=100, subsample=1.0, criterion=’friedman_mse’, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort=’auto’)
1.11.4.2. Regression
class sklearn.ensemble.GradientBoostingRegressor(loss=’ls’, learning_rate=0.1, n_estimators=100, subsample=1.0, criterion=’friedman_mse’, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False, presort=’auto’)
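1) n_estimators: the maximum number of boosting iterations, i.e. the maximum number of weak learners. Too small a value tends to underfit, too large a value tends to overfit, so a moderate value is usually chosen. The default is 100. In practice n_estimators is tuned together with learning_rate, described next.
2) learning_rate: the shrinkage coefficient ν applied to each weak learner, also called the step size, as covered in the regularization section of the companion theory article. With this regularization term the update of the strong learner is f_k(x) = f_{k-1}(x) + ν h_k(x), with 0 < ν ≤ 1. For the same quality of fit on the training set, a smaller ν requires more boosting iterations. Step size and maximum number of iterations together determine how well the algorithm fits, so n_estimators and learning_rate should be tuned jointly. A reasonable approach is to start with a smallish ν; the default is 0.1.
3) subsample: the subsampling discussed in the regularization section of the theory article, with values in (0, 1]. Note that it differs from random forests, which sample with replacement, whereas here sampling is without replacement. A value of 1 uses all samples, i.e. no subsampling. With a value below 1 only part of the samples is used to fit each GBDT tree. A ratio below 1 reduces variance, i.e. guards against overfitting, but increases bias, so the value should not be too low. Values in [0.5, 0.8] are recommended; the default is 1.0, i.e. no subsampling.
4) init: the initial weak learner, corresponding to f_0(x) in the theory article. If not given, the initial classification or regression prediction is derived from the training samples; otherwise the estimator passed as init provides it. It is typically used when we have prior knowledge of the data or an earlier fit; otherwise this parameter can be ignored.
5) loss: the loss function of the GBDT algorithm. Classification and regression models use different losses.
For classification there are two choices, the log-likelihood loss "deviance" and the exponential loss "exponential". The default is "deviance". The theory article describes these losses in detail. In general the default "deviance" is recommended; it is well optimised for both binary and multiclass classification. The exponential loss essentially takes us back to the AdaBoost algorithm.
Binomial deviance ('deviance'): the negative binomial log-likelihood loss function for binary classification (provides probability estimates). The initial model is given by the log odds-ratio.
Multinomial deviance ('deviance'): the negative multinomial log-likelihood loss function for multi-class classification with n_classes mutually exclusive classes. It provides probability estimates. The initial model is given by the prior probability of each class. At each iteration n_classes regression trees have to be constructed, which makes GBRT rather inefficient for datasets with a large number of classes.
Exponential loss ('exponential'): the same loss function as AdaBoostClassifier. Less robust to mislabeled examples than 'deviance'; can only be used for binary classification.
For regression there are the squared error "ls", the absolute loss "lad", the Huber loss "huber" and the quantile loss "quantile". The default is "ls". In general, if the data has few noisy points, the default "ls" works well. With many noisy points, the noise-robust "huber" is recommended. When quantile (interval) predictions are needed, use "quantile".
Least squares ('ls'): the natural choice for regression due to its superior computational properties. The initial model is given by the mean of the target values.
Least absolute deviation ('lad'): a robust loss function for regression. The initial model is given by the median of the target values.
Huber ('huber'): another robust loss function that combines least squares and least absolute deviation; use alpha to control the sensitivity with regards to outliers (see [F2001] for details).
Quantile ('quantile'): a loss function for quantile regression. Use 0 < alpha < 1 to specify the quantile. This loss function can be used to create prediction intervals (see Prediction Intervals for Gradient Boosting Regression).
6) alpha: only GradientBoostingRegressor has this parameter. With the Huber loss "huber" or the quantile loss "quantile", the quantile value has to be specified. The default is 0.9; with many noisy points it can be lowered.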
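The interplay of loss='quantile' and alpha can be sketched by fitting two models, one for a low and one for a high quantile, and using them as a prediction interval. The synthetic data and all parameter values below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = X.ravel() * np.sin(X.ravel()) + rng.normal(scale=1.0, size=300)

# Shared settings: a smaller learning_rate paired with more estimators,
# plus subsampling without replacement as extra regularization.
common = dict(n_estimators=200, learning_rate=0.05, subsample=0.8,
              max_depth=3, random_state=0)

# alpha selects the quantile that each model estimates.
lower = GradientBoostingRegressor(loss='quantile', alpha=0.1, **common).fit(X, y)
upper = GradientBoostingRegressor(loss='quantile', alpha=0.9, **common).fit(X, y)

# Share of training targets that fall between the two quantile curves.
inside = (y >= lower.predict(X)) & (y <= upper.predict(X))
coverage = inside.mean()
print(coverage)
```

The 0.1 and 0.9 quantiles bound a nominal 80% interval; the coverage measured on the training data is only a rough check and would normally be verified on held-out data.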
1.11.4.3. Fitting additional weak learners
warm_start=True allows you to add more estimators to an already fitted model.
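A minimal sketch of growing an already fitted ensemble with warm_start, assuming synthetic regression data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

est = GradientBoostingRegressor(n_estimators=50, warm_start=True,
                                random_state=0)
est.fit(X, y)
n_before = est.estimators_.shape[0]

# Raise n_estimators and fit again: the first 50 trees are kept and
# only the 30 additional ones are trained.
est.set_params(n_estimators=80)
est.fit(X, y)
n_after = est.estimators_.shape[0]

print(n_before, n_after)
```

Without warm_start=True the second call to fit would discard the existing trees and train all 80 from scratch.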
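1.11.4.4. Controlling the tree size
1.11.4.5. Mathematical formulation
1.11.4.5.1. Loss functions
1.11.4.6. Regularization
1.11.4.6.1. Shrinkage
learning_rate
1.11.4.6.2. Subsampling
subsample
1.11.4.7. Interpretation
1.11.4.7.1. Feature importance
feature_importances_ : array of shape = [n_features]: The feature importances if supported by the base_estimator.
1.11.4.7.2. Partial dependence
from sklearn.ensemble.partial_dependence import plot_partial_dependence (unverified; this import path exists only in older scikit-learn releases, newer ones provide the partial dependence tools in sklearn.inspection)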


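1.11.5. Voting classifier
1.11.5.1. Majority class labels (majority/hard voting)
1.11.5.1.1. Usage
1.11.5.2. Weighted average probabilities (soft voting)
1.11.5.3. Using the VotingClassifier with GridSearch
1.11.5.3.1. Usage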



1.12. Multiclass and multilabel algorithms
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC
1.12.1. Multilabel classification format
1.12.2. One-vs-the-rest
OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y).predict(X)
1.12.2.1. Multiclass learning
1.12.2.2. Multilabel learning
1.12.3. One-vs-one
OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y).predict(X)
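The two wrappers differ in how many binary models they train: one per class for one-vs-rest, and one per pair of classes, T(T-1)/2, for one-vs-one. The sketch below checks this on hypothetical synthetic data with T = 4 classes.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Hypothetical synthetic data with T = 4 classes.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)

n_classes = len(ovr.classes_)
print(len(ovr.estimators_))  # one binary model per class: T
print(len(ovo.estimators_))  # one binary model per pair: T(T-1)/2
```

One-vs-one trains more models, but each sees only the samples of two classes, which can pay off for estimators that scale poorly with the number of samples.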
1.12.3.1. Multiclass learning
1.12.4. Error-correcting output codes
1.12.4.1. Multiclass learning
1.12.5. Multioutput regression
1.12.6. Multioutput classification
1.12.7. Classifier chain



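1.13. Feature selection
1.13.1. Removing features with low variance
1.13.2. Univariate feature selection
1.13.3. Recursive feature elimination
1.13.4. Feature selection using SelectFromModel
1.13.4.1. L1-based feature selection
1.13.4.2. Tree-based feature selection
1.13.5. Feature selection as part of a pipeline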



1.14. Semi-supervised
1.14.1. Label propagation
1.15. Isotonic regression
1.16. Probability calibration



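1.17. Neural network models (supervised)
Training (or learning) a neural network means using a learning algorithm to obtain the parameters the network needs to solve a given problem. These parameters include the connection weights between the neurons of adjacent layers and the biases.

As designers of the algorithm, we usually build the network structure from the problem at hand, while the parameters are found iteratively by the network from training samples and a learning algorithm.

Among learning algorithms for neural networks, the most outstanding and successful is error backpropagation (BP). BP is typically used with the most widely used architecture, the multilayer feed-forward network.

Deep learning refers to deep neural network models, generally networks with three or more layers.
1.17.1. Multi-layer perceptron
Advantages of the multi-layer perceptron:
Capability to learn non-linear models.
Capability to learn models in real time (on-line learning) using partial_fit.
Disadvantages of the multi-layer perceptron (MLP):
MLP with hidden layers has a non-convex loss function with more than one local minimum. Different random weight initializations can therefore lead to different validation accuracy.
MLP requires tuning a number of hyperparameters, such as the number of hidden neurons, layers and iterations.
MLP is sensitive to feature scaling.
1.17.2. Classification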
class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, ), activation=’relu’, solver=’adam’, alpha=0.0001, batch_size=’auto’, learning_rate=’constant’, learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
1.17.3. Regression
class sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(100, ), activation=’relu’, solver=’adam’, alpha=0.0001, batch_size=’auto’, learning_rate=’constant’, learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
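1.17.4. Regularization
1.17.5. Algorithms
1.17.6. Complexity
1.17.7. Mathematical formulation
1.17.8. Tips on practical use
Multi-layer perceptron is sensitive to feature scaling, so it is highly recommended to scale your data. For example, scale each attribute of the input vector X to [0, 1] or [-1, +1], or standardize it to have mean 0 and variance 1.
Note that the same scaling must be applied to the test set for meaningful results. You can use StandardScaler for standardization.
An alternative and recommended approach is to use StandardScaler in a Pipeline.
Finding a reasonable regularization parameter is best done using GridSearchCV, usually in the range 10.0 ** -np.arange(1, 7).
Empirically, we observed that L-BFGS converges faster and with better solutions on small datasets.
For relatively large datasets, however, Adam is very robust. It usually converges quickly and gives pretty good performance.
SGD with momentum or Nesterov's momentum, on the other hand, can perform better than those two algorithms if the learning rate is correctly tuned.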
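The scaling and regularization tips combine naturally: put StandardScaler and the MLP in one Pipeline and search alpha over 10.0 ** -np.arange(1, 7). The sketch assumes the iris data, a small hidden layer and the L-BFGS solver, all chosen so that the example runs quickly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The scaler is refitted inside every cross-validation fold, so the
# validation part of a fold never influences the scaling.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(hidden_layer_sizes=(10,), solver='lbfgs',
                          max_iter=500, random_state=0)),
])

alphas = 10.0 ** -np.arange(1, 7)   # 1e-1 down to 1e-6
search = GridSearchCV(pipe, {'mlp__alpha': alphas}, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The step name prefix (mlp__) is how GridSearchCV addresses a parameter of an estimator inside a Pipeline.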
1.17.9. More control with warm_start



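2. Unsupervised learning
2.1. Gaussian mixture models
2.1.1. Gaussian mixture
2.1.1.1. Pros and cons of class GaussianMixture
2.1.1.1.1. Pros
2.1.1.1.2. Cons
2.1.1.2. Selecting the number of components in a classical Gaussian mixture model
2.1.1.3. Estimation algorithm: expectation-maximization
2.1.2. Variational Bayesian Gaussian mixture
2.1.2.1. Estimation algorithm: variational inference
2.1.2.2. Pros and cons of variational inference with BayesianGaussianMixture
2.1.2.2.1. Pros
2.1.2.2.2. Cons
2.1.2.3. The Dirichlet process
2.2. Manifold learning
2.2.1. Introduction
2.2.2. Isomap
2.2.2.1. Complexity
2.2.3. Locally linear embedding
2.2.3.1. Complexity
2.2.4. Modified locally linear embedding
2.2.4.1. Complexity
2.2.5. Hessian eigenmapping
2.2.5.1. Complexity
2.2.6. Spectral embedding
2.2.6.1. Complexity
2.2.7. Local tangent space alignment
2.2.7.1. Complexity
2.2.8. Multi-dimensional scaling (MDS)
2.2.8.1. Metric MDS
2.2.8.2. Nonmetric MDS
2.2.9. t-distributed stochastic neighbor embedding (t-SNE)
2.2.9.1. Optimizing t-SNE
2.2.9.2. Barnes-Hut t-SNE
2.2.10. Tips on practical use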


2.3. Clustering
2.3.1. Overview of clustering methods
2.3.2. K-means
class sklearn.cluster.KMeans(n_clusters=8, init=’k-means++’, n_init=10, max_iter=300, tol=0.0001, precompute_distances=’auto’, verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm=’auto’)
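n_clusters: number of centroids (clusters).
init: 'k-means++' picks initial centroids that are (generally) far from each other, which tends to give better results than random initialization. 'random' picks the initial centroids at random. An ndarray of shape (n_clusters, n_features) gives the initial centroids explicitly.
n_init: Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
max_iter: maximum number of iterations of the k-means algorithm for a single run.
precompute_distances: three possible values, 'auto', True or False. Precomputing distances is faster but uses more memory. 'auto': do not precompute if n_samples * n_clusters exceeds 12 million. True: always precompute. False: never precompute.
tol: float, default 1e-4; tolerance on the inertia used to declare convergence.
n_jobs: int. Number of processes used for the computation, which works by running the n_init runs in parallel. -1 uses all CPUs. 1 disables parallel computing, which is convenient for debugging. For values below -1, (n_cpus + 1 + n_jobs) CPUs are used, so n_jobs=-2 uses all CPUs but one.
random_state: int or numpy.RandomState, optional; the generator used to initialize the centroids. An integer fixes the seed. Defaults to the global numpy random number generator.
verbose: int, default 0.
copy_x: bool, default True. When precomputing distances it is more numerically accurate to center the data first. If copy_x is True, the original data is not modified. If False, the original data is modified in place and put back before the function returns, but because the data mean is subtracted and then added back, small numerical differences from the original may remain.
Attributes:
cluster_centers_: array of centroids, shape [n_clusters, n_features].
labels_: the cluster label of each point.
inertia_: float, sum of squared distances of the samples to their closest centroid.
fit_transform(X[, y]): compute the clustering and transform X to cluster-distance space.
transform(X[, y]): transform X to cluster-distance space.
get_params([deep]): get the parameters of this estimator.
set_params(**params): set the parameters of this estimator.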
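The attributes and methods listed above can be tried on a toy dataset. This is a minimal sketch with six made-up points forming two obvious groups; inertia_ is recomputed by hand for comparison.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]], dtype=float)

km = KMeans(n_clusters=2, init='k-means++', n_init=10, random_state=0).fit(X)

# inertia_ by hand: squared distance of every point to its own centroid.
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()

# transform gives the distance of each sample to every centroid.
distances = km.transform(X)          # shape (n_samples, n_clusters)

# predict assigns new points to the nearest learned centroid.
new_labels = km.predict(np.array([[0.0, 0.0], [12.0, 3.0]]))

print(km.cluster_centers_)
print(km.labels_)
print(km.inertia_, manual_inertia)
print(new_labels)
```

The numbering of the clusters is arbitrary, so only the grouping of the points is meaningful, not which group is called 0 or 1.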

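Drawbacks:
Inertia assumes that clusters are convex and isotropic, which is not always the case. It responds poorly to elongated clusters or manifolds with irregular shapes.
Inertia is not a normalized metric: we only know that lower values are better and zero is optimal. In very high-dimensional spaces, however, Euclidean distances tend to become inflated (an instance of the so-called "curse of dimensionality"). Running a dimensionality reduction algorithm such as PCA before k-means clustering can alleviate this problem and speed up the computations.
2.3.2.1. Mini batch K-means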
class sklearn.cluster.MiniBatchKMeans(n_clusters=8, init=’k-means++’, max_iter=100, batch_size=100, verbose=0, compute_labels=True, random_state=None, tol=0.0, max_no_improvement=10, init_size=None, n_init=3, reassignment_ratio=0.01)
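Mini-batches are subsets of the input data, randomly sampled in each training iteration.
2.3.3. Affinity propagation
2.3.4. Mean shift
2.3.5. Spectral clustering
2.3.5.1. Different label assignment strategies
2.3.6. Hierarchical clustering
2.3.6.1. Different linkage types: Ward, complete and average linkage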
class sklearn.cluster.AgglomerativeClustering(n_clusters=2, affinity=’euclidean’, memory=None, connectivity=None, compute_full_tree=’auto’, linkage=’ward’, pooling_func=<function mean>)
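AgglomerativeClustering: uses a bottom-up clustering approach.
linkage : {"ward", "complete", "average"}, three merge criteria. complete (maximum) linkage: the distance between two clusters is the distance between their farthest points. average linkage: the average distance. Ward's method: aims at the smallest within-cluster sum of squares and the largest between-cluster sum of squares.
affinity: string or callable, default "euclidean". The metric used to compute the linkage. Can be "euclidean", "l1", "l2", "manhattan", "cosine" or "precomputed". If linkage is "ward", only "euclidean" is accepted.
When affinity is not the Euclidean metric, average linkage is recommended. l1 distance is often good for sparse features or sparse noise, i.e. when many of the features are zero, as in text mining using occurrences of rare words.
Cosine distance is interesting because it is invariant to global scalings of the signal.
2.3.6.2. Adding connectivity constraints
2.3.6.3. Varying the metric
2.3.7. DBSCAN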
class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric=’euclidean’, metric_params=None, algorithm=’auto’, leaf_size=30, p=None, n_jobs=1)
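eps : float, optional. The maximum distance between two samples for them to be considered as in the same neighborhood.
min_samples : int, optional. The number of samples in a neighborhood for a point to be considered as a core point.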
metric : string, or callable
The metric to use when calculating distance between instances in a feature array. If metric is a string or callable, it must be one of the options allowed by metrics.pairwise.calculate_distance for its metric parameter. If metric is "precomputed", X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only "nonzero" elements may be considered neighbors for DBSCAN.
New in version 0.17: metric precomputed to accept precomputed sparse matrix.
metric_params : dict, optional
Additional keyword arguments for the metric function.
New in version 0.19.
algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
leaf_size : int, optional (default = 30). Leaf size passed to BallTree or cKDTree.
p : float, optional. The power of the Minkowski metric to be used to calculate distance between points.
n_jobs : int, optional (default = 1). The number of parallel jobs to run. If -1, then the number of jobs is set to the number of CPU cores.
Attributes:
core_sample_indices_ : array, shape = [n_core_samples]，Indices of core samples.
components_ : array, shape = [n_core_samples, n_features]，Copy of each core sample found by training.
labels_ : array, shape = [n_samples]，Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label 
2.3.8. Birch




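2.3.9. Clustering performance evaluation
2.3.9.1. Adjusted Rand index
2.3.9.1.1. Advantages
2.3.9.1.2. Drawbacks
2.3.9.1.3. Mathematical formulation
2.3.9.2. Mutual information based scores
2.3.9.2.1. Advantages
2.3.9.2.2. Drawbacks
2.3.9.2.3. Mathematical formulation
2.3.9.3. Homogeneity, completeness and V-measure
2.3.9.3.1. Advantages
2.3.9.3.2. Drawbacks
2.3.9.3.3. Mathematical formulation
2.3.9.4. Fowlkes-Mallows scores
2.3.9.4.1. Advantages
2.3.9.4.2. Drawbacks
2.3.9.5. Silhouette coefficient
2.3.9.5.1. Advantages
2.3.9.5.2. Drawbacks
2.3.9.6. Calinski-Harabaz index
2.3.9.6.1. Advantages
2.3.9.6.2. Drawbacks
2.4. Biclustering
2.4.1. Spectral co-clustering
2.4.1.1. Mathematical formulation
2.4.2. Spectral biclustering
2.4.2.1. Mathematical formulation
2.4.3. Biclustering evaluation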



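2.5. Decomposing signals in components (matrix factorization problems)
2.5.1. Principal component analysis (PCA)
2.5.1.1. Exact PCA and probabilistic interpretation
2.5.1.2. Incremental PCA
2.5.1.3. PCA using randomized SVD
2.5.1.4. Kernel PCA
2.5.1.5. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)
2.5.2. Truncated singular value decomposition and latent semantic analysis
2.5.3. Dictionary learning
2.5.3.1. Sparse coding with a precomputed dictionary
2.5.3.2. Generic dictionary learning
2.5.3.3. Mini-batch dictionary learning
2.5.4. Factor analysis
2.5.5. Independent component analysis (ICA)
2.5.6. Non-negative matrix factorization (NMF or NNMF)
2.5.6.1. NMF with the Frobenius norm
2.5.6.2. NMF with a beta-divergence
2.5.7. Latent Dirichlet allocation (LDA)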



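2.6. Covariance estimation
2.6.1. Empirical covariance
2.6.2. Shrunk covariance
2.6.2.1. Basic shrinkage
2.6.2.2. Ledoit-Wolf shrinkage
2.6.2.3. Oracle approximating shrinkage
2.6.3. Sparse inverse covariance
2.6.4. Robust covariance estimation
2.6.4.1. Minimum covariance determinant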



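2.7. Novelty and outlier detection
2.7.1. Novelty detection
2.7.2. Outlier detection
2.7.2.1. Fitting an elliptic envelope
2.7.2.2. Isolation forest
2.7.2.4. One-class SVM versus elliptic envelope versus isolation forest versus LOF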



2.8. Density estimation
2.8.1. Density estimation: histograms
2.8.2. Kernel density estimation



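2.9. Neural network models (unsupervised)
2.9.1. Restricted Boltzmann machines
2.9.1.1. Graphical model and parametrization
2.9.1.2. Bernoulli restricted Boltzmann machines
2.9.1.3. Stochastic maximum likelihood learning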



3. Model selection and evaluation
3.1. Cross-validation: evaluating estimator performance
sklearn.model_selection.train_test_split(*arrays, **options)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
*arrays : sequence of indexables with same length / shape[0]。Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.
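test_size : if float, should be between 0.0 and 1.0 and is the proportion of the data put in the test split; if int, the absolute number of test samples. The default is 0.25.
train_size : analogous to test_size.
random_state : if given, fixes the random seed, so the same split is produced every time this value is used.
shuffle : whether to shuffle the samples before splitting. If shuffle=False, stratify must be None.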
stratify : array-like or None (default is None)。If not None, data is split in a stratified fashion, using this as the class label
3.1.1. Computing cross-validated metrics
sklearn.model_selection.cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch=‘2*n_jobs’)
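estimator : the estimator object to fit.
X : the data to fit.
y : the target values.
scoring : string, callable or None, default None. Either a string or a callable scorer with the signature scorer(estimator, X, y). By default the estimator's own score method is used; to choose another metric pass, for example, scoring='f1_macro'.
cv : cross-validation generator or iterable. Possible values: None, to use the default 3-fold cross-validation; an integer, the number of folds k; an object usable as a cross-validation generator; an iterable yielding train/test splits.
For integer/None inputs, if the estimator is a classifier and y holds the class labels, StratifiedKFold is used; in all other cases KFold is used.
from sklearn import datasets, svm
from sklearn.model_selection import ShuffleSplit, cross_val_score
iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)  # any classifier works here
cv = ShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
cross_val_score(clf, iris.data, iris.target, cv=cv)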
3.1.1.1. The cross_validate function and multiple metric evaluation
sklearn.model_selection.cross_validate(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch=‘2*n_jobs’, return_train_score=True)
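The returned dict contains the keys ['test_score', 'fit_time', 'score_time'].
test_score can cover several metrics, for example recall and precision, when scoring is passed as a dict or a list.
With return_train_score=True the training-set scores are returned as well; with False they are not. The default is True.
3.1.1.2. Obtaining predictions by cross-validation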
sklearn.model_selection.cross_val_predict(estimator, X, y=None, groups=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch=‘2*n_jobs’, method=’predict’)
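Returns the predictions without any score; call a metric function yourself to evaluate them, for example:
from sklearn import metrics
from sklearn.model_selection import cross_val_predict
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
metrics.accuracy_score(iris.target, predicted)
3.1.2. Cross-validation iterators
3.1.3. Cross-validation iterators for i.i.d. data
3.1.3.1. K-fold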
class sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)
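KFold splits the data into n_splits groups (folds); the samples are shuffled first only when shuffle=True.
split(X[, y, groups])    Generate indices to split data into training and test set. Iterate over it in a loop to obtain the indices of each fold.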
class sklearn.model_selection.StratifiedKFold(n_splits=3, shuffle=False, random_state=None)
StratifiedKFold splits the data so that every fold keeps roughly the same proportion of samples of each class as the full dataset.
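The difference between the two splitters shows up on an imbalanced toy target. In this sketch (made-up data, 8 samples of class 0 and 4 of class 1) both splitters are iterated in a loop, as split() requires.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(24).reshape(12, 2)
y = np.array([0] * 8 + [1] * 4)

# Plain K-fold: consecutive blocks of indices, class labels ignored.
kf = KFold(n_splits=4, shuffle=False)
kfold_tests = [test_idx for train_idx, test_idx in kf.split(X)]

# Stratified K-fold: every test fold keeps the 2:1 class ratio.
skf = StratifiedKFold(n_splits=4, shuffle=False)
class1_per_fold = [int(y[test_idx].sum())
                   for train_idx, test_idx in skf.split(X, y)]

for test_idx in kfold_tests:
    print(test_idx, y[test_idx])
print(class1_per_fold)
```

With plain KFold and no shuffling, the first folds contain only class 0 here, which is exactly the situation stratification avoids.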
3.1.3.2. Repeated K-fold
class sklearn.model_selection.RepeatedKFold(n_splits=5, n_repeats=10, random_state=None)
3.1.3.3. Leave one out
class sklearn.model_selection.LeaveOneOut
LeaveOneOut() = KFold(n_splits=n) = LeavePOut(p=1)
Suited to cases where data is scarce.
3.1.3.4. Leave P out (LPO)
class sklearn.model_selection.LeavePOut(p)
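LeavePOut is very similar to LeaveOneOut: it creates all possible training/test sets by removing p samples from the complete set.
3.1.3.5. Random permutations cross-validation, a.k.a. Shuffle & Split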
class sklearn.model_selection.ShuffleSplit(n_splits=10, test_size=’default’, train_size=None, random_state=None)
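The randomness can be controlled, for reproducible results, by explicitly seeding the random_state pseudo-random number generator.
3.1.4. Cross-validation iterators with stratification based on class labels
3.1.4.1. Stratified K-fold
3.1.4.2. Stratified shuffle split
3.1.5. Cross-validation iterators for grouped data
3.1.5.1. Group K-fold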
model_selection.GroupKFold([n_splits])    K-fold iterator variant with non-overlapping groups.
3.1.5.2. Leave one group out
class sklearn.model_selection.LeaveOneGroupOut() Leave One Group Out cross-validator
3.1.5.3. Leave P groups out
class sklearn.model_selection.LeavePGroupsOut(n_groups)    Leave P Group(s) Out cross-validator
3.1.5.4. Group shuffle split
model_selection.GroupShuffleSplit([…])    Shuffle-Group(s)-Out cross-validation iterator
3.1.6. Predefined fold-splits / validation-sets
model_selection.PredefinedSplit(test_fold)    Predefined split cross-validator
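3.1.7. Cross-validation of time series data
KFold assumes that the samples are independent, which does not hold for time-ordered samples, so a dedicated splitter is needed.
3.1.7.1. Time series split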
class sklearn.model_selection.TimeSeriesSplit(n_splits=3, max_train_size=None)    Time Series cross-validator
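A short sketch of how TimeSeriesSplit orders its folds, using six made-up samples that stand for consecutive time steps. Every training set ends before its test set begins, so no future observation leaks into training.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Six samples assumed to be ordered in time.
X = np.arange(12).reshape(6, 2)

tscv = TimeSeriesSplit(n_splits=3)
splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]

for train, test in splits:
    print("train:", train, "test:", test)
```

Unlike KFold, the training window grows from split to split, and the data is never shuffled.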
3.1.8. A note on shuffling
3.1.9. Cross-validation and model selection




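3.2. Tuning the hyper-parameters of an estimator
The approach is in effect a greedy algorithm: tune the parameter that currently affects the model most until it is optimal, then the next most influential one, and so on until all parameters are tuned. The drawback is that it may end in a local rather than the global optimum, but it saves so much time and effort that it is worth trying; the result can later be refined further with bagging.
3.2.1. Exhaustive grid search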
class sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch=‘2*n_jobs’, error_score=’raise’, return_train_score=True)
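from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X_train, y_train are assumed to be an existing training split.
# Two grids: RBF kernel over gamma and C, linear kernel over C only.
tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
scores = ['precision', 'recall']
for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    # Run one search per metric, using its macro-averaged variant.
    clf = GridSearchCV(SVC(), tuned_parameters, cv=5,
                       scoring='%s_macro' % score)
    clf.fit(X_train, y_train)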
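Parameters:
verbose: verbosity of the log, int. 0: nothing is printed about the training process; 1: occasional output; >1: output for every sub-model.
n_jobs: number of parallel jobs, int. -1: as many as there are CPU cores; 1: the default.
pre_dispatch: controls the number of jobs dispatched during parallel execution. When n_jobs is greater than 1, the data is copied for every dispatched job, which can cause out-of-memory errors; setting pre_dispatch caps the number of jobs dispatched up front, so the data is copied at most pre_dispatch times.
Attributes:
cv_results_: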
{
'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],mask = [False False False False]...)
'param_gamma': masked_array(data = [-- -- 0.1 0.2],mask = [ True  True False False]...),
'param_degree': masked_array(data = [2.0 3.0 -- --], mask = [False False  True  True]...),
'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
'rank_test_score'    : [2, 4, 3, 1],
'split0_train_score' : [0.8, 0.9, 0.7],
'split1_train_score' : [0.82, 0.5, 0.7],
'mean_train_score'   : [0.81, 0.7, 0.7],
'std_train_score'    : [0.03, 0.03, 0.04],
'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
}
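best_estimator_ : the estimator refit with the best parameters (only available when refit is not False)
best_score_ : mean cross-validated score of the best parameter setting
best_params_ : the parameter setting that gave the best results
best_index_ : index into the cv_results_ arrays of the best parameter setting
scorer_ : the scorer function used to choose the best parameters
n_splits_ : the number of cross-validation splits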
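The attributes listed above can be inspected after fitting. The following is a minimal sketch on the iris dataset with a deliberately tiny grid; the dataset and grid values are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"kernel": ["linear", "rbf"], "C": [1, 10]}

search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)

# cv_results_ holds one entry per candidate parameter setting
n_candidates = len(search.cv_results_["params"])
# best_index_ points at the winning row of cv_results_
best_from_index = search.cv_results_["params"][search.best_index_]

print(search.best_params_, round(search.best_score_, 3), search.n_splits_)
```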
3.2.2. Randomized Parameter Optimization
class sklearn.model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score=True)
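A hedged sketch of a randomized search, assuming the iris data and a uniform distribution over C (both chosen only for illustration). Unlike a grid, n_iter fixes how many settings are sampled from param_distributions.

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# C is drawn uniformly from [0.01, 10.01]; the range is an arbitrary choice
param_distributions = {"C": uniform(loc=0.01, scale=10)}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=5,          # number of sampled parameter settings
    cv=3,
    random_state=0,    # makes the sampled settings reproducible
)
search.fit(X, y)

n_sampled = len(search.cv_results_["params"])
best_C = search.best_params_["C"]
print(n_sampled, best_C)
```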
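3.2.3. Tips for parameter search
3.2.3.1. Specifying an objective metric
scoring
3.2.3.2. Specifying multiple metrics for evaluation
refit: defaults to True. The best parameters found by cross-validation are used to refit the estimator on all available data (training plus development sets), and the result is the final model used for performance evaluation. In other words, once the search has finished, the best parameter setting is fit once more on the whole dataset.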
gs = GridSearchCV(DecisionTreeClassifier(random_state=42),
                  param_grid={'min_samples_split': range(2, 403, 10)},
                  scoring=scoring, cv=5, refit='AUC')
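When multiple metrics are specified, refit must be set to the name of the metric (a string) for which best_params_ is found and which is used to build best_estimator_ on the whole dataset. If the search should not be refit, set refit=False. Leaving refit at its default value raises an error when multiple metrics are used.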
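A minimal multi-metric sketch. The scoring dictionary below is an assumption (the notes do not show how scoring was defined), as are the synthetic dataset and the reduced parameter range.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Assumed metric names; the keys become suffixes in cv_results_
scoring = {"AUC": "roc_auc", "Accuracy": "accuracy"}

gs = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"min_samples_split": [2, 10, 50]},
    scoring=scoring,
    cv=3,
    refit="AUC",  # best_params_ / best_estimator_ are chosen by this metric
)
gs.fit(X, y)

result_keys = sorted(k for k in gs.cv_results_ if k.startswith("mean_test_"))
print(result_keys, gs.best_params_)
```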
3.2.3.3. Composite estimators and parameter spaces
Using Pipeline together with GridSearchCV
pipe = Pipeline([('reduce_dim', PCA()),('classify', LinearSVC())])
N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [{'reduce_dim': [PCA(iterated_power=7), NMF()],'reduce_dim__n_components': N_FEATURES_OPTIONS,'classify__C': C_OPTIONS},
                {'reduce_dim': [SelectKBest(chi2)],'reduce_dim__k': N_FEATURES_OPTIONS,'classify__C': C_OPTIONS}]
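# 'reduce_dim' refers to the dimensionality-reduction step, so reduce_dim__n_components sets PCA.n_components; 'classify' works the same way. Parameter names must match the step names used in the pipeline.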
grid = GridSearchCV(pipe, cv=3, n_jobs=2, param_grid=param_grid)
3.2.3.4. Model selection: development and evaluation
3.2.3.5. Parallelism
n_jobs=-1
n_jobs : int, default=1  Number of jobs to run in parallel.
3.2.3.6. Robustness to failure
error_score=0 (or =np.nan)
error_score : ‘raise’ (default) or numeric
Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
3.2.4. Alternatives to brute force parameter search
3.2.4.1. Model specific cross-validation
With this pipeline and RidgeCV there is no need to combine GridSearchCV with linear_model.Ridge().
Pipeline([
    ('poly', PolynomialFeatures()),
    ('linear', RidgeCV(alphas=np.logspace(-3, 2, 50), fit_intercept=False))])
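A runnable sketch of the pipeline fragment above, assuming synthetic quadratic data (the data, the noise level and degree=2 are illustration choices). RidgeCV selects alpha internally, so no outer grid search is involved.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
# Assumed ground truth: a noisy quadratic
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.1, size=60)

alphas = np.logspace(-3, 2, 50)
model = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    # fit_intercept=False because PolynomialFeatures already adds a bias column
    ("linear", RidgeCV(alphas=alphas, fit_intercept=False)),
])
model.fit(X, y)

chosen_alpha = model.named_steps["linear"].alpha_
r2 = model.score(X, y)
print(chosen_alpha, r2)
```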
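3.2.4.1.1. sklearn.linear_model.ElasticNetCV: Elastic Net model with iterative fitting along a regularization path
class sklearn.linear_model.ElasticNetCV(l1_ratio=0.5, eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, cv=None, copy_X=True, verbose=0, n_jobs=1, positive=False, random_state=None, selection='cyclic')
Regularization term: to prevent overfitting, a regularization term is usually added to the loss function, which improves the model's ability to generalize.
Loss function: $J(\theta) = \frac{1}{2m}(X\theta - Y)^T(X\theta - Y) + \alpha\rho\|\theta\|_1 + \frac{\alpha(1-\rho)}{2}\|\theta\|_2^2$, where α is the regularization hyper-parameter and ρ is the hyper-parameter that weights the two norms.
With alphas=np.logspace(-3, 2, 50) and l1_ratio=[.1, .5, .7, .9, .95, .99, 1], ElasticNetCV picks the best α and ρ from these candidates.
ElasticNetCV cross-validates over the hyper-parameters α and ρ to help choose suitable values.
Use case: consider ElasticNetCV when Lasso regression is too aggressive (too many coefficients are shrunk to 0) while Ridge regression does not regularize enough (the coefficients decay too slowly).
ElasticNet is a linear regression model trained with both L1 and L2 priors as regularizer. This combination allows learning a sparse model in which few weights are non-zero, like Lasso, while still keeping the regularization properties of Ridge. The l1_ratio parameter controls the convex combination (a special kind of linear combination) of L1 and L2.
Elastic-net is very useful when several features are correlated with one another. Lasso tends to pick one of them at random, while elastic-net tends to pick both.
In practice, one advantage of trading off between Lasso and Ridge is that Elastic-Net inherits some of Ridge's stability under rotation.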
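A small sketch of the joint selection of α and ρ, assuming a synthetic regression problem generated with make_regression (all dataset parameters are illustration choices). In scikit-learn the two selected values are exposed as alpha_ and l1_ratio_.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

alphas = np.logspace(-3, 2, 50)
l1_ratios = [.1, .5, .7, .9, .95, .99, 1]

# Cross-validates every (alpha, l1_ratio) pair and keeps the best one
model = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5)
model.fit(X, y)

print("alpha:", model.alpha_, "l1_ratio:", model.l1_ratio_)
```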

3.2.4.1.2. sklearn.linear_model.LarsCV: Cross-validated Least Angle Regression model
class sklearn.linear_model.LarsCV(fit_intercept=True, verbose=False, max_iter=500, normalize=True, precompute=’auto’, cv=None, max_n_alphas=1000, n_jobs=1, eps=2.2204460492503131e-16, copy_X=True, positive=False)

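3.2.4.1.3. sklearn.linear_model.LassoCV: Lasso linear model with iterative fitting along a regularization path (coordinate descent)
class sklearn.linear_model.LassoCV(eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, precompute='auto', max_iter=1000, tol=0.0001, copy_X=True, cv=None, verbose=False, n_jobs=1, positive=False, random_state=None, selection='cyclic')
Loss function: $J(\theta) = \frac{1}{2m}(X\theta - Y)^T(X\theta - Y) + \alpha\|\theta\|_1$ (the LinearRegression loss plus an L1-norm regularization term).
Lasso regression shrinks some feature coefficients and even sets coefficients with small absolute values exactly to 0, which improves the model's ability to generalize.
Use case: for high-dimensional feature data, especially when the linear relationship is sparse, use Lasso regression. It is also the first choice when the goal is to find the main features among many.
3.2.4.1.3.1. Examples using sklearn.linear_model.LassoCV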

3.2.4.1.4. sklearn.linear_model.LassoLarsCV: Cross-validated Lasso, using the LARS algorithm
class sklearn.linear_model.LassoLarsCV(fit_intercept=True, verbose=False, max_iter=500, normalize=True, precompute='auto', cv=None, max_n_alphas=1000, n_jobs=1, eps=2.2204460492503131e-16, copy_X=True, positive=False)
3.2.4.1.4.1. Examples using sklearn.linear_model.LassoLarsCV

3.2.4.1.5. sklearn.linear_model.LogisticRegressionCV: Logistic Regression CV (a.k.a. logit, MaxEnt) classifier.
class sklearn.linear_model.LogisticRegressionCV(Cs=10, fit_intercept=True, cv=None, dual=False, penalty='l2', scoring=None, solver='lbfgs', tol=0.0001, max_iter=100, class_weight=None, n_jobs=1, verbose=0, refit=True, intercept_scaling=1.0, multi_class='ovr', random_state=None)
Cs: the candidate values of the regularization parameter C (the inverse of the regularization strength); the remaining parameters are the same as for LogisticRegression.
3.2.4.1.6. sklearn.linear_model.MultiTaskElasticNetCV: Multi-task L1/L2 ElasticNet with built-in cross-validation.

3.2.4.1.7. sklearn.linear_model.MultiTaskLassoCV: Multi-task L1/L2 Lasso with built-in cross-validation.

3.2.4.1.8. sklearn.linear_model.OrthogonalMatchingPursuitCV: Cross-validated Orthogonal Matching Pursuit model (OMP)
3.2.4.1.8.1. Examples using sklearn.linear_model.OrthogonalMatchingPursuitCV

3.2.4.1.9. sklearn.linear_model.RidgeCV: Ridge regression with built-in cross-validation.
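Ridge regression loss function: $J(\theta) = \frac{1}{2}(X\theta - Y)^T(X\theta - Y) + \frac{1}{2}\alpha\|\theta\|_2^2$ (the LinearRegression loss plus an L2-norm regularization term).
α is a hyper-parameter. With alphas=np.logspace(-3, 2, 50) the best α is chosen from the given candidates. logspace creates a geometric sequence: here it starts at 10^-3, ends at 10^2 and contains 50 elements, and the best hyper-parameter is picked from these 50 values.
linspace creates an arithmetic sequence.
Relationship between the hyper-parameter α and the coefficients θ in Ridge regression: the larger α is, the heavier the penalty and the smaller the resulting coefficients θ, which eventually approach 0.
The smaller α is, i.e. the weaker the regularization term, the closer the coefficients θ get to those of ordinary linear regression.
Use case: when the data follows a linear relationship but LinearRegression does not fit well and regularization is needed, consider RidgeCV.
If the input features are high-dimensional and the linear relationship is sparse, RidgeCV is less suitable; consider the Lasso family instead.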
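The shrinkage behaviour described above can be checked numerically. This is a sketch on synthetic data (the dataset and the alpha grid are assumptions): the L2 norm of the Ridge coefficients should not grow as α increases.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50, n_features=5, noise=1.0, random_state=0)

# Geometric grid from 10^-3 to 10^4, one value per decade
alphas = np.logspace(-3, 4, 8)

# Coefficient norm for each alpha
norms = [float(np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_)) for a in alphas]

for a, n in zip(alphas, norms):
    print(f"alpha={a:g}  ||coef||={n:.4f}")

shrinks = all(norms[i] >= norms[i + 1] - 1e-9 for i in range(len(norms) - 1))
```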
3.2.4.1.9.1. Examples using sklearn.linear_model.RidgeCV

3.2.4.1.10. sklearn.linear_model.RidgeClassifierCV: Ridge classifier with built-in cross-validation.

3.2.4.2. Information Criterion
3.2.4.2.1. sklearn.linear_model.LassoLarsIC: Lasso model fit with Lars, using the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) for model selection
class sklearn.linear_model.LassoLarsIC(criterion='aic'/'bic', fit_intercept=True, verbose=False, normalize=True, precompute='auto', max_iter=500, eps=2.2204460492503131e-16, copy_X=True, positive=False)
Objective: (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
3.2.4.2.1.1. Examples using sklearn.linear_model.LassoLarsIC

3.2.4.3. Out of Bag Estimates
3.2.4.3.1. sklearn.ensemble.RandomForestClassifier: A random forest classifier
3.2.4.3.1.1. Examples using sklearn.ensemble.RandomForestClassifier

3.2.4.3.2. sklearn.ensemble.RandomForestRegressor: A random forest regressor.
3.2.4.3.2.1. Examples using sklearn.ensemble.RandomForestRegressor

3.2.4.3.3. sklearn.ensemble.ExtraTreesClassifier: An extra-trees classifier.
class sklearn.ensemble.ExtraTreesClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=False, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
3.2.4.3.3.1. Examples using sklearn.ensemble.ExtraTreesClassifier

3.2.4.3.4. sklearn.ensemble.ExtraTreesRegressor: An extra-trees regressor.
3.2.4.3.4.1. Examples using sklearn.ensemble.ExtraTreesRegressor

3.2.4.3.5. sklearn.ensemble.GradientBoostingClassifier: Gradient Boosting for classification.
3.2.4.3.5.1. Examples using sklearn.ensemble.GradientBoostingClassifier

3.2.4.3.6. sklearn.ensemble.GradientBoostingRegressor: Gradient Boosting for regression.
3.2.4.3.6.1. Examples using sklearn.ensemble.GradientBoostingRegressor

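3D plot
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # replaces Axes3D(fig), which newer matplotlib no longer attaches to the figure
x = np.arange(-4, 4, 0.1)
y = np.arange(-4, 4, 0.1)
x, y = np.meshgrid(x, y)
R = np.sqrt(x**2 + y**2)
z = np.sin(R)
# Surface z = sin(sqrt(x^2 + y^2)) drawn with the 'hot' colormap
ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap='hot')
plt.show()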

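3.3. Model evaluation: quantifying the quality of predictions
3.3.1. The scoring parameter: defining model evaluation rules
Specified through the scoring argument
3.3.1.1. Common cases: predefined values
3.3.1.2. Defining your scoring strategy from metric functions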
sklearn.metrics.make_scorer(score_func, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)
from sklearn.metrics import fbeta_score, make_scorer
ftwo_scorer = make_scorer(fbeta_score, beta=2)
grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
3.3.1.3. Implementing your own scoring object
3.3.1.4. Using multiple metric evaluation
scoring = ['accuracy', 'precision']
from sklearn.metrics import accuracy_score
scoring = {'accuracy': make_scorer(accuracy_score),
           'prec': 'precision'}
3.3.2. Classification metrics
3.3.2.1. From binary to multiclass and multilabel
3.3.2.2. Accuracy score
3.3.2.3. Cohen's kappa
3.3.2.4. Confusion matrix
sklearn.metrics.confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)
3.3.2.5. Classification report
sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)
3.3.2.6. Hamming loss
3.3.2.7. Jaccard similarity coefficient score
3.3.2.8. Precision, recall and F-measures
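The classification accuracy score is the fraction of all predictions that are correct. Accuracy is an easy-to-understand measure of a classifier, but it says nothing about the underlying distribution of the response values, nor about the types of errors the classifier makes.
Form:
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
normalize: defaults to True, which returns the fraction of correctly classified samples; if False, returns the number of correctly classified samples.
sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None)
Parameter average: string, one of [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']
When a binary metric is extended to multiclass or multilabel problems, the data is treated as a collection of binary problems, one per class. The binary metric can then be averaged across the classes in several ways, each useful in some scenario. The average parameter selects among them.
macro: computes the mean of the binary metrics, giving equal weight to each class. Where infrequent classes are nonetheless important, macro-averaging can be a way of highlighting their performance. On the other hand, the assumption that all classes are equally important is often untrue, so macro-averaging over-emphasizes the typically low performance on an infrequent class.
weighted: accounts for class imbalance by averaging the binary metrics with each class's score weighted by its presence in the true data.
micro: gives each sample-class pair an equal contribution to the overall metric (except as a result of sample_weight). Rather than summing the metric per class, it sums the dividends and divisors that make up the per-class metrics and computes one overall quotient. Micro-averaging may be preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
samples: applies only to multilabel problems. It does not compute a per-class measure; instead it computes the metric over the true and predicted classes of each sample in the evaluation data and returns their (sample_weight-weighted) average.
average=None returns an array containing the score of each class.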
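The averaging modes can be compared on a tiny imbalanced example. The labels below are made up for illustration: class 0 has four samples, class 1 has two and class 2 has one.

```python
from sklearn.metrics import recall_score

# Assumed toy labels
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1]

# Per-class recall: class 0 -> 4/4, class 1 -> 1/2, class 2 -> 0/1
per_class = recall_score(y_true, y_pred, average=None)

# macro: plain mean of the per-class values, (1 + 0.5 + 0) / 3
macro = recall_score(y_true, y_pred, average="macro")

# micro: pooled true positives over pooled positives, (4 + 1 + 0) / 7
micro = recall_score(y_true, y_pred, average="micro")

print(per_class, macro, micro)
```

The rare classes pull the macro average down to 0.5, while the micro average (5/7) is dominated by the large class.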
3.3.2.8.1. Binary classification
Metrics restricted to binary, single-label classification:
matthews_corrcoef(y_true, y_pred[, ...]): compute the Matthews correlation coefficient (MCC) for binary classes
precision_recall_curve(y_true, probas_pred): compute precision-recall pairs for different probability thresholds, forming a curve
roc_curve(y_true, y_score[, pos_label, ...]): compute the ROC curve
Metrics that can be used for binary and multilabel classification:
average_precision_score(y_true, y_score[, ...]): compute average precision (AP) from prediction scores
roc_auc_score(y_true, y_score[, average, ...]): compute the AUC from prediction scores
3.3.2.8.2. Multiclass and multilabel classification
Metrics that can be used for multiclass problems:
cohen_kappa_score(y1, y2[, labels, weights])
confusion_matrix(y_true, y_pred[, labels, ...])
hinge_loss(y_true, pred_decision[, labels, ...])
Metrics that can also be used for multilabel problems (shown in purple in the original):
accuracy_score(y_true, y_pred[, normalize, ...])
classification_report(y_true, y_pred[, ...])
f1_score(y_true, y_pred[, labels, ...])
fbeta_score(y_true, y_pred, beta[, labels, ...])
hamming_loss(y_true, y_pred[, labels, ...])
jaccard_similarity_score(y_true, y_pred[, ...])
log_loss(y_true, y_pred[, eps, normalize, ...])
zero_one_loss(y_true, y_pred[, normalize, ...])
precision_recall_fscore_support(y_true, y_pred)
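3.3.2.9. Hinge loss
3.3.2.10. Log loss
3.3.2.11. Matthews correlation coefficient
3.3.2.12. Receiver operating characteristic (ROC)
The ROC curve (receiver operating characteristic curve) is a combined indicator of sensitivity and specificity for a continuous variable, showing graphically how the two trade off. Several cut-off values are set on the continuous variable, and sensitivity and specificity are computed for each. The ROC curve is drawn from this series of binary decision rules (cut-off values or decision thresholds), with the true positive rate (TPR, i.e. sensitivity) on the vertical axis and the false positive rate (FPR, i.e. 1 - specificity) on the horizontal axis.
ROC looks at the trade-off between the proportion of positives the model identifies correctly and the proportion of negatives it wrongly identifies as positive. An increase in TPR comes at the cost of an increase in FPR. The area under the ROC curve, AUC (area under the ROC curve), summarizes how well the model separates the classes.
Vertical axis: true positive rate (TPR), or sensitivity
TPR = TP / (TP + FN)  (positives predicted as positive / actual positives)
Horizontal axis: false positive rate (FPR)
FPR = FP / (FP + TN)  (negatives predicted as positive / actual negatives)
Form:
sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
The function returns three arrays: fpr, tpr and thresholds.
How to read thresholds:
An important capability of a classifier is probabilistic output, i.e. how probable the classifier considers it that a sample is positive (or negative).
"Score" is the probability that each test sample is positive.
Next, going from high to low, each "Score" value is used in turn as the threshold: a test sample is treated as positive when its probability of being positive is greater than or equal to the threshold, and as negative otherwise. Each threshold yields one (FPR, TPR) pair, i.e. one point on the ROC curve. Setting the threshold to 1 and to 0 gives the points (0,0) and (1,1) respectively. Connecting these (FPR, TPR) pairs produces the ROC curve; the more threshold values there are, the smoother the curve. It is not actually necessary to have the probability that each test sample is positive; any "score" the classifier assigns to the sample is enough (the score need not lie in the interval (0,1)). The higher the score, the more confident the classifier is that the sample is positive, and each score value is likewise used as a threshold. Converting the scores to probabilities simply makes this easier to understand.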
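The threshold sweep described above can be reproduced with four made-up samples (the labels and scores below are assumptions used only for illustration). Each distinct score acts as a cut-off and contributes one (FPR, TPR) point.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # classifier scores for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# One (FPR, TPR) point per threshold, from the strictest cut-off downwards
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th}  FPR={f}  TPR={t}")
print("AUC:", auc)
```

The first threshold returned is an artificial value above every score, which corresponds to the point (0, 0); its exact value differs between scikit-learn versions.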
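3.3.2.13. Zero one loss
3.3.2.14. Brier score loss
3.3.3. Multilabel ranking metrics
3.3.3.1. Coverage error
3.3.3.2. Label ranking average precision
3.3.3.3. Ranking loss
3.3.4. Regression metrics
3.3.4.1. Explained variance score
3.3.4.2. Mean absolute error
3.3.4.3. Mean squared error
3.3.4.4. Mean squared logarithmic error
3.3.4.5. Median absolute error
3.3.4.6. R² score, the coefficient of determination
3.3.5. Clustering metrics
3.3.6. Dummy estimators
There are too many of these and they are too varied; look them up when they are needed.
Loss functions:
hinge_loss, hamming_loss, log_loss, zero_one_loss, brier_score_loss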

3.4. Model persistence
3.4.1. Persistence example
3.4.2. Security & maintainability limitations

3.5. Validation curves: plotting scores to evaluate models
3.5.1. Validation curve
sklearn.model_selection.validation_curve(estimator, X, y, param_name, param_range, groups=None, cv=None, scoring=None, n_jobs=1, pre_dispatch='all', verbose=0)
3.5.2. Learning curve
sklearn.model_selection.learning_curve(estimator, X, y, groups=None, train_sizes=array([ 0.1, 0.33, 0.55, 0.78, 1. ]), cv=None, scoring=None, exploit_incremental_learning=False, n_jobs=1, pre_dispatch='all', verbose=0, shuffle=False, random_state=None)
See the notes.

4. Dataset transformations
4.1. Pipeline and FeatureUnion: combining estimators
class sklearn.pipeline.Pipeline(steps, memory=None)
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
sklearn.pipeline.make_pipeline(*steps, **kwargs)
make_pipeline(Binarizer(), MultinomialNB())  The difference is that make_pipeline fills in the step names automatically.
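from sklearn import datasets, preprocessing, svm
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
iris = datasets.load_iris()
clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
cross_val_score(clf, iris.data, iris.target, cv=5)  # cv=5 is assumed; the notes leave cv undefined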
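A quick sketch of the automatic naming: make_pipeline derives each step name from the lowercased class name, so the steps can still be addressed by name later (for example in a parameter grid).

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Binarizer

pipe = make_pipeline(Binarizer(), MultinomialNB())

# Names are generated from the estimator class names
step_names = [name for name, _ in pipe.steps]
print(step_names)
```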
4.1.1. Pipeline: chaining estimators
4.1.1.1. Usage
from sklearn.linear_model import LogisticRegression
param_grid = dict(reduce_dim=[None, PCA(5), PCA(10)],
                  clf=[SVC(), LogisticRegression()],
                  clf__C=[0.1, 10, 100])
grid_search = GridSearchCV(pipe, param_grid=param_grid)
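4.1.1.2. Notes
Calling fit on the pipeline is the same as calling fit on each estimator in turn, transforming the input and passing it on to the next step.
The pipeline has all the methods of its last estimator, i.e. if the last estimator is a classifier, the pipeline can be used as a classifier.
If the last estimator is a transformer, then so is the pipeline.
4.1.1.3. Caching transformers: avoid repeated computation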
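from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
digits = load_digits()
pca1 = PCA()
svm1 = SVC()
# No memory= argument is passed here, so nothing is cached and pca1 itself is fitted
pipe = Pipeline([('reduce_dim', pca1), ('clf', svm1)])
pipe.fit(digits.data, digits.target)
# The pca instance can be inspected directly
print(pca1.components_)
Note: without caching, the fitted instance can be accessed directly through pca1. With caching enabled (Pipeline(..., memory=cachedir), where cachedir comes from tempfile.mkdtemp() and is removed afterwards with shutil.rmtree), the transformers are cloned before fitting, so pipe.named_steps['reduce_dim'].components_ must be used instead.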
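A sketch of the cached variant, assuming the digits dataset and n_components=10 (both illustration choices). With memory set, the original PCA object stays unfitted and the fitted copy lives inside the pipeline.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
pca1 = PCA(n_components=10)

cachedir = mkdtemp()  # temporary directory used as the transformer cache
try:
    pipe = Pipeline([("reduce_dim", pca1), ("clf", SVC())], memory=cachedir)
    pipe.fit(X, y)

    # pca1 was cloned before fitting, so the original has no fitted attributes
    original_is_fitted = hasattr(pca1, "components_")
    # The fitted clone is reachable through named_steps
    fitted_shape = pipe.named_steps["reduce_dim"].components_.shape
finally:
    rmtree(cachedir)  # always clean up the cache directory

print(original_is_fitted, fitted_shape)
```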
4.1.2. FeatureUnion: composite feature spaces
4.1.2.1. Usage
Much like Pipeline; it can be combined with Pipeline to build more elaborate models.



4.2. Feature extraction
4.2.1. Loading features from dicts: converts a list of dicts into a NumPy array
class sklearn.feature_extraction.DictVectorizer(dtype=<class ‘numpy.float64’>, separator=’=’, sparse=True, sort=True)
fit(X[, y])    Learn a list of feature name -> indices mappings.
fit_transform(X[, y])    Learn a list of feature name -> indices mappings and transform X.
                          fit_transform(measurements).toarray()
get_feature_names()    Returns a list of feature names, ordered by their indices.
get_params([deep])    Get parameters for this estimator.
inverse_transform(X[, dict_type])    Transform array or sparse matrix X back to feature mappings.
restrict(support[, indices])    Restrict the features to those in support using feature selection.
set_params(**params)    Set the parameters of this estimator.
transform(X)    Transform feature->value dicts to array or sparse matrix.
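A small sketch of DictVectorizer on made-up measurements (the city and temperature values are illustration data). String-valued features are one-hot encoded as "name=value" columns, while numeric features are passed through.

```python
from sklearn.feature_extraction import DictVectorizer

measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

vec = DictVectorizer()
# fit_transform returns a sparse matrix by default; toarray() makes it dense
arr = vec.fit_transform(measurements).toarray()

print(vec.feature_names_)
print(arr)
```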
4.2.2. Feature hashing: works as a kind of dimensionality-reduction trick
class sklearn.feature_extraction.FeatureHasher(n_features=1048576, input_type=’dict’, dtype=<class ‘numpy.float64’>, alternate_sign=True, non_negative=False)
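4.2.2.1. Implementation details

4.2.3. Text feature extraction
4.2.3.1. The Bag of Words representation
4.2.3.2. Sparsity
4.2.3.3. Common Vectorizer usage: converts text into a vector of per-word occurrence counts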
class sklearn.feature_extraction.text.CountVectorizer(input=’content’, encoding=’utf-8’, decode_error=’strict’, strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), analyzer=’word’, max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class ‘numpy.int64’>)
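ngram_range: tuple (min_n, max_n), the range of how many consecutive words are joined into one n-gram
token_pattern: the regular expression used for tokenization
min_df: minimum document frequency; filters out terms that appear in too few documents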
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]
X = vectorizer.fit_transform(corpus)
In[53]:vectorizer.vocabulary_
Out[53]: 
{'and': 0,
 'document': 1,
 'first': 2,
 'is': 3,
 'one': 4,
 'second': 5,
 'the': 6,
 'third': 7,
 'this': 8}

In[54]:X.toarray()
Out[54]: 
array([[0, 1, 1, ..., 1, 0, 1],
       [0, 1, 0, ..., 1, 0, 1],
       [1, 0, 0, ..., 1, 1, 0],
       [0, 1, 1, ..., 1, 0, 1]], dtype=int64)

build_analyzer()    Return a callable that handles preprocessing and tokenization
build_preprocessor()    Return a function to preprocess the text before tokenization
build_tokenizer()    Return a function that splits a string into a sequence of tokens
decode(doc)    Decode the input into a string of unicode symbols
fit(raw_documents[, y])    Learn a vocabulary dictionary of all tokens in the raw documents.
fit_transform(raw_documents[, y])    Learn the vocabulary dictionary and return term-document matrix.
get_feature_names()    Array mapping from feature integer indices to feature name
get_params([deep])    Get parameters for this estimator.
get_stop_words()    Build or fetch the effective stop words list
inverse_transform(X)    Return terms per document with nonzero entries in X.
set_params(**params)    Set the parameters of this estimator.
transform(raw_documents)    Transform documents to document-term matrix.
4.2.3.4. Tf-idf term weighting: converts text into a vector of tf-idf values
class sklearn.feature_extraction.text.TfidfTransformer(norm='l2', use_idf=True, smooth_idf=True, sublinear_tf=False)
TfidfTransformer().fit_transform(CountVectorizer().fit_transform(corpus))
class sklearn.feature_extraction.text.TfidfVectorizer(input=’content’, encoding=’utf-8’, decode_error=’strict’, strip_accents=None, lowercase=True, preprocessor=None, tokenizer=None, analyzer=’word’, stop_words=None, token_pattern='(?u)\b\w\w+\b', ngram_range=(1, 1), max_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=<class ‘numpy.int64’>, norm=’l2’, use_idf=True, smooth_idf=True, sublinear_tf=False)
TfidfVectorizer combines CountVectorizer and TfidfTransformer in a single model.
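The equivalence can be checked directly. This sketch reuses the four-sentence corpus from the CountVectorizer example and compares the two-step and one-step results with default settings.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

corpus = [
    "This is the first document.",
    "This is the second second document.",
    "And the third one.",
    "Is this the first document?",
]

# Two steps: raw counts, then tf-idf weighting
counts = CountVectorizer().fit_transform(corpus)
two_step = TfidfTransformer().fit_transform(counts)

# One step: TfidfVectorizer does both
one_step = TfidfVectorizer().fit_transform(corpus)

same = np.allclose(two_step.toarray(), one_step.toarray())
print(same, one_step.shape)
```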
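4.2.3.5. Decoding text files
chardet
4.2.3.6. Applications and examples
4.2.3.7. Limitations of the Bag of Words representation
4.2.3.8. Vectorizing a large text corpus with the hashing trick
4.2.3.9. Performing out-of-core scaling with HashingVectorizer (feature hashing for text)
4.2.3.10. Customizing the vectorizer classes

4.2.4. Image feature extraction
4.2.4.1. Patch extraction
4.2.4.2. Connectivity graph of an image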



4.3. Preprocessing data
4.3.1. Standardization, or mean removal and variance scaling
sklearn.preprocessing.scale(X, axis=0, with_mean=True, with_std=True, copy=True)
class sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
4.3.1.1. Scaling features to a range
class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
class sklearn.preprocessing.MaxAbsScaler(copy=True)
4.3.1.2. Scaling sparse data
sklearn.preprocessing.maxabs_scale(X, axis=0, copy=True)
4.3.1.3. Scaling data with outliers
sklearn.preprocessing.robust_scale(X, axis=0, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True)
class sklearn.preprocessing.RobustScaler(with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True)
4.3.1.4. Centering kernel matrices
class sklearn.preprocessing.KernelCenterer
4.3.2. Non-linear transformation
class sklearn.preprocessing.QuantileTransformer(n_quantiles=1000, output_distribution=’uniform’, ignore_implicit_zeros=False, subsample=100000, random_state=None, copy=True)
sklearn.preprocessing.quantile_transform(X, axis=0, n_quantiles=1000, output_distribution=’uniform’, ignore_implicit_zeros=False, subsample=100000, random_state=None, copy=False)
4.3.3. Normalization
class sklearn.preprocessing.Normalizer(norm=’l2’, copy=True)
sklearn.preprocessing.normalize(X, norm=’l2’, axis=1, copy=True, return_norm=False)
4.3.4. Binarization
4.3.4.1. Feature binarization
class sklearn.preprocessing.Binarizer(threshold=0.0, copy=True)
4.3.5. Encoding categorical features
class sklearn.preprocessing.OneHotEncoder(n_values=’auto’, categorical_features=’all’, dtype=<class ‘numpy.float64’>, sparse=True, handle_unknown=’error’)
4.3.6. Imputation of missing values
class sklearn.preprocessing.Imputer(missing_values=’NaN’, strategy=’mean’, axis=0, verbose=0, copy=True)
strategy : string, optional (default=」mean」)
    The imputation strategy.
    If 「mean」, then replace missing values using the mean along the axis.
    If 「median」, then replace missing values using the median along the axis.
    If 「most_frequent」, then replace missing using the most frequent value along the axis.
copy : boolean, optional (default=True)
    If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if copy=False:
    If X is not an array of floating values;
    If X is sparse and missing_values=0;
    If axis=0 and X is encoded as a CSR matrix;
    If axis=1 and X is encoded as a CSC matrix.
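A sketch of the "mean" strategy on a made-up matrix. Note the assumption: it uses sklearn.impute.SimpleImputer, which replaces preprocessing.Imputer in newer scikit-learn releases and always imputes column-wise; with the older class documented above the call would be Imputer(strategy='mean', axis=0).

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumed: a release where Imputer has been replaced

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imp.fit_transform(X)

# The NaN in column 0 is replaced by the column mean (1 + 7) / 2 = 4
print(X_filled)
```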
4.3.7. Generating polynomial features
class sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=False, include_bias=True)
degree=2: [1, a, b, a^2, ab, b^2].
interaction_only=True: no a^2 or b^2 terms; a feature is never multiplied by itself.
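A worked instance of the two settings, assuming a single sample with a=2 and b=3 (values chosen only so the products are easy to read).

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # a = 2, b = 3

# Columns: [1, a, b, a^2, ab, b^2]
full = PolynomialFeatures(degree=2).fit_transform(X)

# Columns: [1, a, b, ab]; the squared terms are dropped
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)

print(full)
print(inter)
```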
4.3.8. Custom transformers
class sklearn.preprocessing.FunctionTransformer(func=None, inverse_func=None, validate=True, accept_sparse=False, pass_y=’deprecated’, kw_args=None, inv_kw_args=None)

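4.4. Unsupervised dimensionality reduction
4.4.1. PCA: principal component analysis
class sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)
4.4.2. Random projections
4.4.3. Feature agglomeration
4.5. Random Projection
4.5.1. The Johnson-Lindenstrauss lemma
4.5.2. Gaussian random projection
4.5.3. Sparse random projection
4.6. Kernel Approximation
4.6.1. Nystroem Method for Kernel Approximation
4.6.2. Radial Basis Function Kernel
4.6.3. Additive Chi Squared Kernel
4.6.4. Skewed Chi Squared Kernel
4.6.5. Mathematical Details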



4.7. Pairwise metrics, Affinities and Kernels
4.7.1. Cosine similarity
4.7.2. Linear kernel
sklearn.metrics.pairwise.linear_kernel(X, Y=None)
svm.SVC(kernel='linear', C=C)
svm.LinearSVC(C=C)
# linear_kernel computes the linear kernel, i.e. the special case of polynomial_kernel with degree=1 and coef0=0 (homogeneous)
4.7.3. Polynomial kernel
sklearn.metrics.pairwise.polynomial_kernel(X, Y=None, degree=3, gamma=None, coef0=1)
svm.SVC(kernel='poly', degree=3, C=C)
4.7.4. Sigmoid kernel
sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, gamma=None, coef0=1)
svm.SVC(kernel='sigmoid', gamma=0.7, C=C)
4.7.5. RBF kernel
sklearn.metrics.pairwise.rbf_kernel(X, Y=None, gamma=None)
svm.SVC(kernel='rbf', gamma=0.7)
4.7.6. Laplacian kernel
sklearn.metrics.pairwise.laplacian_kernel(X, Y=None, gamma=None)
4.7.7. Chi-squared kernel
sklearn.metrics.pairwise.chi2_kernel(X, Y=None, gamma=1.0)
clf = svm.SVC(kernel='precomputed')
# linear kernel computation
gram = np.dot(X, X.T)
clf.fit(gram, y)
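A hedged sketch of the precomputed-kernel route, assuming the iris data. It builds the Gram matrix with linear_kernel, confirms it equals the plain dot product, and compares predictions with an SVC that uses kernel='linear' directly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Gram matrix of the training data; same values as np.dot(X, X.T)
gram = linear_kernel(X)
same_gram = np.allclose(gram, np.dot(X, X.T))

clf_pre = SVC(kernel="precomputed").fit(gram, y)
clf_lin = SVC(kernel="linear").fit(X, y)

# At predict time a precomputed kernel needs K(test, train); here test == train
agreement = float(np.mean(clf_pre.predict(gram) == clf_lin.predict(X)))
print(same_gram, agreement)
```

When predicting on new samples X_new with the precomputed variant, the matrix to pass would be linear_kernel(X_new, X), i.e. one row per new sample and one column per training sample.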
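4.8. Transforming the prediction target (y)
4.8.1. Label binarization
4.8.2. Label encoding
5. Dataset loading utilities
5.1. General dataset API
5.2. Toy datasets
5.3. Sample images
5.4. Sample generators
5.4.1. Generators for classification and clustering
5.4.1.1. Single label
5.4.1.2. Multilabel
5.4.1.3. Biclustering
5.4.2. Generators for regression
5.4.3. Generators for manifold learning
5.4.4. Generators for decomposition
5.5. Datasets in svmlight / libsvm format
5.6. Loading from external datasets
5.7. The Olivetti faces dataset
5.8. The 20 newsgroups text dataset
5.8.1. Usage
5.8.2. Converting text to vectors
5.8.3. Filtering text for more realistic training
5.9. Downloading datasets from the mldata.org repository
5.10. The Labeled Faces in the Wild face recognition dataset
5.10.1. Usage
5.10.2. Examples
5.11. Forest covertypes
5.12. RCV1 dataset
5.13. Boston house prices dataset
5.13.1. Notes
5.14. Breast Cancer Wisconsin (Diagnostic) Database
5.14.1. Notes
5.14.2. References
5.15. Diabetes dataset
5.15.1. Notes
5.16. Optical Recognition of Handwritten Digits Data Set
5.16.1. Notes
5.16.2. References
5.17. Iris Plants Database
5.17.1. Notes
5.17.2. References
5.18. Linnerrud dataset
5.18.1. Notes
5.18.2. References
6. Strategies to scale computationally: bigger data
6.1. Scaling with instances using out-of-core learning
6.1.1. Streaming instances
6.1.2. Extracting features
6.1.3. Incremental learning
6.1.4. Examples
6.1.5. Notes
7. Computational Performance
7.1. Prediction Latency
7.1.1. Bulk versus Atomic mode
7.1.2. Influence of the Number of Features
7.1.3. Influence of the Input Data Representation
7.1.4. Influence of the Model Complexity
7.1.5. Feature Extraction Latency
7.2. Prediction Throughput
7.3. Tips and Tricks
7.3.1. Linear algebra libraries
7.3.2. Model Compression
7.3.3. Model Reshaping
7.3.4. Links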



   =========   =======================================================
      Colormap    Description
      =========   =======================================================
      autumn      sequential linearly-increasing shades of red-orange-yellow
      bone        sequential increasing black-white color map with
                  a tinge of blue, to emulate X-ray film
      cool        linearly-decreasing shades of cyan-magenta
      copper      sequential increasing shades of black-copper
      flag        repetitive red-white-blue-black pattern (not cyclic at
                  endpoints)
      gray        sequential linearly-increasing black-to-white
                  grayscale
      hot         sequential black-red-yellow-white, to emulate blackbody
                  radiation from an object at increasing temperatures
      hsv         cyclic red-yellow-green-cyan-blue-magenta-red, formed
                  by changing the hue component in the HSV color space
      inferno     perceptually uniform shades of black-red-yellow
      jet         a spectral map with dark endpoints, blue-cyan-yellow-red;
                  based on a fluid-jet simulation by NCSA [#]_
      magma       perceptually uniform shades of black-red-white
      pink        sequential increasing pastel black-pink-white, meant
                  for sepia tone colorization of photographs
      plasma      perceptually uniform shades of blue-red-yellow
      prism       repetitive red-yellow-green-blue-purple-...-green pattern
                  (not cyclic at endpoints)
      spring      linearly-increasing shades of magenta-yellow
      summer      sequential linearly-increasing shades of green-yellow
      viridis     perceptually uniform shades of blue-green-yellow
      winter      linearly-increasing shades of blue-green
      =========   =======================================================
    
    For the above list only, you can also set the colormap using the
    corresponding pylab shortcut interface function, similar to Matlab::
    
      imshow(X)
      hot()
      jet()
    
    The next set of palettes are from the `Yorick scientific visualisation
    package <http://dhmunro.github.io/yorick-doc/>`_, an evolution of
    the GIST package, both by David H. Munro:
    
      ============  =======================================================
      Colormap      Description
      ============  =======================================================
      gist_earth    mapmaker's colors from dark blue deep ocean to green
                    lowlands to brown highlands to white mountains
      gist_heat     sequential increasing black-red-orange-white, to emulate
                    blackbody radiation from an iron bar as it grows hotter
      gist_ncar     pseudo-spectral black-blue-green-yellow-red-purple-white
                    colormap from National Center for Atmospheric
                    Research [#]_
      gist_rainbow  runs through the colors in spectral order from red to
                    violet at full saturation (like *hsv* but not cyclic)
      gist_stern    "Stern special" color table from Interactive Data
                    Language software
      ============  =======================================================
    
    The following colormaps are based on the `ColorBrewer
    <http://colorbrewer2.org>`_ color specifications and designs developed by
    Cynthia Brewer:
    
    ColorBrewer Diverging (luminance is highest at the midpoint, and
    decreases towards differently-colored endpoints):
    
      ========  ===================================
      Colormap  Description
      ========  ===================================
      BrBG      brown, white, blue-green
      PiYG      pink, white, yellow-green
      PRGn      purple, white, green
      PuOr      orange, white, purple
      RdBu      red, white, blue
      RdGy      red, white, gray
      RdYlBu    red, yellow, blue
      RdYlGn    red, yellow, green
      Spectral  red, orange, yellow, green, blue
      ========  ===================================
    
    ColorBrewer Sequential (luminance decreases monotonically):
    
      ========  ====================================
      Colormap  Description
      ========  ====================================
      Blues     white to dark blue
      BuGn      white, light blue, dark green
      BuPu      white, light blue, dark purple
      GnBu      white, light green, dark blue
      Greens    white to dark green
      Greys     white to black (not linear)
      Oranges   white, orange, dark brown
      OrRd      white, orange, dark red
      PuBu      white, light purple, dark blue
      PuBuGn    white, light purple, dark green
      PuRd      white, light purple, dark red
      Purples   white to dark purple
      RdPu      white, pink, dark purple
      Reds      white to dark red
      YlGn      light yellow, dark green
      YlGnBu    light yellow, light green, dark blue
      YlOrBr    light yellow, orange, dark brown
      YlOrRd    light yellow, orange, dark red
      ========  ====================================
    
    ColorBrewer Qualitative:
    
    (For plotting nominal data, :class:`ListedColormap` is used,
    not :class:`LinearSegmentedColormap`.  Different sets of colors are
    recommended for different numbers of categories.)
    
    * Accent
    * Dark2
    * Paired
    * Pastel1
    * Pastel2
    * Set1
    * Set2
    * Set3
    
    Other miscellaneous schemes:
    
      ============= =======================================================
      Colormap      Description
      ============= =======================================================
      afmhot        sequential black-orange-yellow-white blackbody
                    spectrum, commonly used in atomic force microscopy
      brg           blue-red-green
      bwr           diverging blue-white-red
      coolwarm      diverging blue-gray-red, meant to avoid issues with 3D
                    shading, color blindness, and ordering of colors [#]_
      CMRmap        "Default colormaps on color images often reproduce to
                    confusing grayscale images. The proposed colormap
                    maintains an aesthetically pleasing color image that
                    automatically reproduces to a monotonic grayscale with
                    discrete, quantifiable saturation levels." [#]_
      cubehelix     Unlike most other color schemes cubehelix was designed
                    by D.A. Green to be monotonically increasing in terms
                    of perceived brightness. Also, when printed on a black
                    and white postscript printer, the scheme results in a
                    greyscale with monotonically increasing brightness.
                    This color scheme is named cubehelix because the r,g,b
                    values produced can be visualised as a squashed helix
                    around the diagonal in the r,g,b color cube.
      gnuplot       gnuplot's traditional pm3d scheme
                    (black-blue-red-yellow)
      gnuplot2      sequential color printable as gray
                    (black-blue-violet-yellow-white)
      ocean         green-blue-white
      rainbow       spectral purple-blue-green-yellow-orange-red colormap
                    with diverging luminance
      seismic       diverging blue-white-red
      nipy_spectral black-purple-blue-green-yellow-red-white spectrum,
                    originally from the Neuroimaging in Python project
      terrain       mapmaker's colors, blue-green-yellow-brown-white,
                    originally from IGOR Pro
      ============= =======================================================
    
    The following colormaps are redundant and may be removed in future
    versions.  It's recommended to use the names in the descriptions
    instead, which produce identical output:
    
      =========  =======================================================
      Colormap   Description
      =========  =======================================================
      gist_gray  identical to *gray*
      gist_yarg  identical to *gray_r*
      binary     identical to *gray_r*
      spectral   identical to *nipy_spectral* [#]_
      =========  =======================================================