From: Comparing various online solvers
An example showing how different online solvers perform on the hand-written digits dataset.
Ref: Online machine learning algorithms and their pseudocode
PA, CW, AROW, and NHerd are all algorithms provided by the Jubatus distributed online machine learning framework.
Perceptron: linear_model.Perceptron
Multilayer perceptron: Multi-layer Perceptron
Passive-aggressive algorithm: Passive Aggressive Perceptron
When updating the weights, a parameter τ_t is introduced: if the prediction is correct, the weights are left unchanged (passive); if the prediction is wrong, the weights are adjusted aggressively. Slack variables can also be introduced, giving rise to variants of the algorithm.
Advantage: it reduces the number of misclassifications, and it also works on noisy, non-separable data.
τ_t can be computed in three ways:
a. τ_t = l_t / ||x_t||^2
b. τ_t = min{C, l_t / ||x_t||^2}
c. τ_t = l_t / (||x_t||^2 + 1/(2C))
These correspond to the PA, PA-I, and PA-II variants respectively.
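As a minimal NumPy sketch of how these three updates differ (my own illustration of the formulas above, not the scikit-learn implementation; binary labels in {-1, +1}):
import numpy as np

def pa_update(w, x_t, y_t, variant="PA", C=1.0):
    """One passive-aggressive step: return the updated weight vector."""
    loss = max(0.0, 1.0 - y_t * np.dot(w, x_t))   # hinge loss l_t
    if loss == 0.0:
        return w                                  # passive: prediction was correct
    sq_norm = np.dot(x_t, x_t)
    if variant == "PA":        # a. τ_t = l_t / ||x_t||^2
        tau = loss / sq_norm
    elif variant == "PA-I":    # b. τ_t = min{C, l_t / ||x_t||^2}
        tau = min(C, loss / sq_norm)
    else:                      # c. PA-II: τ_t = l_t / (||x_t||^2 + 1/(2C))
        tau = loss / (sq_norm + 1.0 / (2.0 * C))
    return w + tau * y_t * x_t                    # aggressive: correct the mistake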
# Author: Rob Zinkov <rob at zinkov dot com>
# License: BSD 3 clause
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
# train_test_split moved from sklearn.cross_validation to sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier, Perceptron
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.linear_model import LogisticRegression
digits = datasets.load_digits()
X, y = digits.data, digits.target
print(X)
print(X.shape)
print()
print(y)
print(y.shape)
classifiers = [
    ("SGD", SGDClassifier()),
    ("ASGD", SGDClassifier(average=True)),
    ("Perceptron", Perceptron()),
    ("Passive-Aggressive I", PassiveAggressiveClassifier(loss='hinge', C=1.0)),
    ("Passive-Aggressive II", PassiveAggressiveClassifier(loss='squared_hinge', C=1.0)),
    ("SAG", LogisticRegression(solver='sag', tol=1e-1, C=1.e4 / X.shape[0], multi_class='auto')),
]
# for each held-out fraction, split and test 20 times and average the performance
rounds = 20
# fractions of the data held out as the test set
heldout = [0.95, 0.90, 0.75, 0.50, 0.01]
xx = 1. - np.array(heldout)
print(xx)
for name, clf in classifiers:
    print("training %s" % name)
    rng = np.random.RandomState(42)
    yy = []
    for i in heldout:
        yy_ = []
        for r in range(rounds):
            # split, train, predict
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=i, random_state=rng)
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            # record the test error rate for this round
            yy_.append(1 - np.mean(y_pred == y_test))
        # average the error over the 20 rounds
        yy.append(np.mean(yy_))
    plt.plot(xx, yy, label=name)
plt.legend(loc="upper right")
plt.xlabel("Proportion train")
plt.ylabel("Test Error Rate")
plt.show()
From: Out-of-core classification of text documents
This is an example showing how scikit-learn can be used for classification using an out-of-core approach: learning from data that doesn’t fit into main memory.
We make use of an online classifier, i.e., one that supports the partial_fit method, which will be fed batches of examples.
To guarantee that the feature space remains the same over time, we use a HashingVectorizer that projects each example into the same feature space. This is especially useful for text classification, where new features (words) may appear in each batch.
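A minimal sketch of these two ideas together (toy strings and hypothetical labels, not the Reuters pipeline below): HashingVectorizer is stateless, so every batch lands in the same feature space, and partial_fit only needs the full class list on its first call.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
import numpy as np

vectorizer = HashingVectorizer(n_features=2**18)  # stateless: no fit, no vocabulary
clf = SGDClassifier()

batch1 = ["the cat sat on the mat", "dogs bark loudly"]
batch2 = ["a brand new unseen word appears here"]   # new vocabulary, same space

X1 = vectorizer.transform(batch1)
X2 = vectorizer.transform(batch2)
print(X1.shape, X2.shape)  # (2, 262144) (1, 262144): dimensionality never changes

# partial_fit must see the full label set up front, since a given
# mini-batch may not contain every class
clf.partial_fit(X1, np.array([1, 0]), classes=np.array([0, 1]))
clf.partial_fit(X2, np.array([0]))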
The dataset used in this example is Reuters-21578 as provided by the UCI ML repository. It will be automatically downloaded and uncompressed on first run.
The plot represents the learning curve of the classifier: the evolution of classification accuracy over the course of the mini-batches. Accuracy is measured on the first 1000 samples, held out as a validation set.
To limit the memory consumption, we queue examples up to a fixed amount before feeding them to the learner: examples report in little by little and join the training as they arrive.
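A hypothetical sketch of such a buffering generator (the real example defines its own iter_minibatches over the Reuters SGML stream; this stand-in just slices a (text, label) iterator into fixed-size batches):
import itertools

def iter_minibatches(doc_iter, minibatch_size):
    """Queue up to minibatch_size (text, label) pairs, then yield them as a batch."""
    data = list(itertools.islice(doc_iter, minibatch_size))
    while data:
        X_text, y = zip(*data)
        yield list(X_text), list(y)
        data = list(itertools.islice(doc_iter, minibatch_size))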
downloading dataset (once and for all) into /home/unsw/scikit_learn_data/reuters
untarring Reuters dataset... done.
Test set is 975 documents (114 positive)

Test accuracy at each mini-batch checkpoint (975 test docs, 114 positive):

train docs  positive  Passive-Aggressive  Perceptron  SGD    NB Multinomial
       931       123               0.911       0.928  0.924           0.884
      2852       346               0.964       0.923  0.957           0.892
      5778       721               0.947       0.929  0.962           0.911
      8699      1108               0.972       0.961  0.964           0.926
     11515      1439               0.970       0.958  0.973           0.929
     14376      1898               0.970       0.872  0.972           0.937
     17314      2203               0.969       0.964  0.976           0.939

Throughput rises from roughly 800 docs/s at the first checkpoint (about 1.2 s cumulative) to roughly 1250 docs/s at the last (about 13.9 s cumulative), nearly identical across the four classifiers.
# Here are some classifiers that support the `partial_fit` method
partial_fit_classifiers = {
    'SGD': SGDClassifier(),
    'Perceptron': Perceptron(),
    'NB Multinomial': MultinomialNB(alpha=0.01),
    'Passive-Aggressive': PassiveAggressiveClassifier(),
}

# We will feed the classifiers with mini-batches of 1000 documents; this means
# we have at most 1000 docs in memory at any time. The smaller the document
# batch, the bigger the relative overhead of the partial fit methods.
minibatch_size = 1000

# Create the data_stream that parses Reuters SGML files and iterates on
# documents as a stream.
minibatch_iterators = iter_minibatches(data_stream, minibatch_size)
total_vect_time = 0.0

# Main loop: iterate on mini-batches of examples
for i, (X_train_text, y_train) in enumerate(minibatch_iterators):
    tick = time.time()
    X_train = vectorizer.transform(X_train_text)
    total_vect_time += time.time() - tick

    # Each time a batch arrives, every classifier trains on it once;
    # when the next batch arrives, they each train once more.
    for cls_name, cls in partial_fit_classifiers.items():
        tick = time.time()
        # update estimator with examples in the current mini-batch
        cls.partial_fit(X_train, y_train, classes=all_classes)
        # accumulate training time and counts
        cls_stats[cls_name]['total_fit_time'] += time.time() - tick
        cls_stats[cls_name]['n_train'] += X_train.shape[0]
        cls_stats[cls_name]['n_train_pos'] += sum(y_train)

        # accumulate test accuracy stats
        tick = time.time()
        cls_stats[cls_name]['accuracy'] = cls.score(X_test, y_test)
        cls_stats[cls_name]['prediction_time'] = time.time() - tick

        acc_history = (cls_stats[cls_name]['accuracy'],
                       cls_stats[cls_name]['n_train'])
        cls_stats[cls_name]['accuracy_history'].append(acc_history)
        run_history = (cls_stats[cls_name]['accuracy'],
                       total_vect_time + cls_stats[cls_name]['total_fit_time'])
        cls_stats[cls_name]['runtime_history'].append(run_history)

        if i % 3 == 0:
            print(progress(cls_name, cls_stats[cls_name]))
    if i % 3 == 0:
        print('\n')
End.