Code explained: mastering neural network hyperparameter tuning in one article

Date: 2019-12-01



Full text: 7,002 characters; estimated reading time: 14 minutes or more.

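Neural networks are now commonplace in industry and research, but unfortunately a large share of these applications fail to produce networks that perform well enough to compete with other algorithms.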

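When developing new optimization algorithms, applied mathematicians like to use test functions, sometimes called artificial landscapes. Artificial landscapes help compare the performance of algorithms in terms of: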

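· Convergence (how fast they reach the answer)

· Precision (how close they get to the exact answer)

· Robustness (whether they perform well on all functions or only on a small subset)

· General performance (e.g., computational complexity)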

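If you browse the Wikipedia entry on test functions for optimization, you will find that some of them are very hard to handle. Many are widely used precisely because they expose weaknesses in optimization algorithms. This article, however, discusses one that looks deceptively trivial: the Beale function.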

The Beale function

[Figure: the Beale function]

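The Beale function makes a good test function because it shows how well optimization algorithms cope with flat regions where the gradient is very shallow. In such regions, gradient-based optimizers struggle to learn effectively and have difficulty reaching the minimum.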
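For reference, the Beale function implemented in the code below can be written out explicitly (its standard form, usually evaluated on -4.5 ≤ x, y ≤ 4.5). Substituting x = 3, y = 0.5 makes every bracket vanish, which is why (3, 0.5) is its global minimum:

$$
f(x, y) = (1.5 - x + xy)^2 + (2.25 - x + xy^2)^2 + (2.625 - x + xy^3)^2
$$

$$
f(3, 0.5) = (1.5 - 3 + 1.5)^2 + (2.25 - 3 + 0.75)^2 + (2.625 - 3 + 0.375)^2 = 0
$$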

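The rest of this article follows the Jupyter notebook tutorial in the GitHub repository to work out a viable way of solving artificial landscapes. Such a landscape is analogous to the loss surface of a neural network: the goal of training a network is to find the minimum of its loss surface through some form of optimization, typically stochastic gradient descent.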

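Having worked through a difficult optimization function, readers will be better prepared for the practical problems that come up when implementing neural networks.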

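Before testing a neural network, we first need to define the function and find its minimum (otherwise we would not know what the correct answer is). As a first step (after importing the relevant packages), define the Beale function in the notebook: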

```python
import numpy as np

# define Beale's function which we want to minimize
def objective(X):
    x = X[0]; y = X[1]
    return (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2
```

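In this case we know (because we constructed it) roughly where the minimum lies and what step size to use for the mesh grid, so the second step is to set the function boundaries.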

```python
# function boundaries
xmin, xmax, xstep = -4.5, 4.5, .9
ymin, ymax, ystep = -4.5, 4.5, .9
```

With this information we can build a mesh grid of points over which to look for the minimum.

```python
# Let's create some points
x1, y1 = np.meshgrid(np.arange(xmin, xmax + xstep, xstep),
                     np.arange(ymin, ymax + ystep, ystep))
```

Now, make a (very) rough initial guess.

```python
# initial guess
x0 = [4., 4.]
f0 = objective(x0)
print(f0)
```

Then use the minimize function from scipy.optimize to find the answer.

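```python
from scipy.optimize import minimize

# search for the minimum inside the function boundaries, starting from x0
bnds = ((xmin, xmax), (ymin, ymax))
minimum = minimize(objective, x0, bounds=bnds)
print(minimum)
```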

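[Output: the result returned by minimize]

The answer appears to be (3, 0.5). Plugging these values into the equation confirms that this is indeed the minimum (Wikipedia gives the same result).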
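As a quick sanity check, the sketch below evaluates the Beale function at (3, 0.5) and at the initial guess (4, 4), and estimates the gradient at (3, 0.5) with central differences. The helper central_diff_grad is written here purely for illustration and is not part of the original notebook. A zero function value together with a vanishing gradient is what we expect at the minimum.

```python
import numpy as np

def beale(X):
    """Beale's function, same formula as the objective defined earlier."""
    x, y = X[0], X[1]
    return (1.5 - x + x*y)**2 + (2.25 - x + x*y**2)**2 + (2.625 - x + x*y**3)**2

def central_diff_grad(f, X, h=1e-6):
    """Hypothetical helper: approximate the gradient of f at X by central differences."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for i in range(X.size):
        step = np.zeros_like(X)
        step[i] = h
        grad[i] = (f(X + step) - f(X - step)) / (2 * h)
    return grad

x_star = np.array([3.0, 0.5])
print(beale(x_star))                     # 0.0 at the known minimum
print(beale([4.0, 4.0]))                 # 68891.203125 at the initial guess
print(central_diff_grad(beale, x_star))  # both components are ~0
```

The large value at (4, 4) comes almost entirely from the third bracket, whose y³ term dominates away from the minimum.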

Now let's move on to neural networks.

Optimizing neural networks

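A neural network can be defined as a system that takes inputs and makes a guess about the output. If we have the actual results, known as the "ground truth", we can compare them with the network's outputs and compute the error. So the network makes a guess, computes an error function, guesses again to reduce the error, and keeps going until the error is minimized. That is optimization.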

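The optimization algorithms most commonly used for neural networks belong to the gradient descent (GD) family. The objective function in gradient descent is the loss function we want to minimize.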
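The core update of gradient descent is w ← w - η∇L(w): take a step of size η (the learning rate) against the gradient of the loss. The minimal sketch below applies it to a simple quadratic loss chosen purely for illustration (an assumption, not the loss of any network in this article), with its minimum placed at (3, 0.5) to echo the Beale example.

```python
import numpy as np

def gradient_descent(grad, w0, lr=0.1, n_steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - lr * grad(w)
    return w

# Illustrative loss L(w) = ||w - target||^2, whose gradient is 2 * (w - target)
target = np.array([3.0, 0.5])
grad = lambda w: 2.0 * (w - target)

w_final = gradient_descent(grad, w0=[4.0, 4.0], lr=0.1, n_steps=100)
print(w_final)  # very close to [3.0, 0.5]
```

On this loss each step shrinks the distance to the minimum by a factor of 1 - 2·lr = 0.8, so convergence is fast; on a landscape with very flat regions, such as the Beale function, the gradient and therefore the step size become tiny and progress stalls.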

Keras is the centerpiece of this tutorial, so let's start with a quick review.

Keras refresher

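Keras is a Python deep learning library that can run on top of either Theano or TensorFlow, two powerful Python libraries for fast numerical computation, developed at the Université de Montréal and by Google, respectively.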

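Keras was designed to make developing deep learning models as fast and easy as possible, for both research and practical applications. It runs on Python 2.7 or 3.5 and can switch seamlessly between GPU and CPU execution.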

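Keras is built around the idea of a model. Its core model is a linear stack of layers, called the Sequential model. Keras also provides a functional API for defining complex models, such as multi-output models, directed acyclic graphs, and models with shared layers.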

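Building a deep learning model in Keras with the Sequential model can be summarized as follows:

1. Define the model: create a Sequential model and add layers.

2. Compile the model: specify the loss function and optimizer, and call .compile().

3. Fit the model: train the model on data by calling .fit().

4. Make predictions: evaluate the model with .evaluate() and generate predictions on new data with .predict().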

You may be wondering how to monitor a model's performance while it is training. Good question: the answer is callbacks.

Callbacks: monitoring the model during training

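Callbacks let you monitor the model at any stage of training. A callback is a set of functions applied at given stages of the training procedure, giving you a view of the model's internal states and statistics as it trains. You can pass a list of callbacks (through the keyword argument callbacks) to the .fit() method of the Sequential or Model classes; the relevant callback methods are then called at each stage of training.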

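· A familiar Keras callback is keras.callbacks.History(). It is automatically included with the .fit() method.

· keras.callbacks.ModelCheckpoint is also very useful: it saves the model's weights at specific points during training. This pays off if a model runs for a long time and the system fails, because no progress is lost. For example, a sensible practice is to save the weights only when an improvement in accuracy is observed.

· keras.callbacks.EarlyStopping stops training when a monitored quantity has stopped improving.

· keras.callbacks.LearningRateScheduler changes the learning rate during training.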
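To make the mechanism concrete, the framework-free sketch below shows how a training loop can call a list of callbacks at the end of every epoch. HistorySketch, EarlyStoppingSketch, and toy_fit are hypothetical stand-ins that only mimic the idea behind keras.callbacks.History and keras.callbacks.EarlyStopping; they are not the Keras API, and the per-epoch losses are made-up numbers.

```python
class HistorySketch:
    """Records the loss reported at the end of every epoch."""
    def __init__(self):
        self.losses = []

    def on_epoch_end(self, epoch, loss):
        self.losses.append(loss)
        return False  # never asks training to stop


class EarlyStoppingSketch:
    """Asks to stop once the loss has not improved for `patience` epochs in a row."""
    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, epoch, loss):
        if loss < self.best:
            self.best, self.wait = loss, 0
        else:
            self.wait += 1
        return self.wait >= self.patience


def toy_fit(epoch_losses, callbacks):
    """Toy training loop; epoch_losses stands in for the loss computed each epoch."""
    for epoch, loss in enumerate(epoch_losses):
        # every callback is called, then we check whether any asked to stop
        stop_requests = [cb.on_epoch_end(epoch, loss) for cb in callbacks]
        if any(stop_requests):
            return epoch  # index of the last epoch that ran
    return len(epoch_losses) - 1


history = HistorySketch()
last_epoch = toy_fit([1.0, 0.8, 0.7, 0.72, 0.75, 0.6],
                     [history, EarlyStoppingSketch(patience=2)])
print(last_epoch)      # 4: stopped after two epochs without improvement
print(history.losses)  # [1.0, 0.8, 0.7, 0.72, 0.75]
```

Note that the sixth epoch, which would have improved the loss, never runs; choosing the patience value is a trade-off between wasted epochs and stopping too early.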

We will apply some of these callbacks later. For full documentation, see https://keras.io/callbacks/.

First, import the functions we will need, to make the rest of the work easier.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import keras
from keras import layers, models, utils, datasets, losses
from keras.layers import Dense, Flatten, Dropout, Activation
from keras.models import Sequential
from keras.regularizers import l2
from keras.optimizers import SGD, RMSprop
from keras.callbacks import LearningRateScheduler, History
from sklearn.utils import shuffle

# tf.__version__ works in both TensorFlow 1.x and 2.x
print(tf.__version__)
print(tf.keras.__version__)
```

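If you want the network to use random numbers but give reproducible results, one more step you can take is to set a random seed. A seeded generator produces the same sequence of numbers every time, even though they are pseudo-random, which helps when comparing models and testing reproducibility.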

```python
# fix random seed for reproducibility
np.random.seed(5)
```
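The effect of seeding is easy to see with NumPy alone: re-seeding with the same value reproduces the same pseudo-random sequence. Keep in mind that this seeds only NumPy's global generator; backends such as TensorFlow keep their own random state, so the NumPy seed by itself may not make training fully deterministic.

```python
import numpy as np

np.random.seed(5)
first = np.random.rand(3)

np.random.seed(5)           # the same seed again...
second = np.random.rand(3)  # ...reproduces the same numbers

np.random.seed(6)           # a different seed gives a different sequence
third = np.random.rand(3)

print(np.array_equal(first, second), np.array_equal(first, third))  # True False
```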

Step 1: Determine the network topology (not optimization as such, but still very important)

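This step uses the MNIST dataset, which consists of grayscale images of handwritten digits (0 to 9), each 28×28 pixels. Each pixel is 8 bits, so its value ranges from 0 to 255.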

Keras has a built-in function for this dataset, so it is easy to load.

```python
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train.shape, y_train.shape
```

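The shapes of X and Y are (60000, 28, 28) and (60000,) respectively. It is a good idea to print some of the data to check the values (and the data type as well).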

You can also inspect the training data by looking at images of the digits, to make sure nothing is missing.

```python
plt.figure(figsize=(10,10))
for i in range(10):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(y_train[i])
```

A final check is on the sizes of the training and test sets, which is simple:

```python
print(f'We have {x_train.shape[0]} train samples')
print(f'We have {x_test.shape[0]} test samples')
```

There are 60,000 training images and 10,000 test images. Next, we preprocess the data.

Preprocessing the data

Before running the neural network, the data must be preprocessed (the following steps can be done in any order):

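· First, convert each 2D image array into 1D (flattening). You can reshape arrays with numpy.reshape(), or use the Keras way: a keras.layers.Flatten layer turns a 2D image array (28×28 pixels) into a 1D array (28 * 28 = 784 pixels).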

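· Then normalize the pixel values (scale them to between 0 and 1) with min-max scaling:

$$x := \frac{x - x_{min}}{x_{max} - x_{min}}$$

In our case the minimum is 0 and the maximum is 255, so the formula becomes $x := x/255$.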

```python
# normalize the data
x_train, x_test = x_train / 255.0, x_test / 255.0
# reshape the data into 1D vectors
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
num_classes = 10
# Check the column length
x_train.shape[1]
```
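Both preprocessing steps can be checked on a small synthetic batch. The random 8-bit images below are an assumption standing in for MNIST: reshaping turns each 28×28 image into a 784-element vector, and min-max scaling with x_min = 0 and x_max = 255 reduces to dividing by 255.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for MNIST: 5 grayscale images with 8-bit pixels in [0, 255]
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

def min_max_scale(x, x_min=0.0, x_max=255.0):
    """Min-max normalization: x := (x - x_min) / (x_max - x_min)."""
    return (x.astype(np.float64) - x_min) / (x_max - x_min)

flat = min_max_scale(images).reshape(len(images), 28 * 28)  # one row per image
print(flat.shape)                             # (5, 784)
print(flat.min() >= 0.0, flat.max() <= 1.0)   # True True
```

Converting to float before dividing matters: doing arithmetic directly on uint8 arrays in NumPy can wrap around instead of producing fractions.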

Next, the labels need to be one-hot encoded.

```python
# Convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
```
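One-hot encoding maps each integer label to a row containing a single 1. For integer labels from 0 to num_classes - 1, indexing an identity matrix produces the same kind of matrix that keras.utils.to_categorical returns; this NumPy version is only a sketch for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

y = np.array([5, 0, 4])
encoded = one_hot(y, 10)
print(encoded)
print(encoded.argmax(axis=1))  # recovers the original labels: [5 0 4]
```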

Step 2: Tuning the learning rate

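One of the most widely used optimization algorithms is stochastic gradient descent (SGD). Its tunable hyperparameters are the learning rate, momentum, decay, and nesterov.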

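The learning rate controls how much the weights are updated at the end of each batch, and momentum controls how much the previous update influences the current one. Decay indicates how much the learning rate decreases at each update. nesterov is True or False, depending on whether Nesterov momentum is applied.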

Typical values for these hyperparameters are lr = 0.01, decay = 1e-6, momentum = 0.9, and nesterov = True.
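The sketch below spells out one common formulation of SGD with momentum and its Nesterov variant: velocity v ← μv - η∇L(w), then w ← w + v, or w ← w + μv - η∇L(w) with Nesterov momentum. This is how the Keras 2 SGD optimizer is usually described, but treat it as an illustration rather than the library's code. It runs on a one-dimensional quadratic loss chosen for illustration, with the typical values lr = 0.01 and momentum = 0.9.

```python
def sgd_momentum(grad, w0, lr=0.01, momentum=0.9, nesterov=False, n_steps=500):
    """SGD with (optionally Nesterov) momentum, using an exact gradient."""
    w = float(w0)
    v = 0.0  # velocity: an exponentially decayed sum of past steps
    for _ in range(n_steps):
        g = grad(w)
        v = momentum * v - lr * g
        if nesterov:
            w = w + momentum * v - lr * g  # "look-ahead" form of the update
        else:
            w = w + v
    return w

grad = lambda w: 2.0 * (w - 3.0)  # gradient of the illustrative loss (w - 3)^2
print(sgd_momentum(grad, w0=4.0))                 # close to 3.0
print(sgd_momentum(grad, w0=4.0, nesterov=True))  # close to 3.0
```

Here the gradient is exact; in real SGD it is estimated from a mini-batch, and momentum helps average out that noise.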

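The learning-rate hyperparameter is set in the optimizer, as shown below. Keras's SGD optimizer has a built-in time-based schedule that lowers the learning rate as stochastic gradient descent proceeds, according to:

$$lr = lr_0 \times \frac{1}{1 + \text{decay} \times t}$$

where $lr_0$ is the initial learning rate and $t$ is the number of updates (batches) performed so far.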

http://cs231n.github.io/neural-networks-3
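The time-based schedule is easy to tabulate. The sketch below uses the values from the next step (lr_0 = 0.1, decay = 0.1/60) and, as in the Keras 2 SGD optimizer, counts t in parameter updates rather than epochs; the update counts in the loop are only illustrative.

```python
def time_based_lr(lr0, decay, t):
    """Time-based decay: lr = lr0 * 1 / (1 + decay * t), with t counting updates."""
    return lr0 / (1.0 + decay * t)

lr0 = 0.1
decay = lr0 / 60  # about 0.00167, the decay used in the next step
for t in [0, 600, 6000, 60000]:  # illustrative update counts
    print(t, time_based_lr(lr0, decay, t))
```

Because t counts batches, the batch size matters: with batch_size = 7 as in the fit call below, one epoch over 60,000 images is about 8,572 updates, so by the end of the first epoch the rate has already dropped to roughly a fifteenth of its starting value.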

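Next, let's implement this learning-rate schedule in Keras. We start with SGD and a learning rate of 0.1, train the model for 60 epochs, and set the decay parameter to 0.1/60 ≈ 0.00167. We also include a momentum of 0.8, since it tends to work well with a decaying learning rate.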

```python
epochs = 60
learning_rate = 0.1
decay_rate = learning_rate / epochs
momentum = 0.8
sgd = SGD(lr=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
```

Next, build the neural network:

```python
# build the model
input_dim = x_train.shape[1]
lr_model = Sequential()
lr_model.add(Dense(64, activation=tf.nn.relu, kernel_initializer='uniform',
                   input_dim=input_dim))
lr_model.add(Dropout(0.1))
lr_model.add(Dense(64, kernel_initializer='uniform', activation=tf.nn.relu))
lr_model.add(Dense(num_classes, kernel_initializer='uniform', activation=tf.nn.softmax))

# compile the model
lr_model.compile(loss='categorical_crossentropy',
                 optimizer=sgd,
                 metrics=['acc'])
```

Now we can fit the model and see how it performs. It took my machine about 20 minutes; your speed will vary.

```python
%%time
# Fit the model
batch_size = int(input_dim/100)
lr_model_history = lr_model.fit(x_train, y_train,
                                batch_size=batch_size,
                                epochs=epochs,
                                verbose=1,
                                validation_data=(x_test, y_test))
```

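Once training finishes, we can plot the accuracy and the loss as functions of the epoch for the training and test sets, to see how the network did.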

```python
# Plot the loss function
fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot(np.sqrt(lr_model_history.history['loss']), 'r', label='train')
ax.plot(np.sqrt(lr_model_history.history['val_loss']), 'b', label='val')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Loss', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)

# Plot the accuracy
fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot(np.sqrt(lr_model_history.history['acc']), 'r', label='train')
ax.plot(np.sqrt(lr_model_history.history['val_acc']), 'b', label='val')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Accuracy', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)
```

[Figure: loss vs. epoch, training and validation]

[Figure: accuracy vs. epoch, training and validation]

Now let's apply a custom learning rate.

Changing the learning rate with a custom LearningRateScheduler

Write a function that performs exponential learning-rate decay, following the formula:

$$lr = lr_0 \times e^{-kt}$$

where $k$ is the decay rate and $t$ is the epoch number.
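As a sketch, the exponential schedule is just a function of the epoch index; with lr_0 = 0.1 and k = 0.1 it matches the exp_decay function used in the code below. Printing a few values shows how quickly it shrinks: after 10 epochs the rate has fallen by a factor of e.

```python
import math

def exp_decay_lr(epoch, lr0=0.1, k=0.1):
    """Exponential decay: lr = lr0 * exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)

for epoch in [0, 10, 30, 59]:
    print(epoch, round(exp_decay_lr(epoch), 6))
```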

The model setup is very similar to the previous one, so we do everything in a single code block and then describe the differences.

```python
# solution
epochs = 60
learning_rate = 0.1  # initial learning rate
decay_rate = 0.1     # k in lr = lr0 * exp(-k * epoch)
momentum = 0.8

# define the optimizer function; its built-in decay is left at 0 so that the
# LearningRateScheduler below is the only thing changing the learning rate
sgd = SGD(lr=learning_rate, momentum=momentum, decay=0.0, nesterov=False)

input_dim = x_train.shape[1]
num_classes = 10
batch_size = 196

# build the model
exponential_decay_model = Sequential()
exponential_decay_model.add(Dense(64, activation=tf.nn.relu, kernel_initializer='uniform',
                                  input_dim=input_dim))
exponential_decay_model.add(Dropout(0.1))
exponential_decay_model.add(Dense(64, kernel_initializer='uniform', activation=tf.nn.relu))
exponential_decay_model.add(Dense(num_classes, kernel_initializer='uniform',
                                  activation=tf.nn.softmax))

# compile the model
exponential_decay_model.compile(loss='categorical_crossentropy',
                                optimizer=sgd,
                                metrics=['acc'])

# define the learning rate change
def exp_decay(epoch):
    lrate = learning_rate * np.exp(-decay_rate*epoch)
    return lrate

# learning schedule callback
loss_history = History()
lr_rate = LearningRateScheduler(exp_decay)
callbacks_list = [loss_history, lr_rate]

# you invoke the LearningRateScheduler during the .fit() phase
exponential_decay_model_history = exponential_decay_model.fit(x_train, y_train,
                                                              batch_size=batch_size,
                                                              epochs=epochs,
                                                              callbacks=callbacks_list,
                                                              verbose=1,
                                                              validation_data=(x_test, y_test))
```

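As you can see, the main change is the new exp_decay function and its use in LearningRateScheduler. Note that this time we also pass a list of callbacks to the model's .fit() call.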

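We can now plot the learning rate and the loss as functions of the number of epochs. The learning-rate curve is perfectly smooth, because it follows the predefined exponential decay function.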

Compared with before, the loss curve is smoother.

This suggests that a well-designed learning-rate schedule can help improve a neural network's performance.

Step 3: Choosing an optimizer and a loss function

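When we build a model and use it to make predictions, for example labeling images ("cat", "plane", and so on), we want to measure its success or failure by defining a "loss" function (or objective function). The goal of optimization is to efficiently compute the parameters/weights that minimize this loss function. Keras provides various types of loss functions.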

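Sometimes the "loss" function measures a "distance" between two data points, and that distance can be defined in various ways to suit the problem or dataset: which distance to use depends on the data type and the particular problem being solved. For example, in natural language processing (analyzing text data), the Hamming distance is more common.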

Distances

· Euclidean

· Manhattan

· Distances such as the Hamming distance measure the distance between strings. The Hamming distance between "carolin" and "cathrin" is 3.
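The three distances above are short enough to write out directly. The sketch below is a plain NumPy/Python illustration (not a Keras API) and reproduces the Hamming distance of 3 between "carolin" and "cathrin": they differ in the third, fourth, and fifth letters.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: sqrt(sum((a_i - b_i)^2))."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def manhattan(a, b):
    """City-block distance: sum(|a_i - b_i|)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sum(np.abs(diff)))

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(s, t))

print(euclidean([0, 0], [3, 4]))      # 5.0
print(manhattan([0, 0], [3, 4]))      # 7.0
print(hamming("carolin", "cathrin"))  # 3
```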

Loss functions

· MSE (for regression)

· Categorical cross-entropy (for classification)

· Binary cross-entropy (for classification)
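The three losses can be sketched in NumPy as below. The clipping constant EPS is an assumption added to keep log() finite; Keras applies a similar small epsilon internally, but the exact value and details of its implementation may differ.

```python
import numpy as np

EPS = 1e-7  # clip probabilities away from 0 and 1 so log() stays finite

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_crossentropy(y_true_onehot, probs):
    """Mean over samples of -sum_k y_k * log(p_k)."""
    p = np.clip(np.asarray(probs, float), EPS, 1 - EPS)
    y = np.asarray(y_true_onehot, float)
    return float(np.mean(-np.sum(y * np.log(p), axis=-1)))

def binary_crossentropy(y_true, p):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)]."""
    p = np.clip(np.asarray(p, float), EPS, 1 - EPS)
    y = np.asarray(y_true, float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

print(mse([1.0, 2.0], [1.0, 4.0]))                               # 2.0
print(categorical_crossentropy([[0, 1, 0]], [[0.2, 0.7, 0.1]]))  # -log(0.7) ≈ 0.357
print(binary_crossentropy([1, 0], [0.9, 0.2]))
```

With a one-hot target, categorical cross-entropy only looks at the probability assigned to the correct class, which is why it pairs naturally with a softmax output layer.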

The model for this step uses the RMSprop optimizer with categorical cross-entropy loss:

```python
# build the model
input_dim = x_train.shape[1]
model = Sequential()
model.add(Dense(64, activation=tf.nn.relu, kernel_initializer='uniform',
                input_dim=input_dim))  # fully-connected layer with 64 hidden units
model.add(Dropout(0.1))
model.add(Dense(64, kernel_initializer='uniform', activation=tf.nn.relu))
model.add(Dense(num_classes, kernel_initializer='uniform', activation=tf.nn.softmax))

# defining the parameters for RMSprop (I used the keras defaults here)
rms = RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
model.compile(loss='categorical_crossentropy',
              optimizer=rms,
              metrics=['acc'])
```
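For intuition about what RMSprop does, the sketch below implements its textbook update: keep a decaying average of squared gradients, cache ← ρ·cache + (1 - ρ)·g², and divide each step by its square root, w ← w - η·g / (√cache + ε). It uses rho = 0.9 as above, but an assumed lr = 0.01 so that the toy run on an illustrative quadratic loss converges in a few hundred steps. Keras's own implementation may differ in details.

```python
import math

def rmsprop(grad, w0, lr=0.01, rho=0.9, eps=1e-7, n_steps=1000):
    """Textbook RMSprop on a single scalar parameter."""
    w, cache = float(w0), 0.0
    for _ in range(n_steps):
        g = grad(w)
        cache = rho * cache + (1 - rho) * g * g  # running average of g^2
        w -= lr * g / (math.sqrt(cache) + eps)   # step scaled by the gradient's RMS
    return w

grad = lambda w: 2.0 * (w - 3.0)  # gradient of the illustrative loss (w - 3)^2
print(rmsprop(grad, w0=4.0))  # ends up near 3.0, within roughly the step size
```

Because each step is normalized by the gradient's recent magnitude, RMSprop takes steps of roughly lr regardless of how steep or flat the loss is, which helps on flat regions like those of the Beale function; the price is that it hovers around the minimum at about that scale unless lr is reduced.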

Step 4: Choosing the batch size and number of epochs

The batch size defines the number of samples propagated through the network at a time.

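For example, suppose you have 1,000 training samples and set batch_size to 100. The algorithm takes the first 100 samples (1st to 100th) from the training set and trains the network. Next, it takes the following 100 samples (101st to 200th) and trains the network again. This continues until all samples have been propagated through the network.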
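The batching procedure just described can be sketched in a few lines of Python. The generator batch_ranges is a hypothetical helper, not a Keras function; it yields 0-based, half-open index ranges, so (0, 100) covers the 1st to 100th samples. With 1,000 samples and batch_size = 100 it produces ten batches, and a smaller final batch when the sizes do not divide evenly.

```python
def batch_ranges(n_samples, batch_size):
    """Yield (start, stop) index pairs for consecutive mini-batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

batches = list(batch_ranges(1000, 100))
print(len(batches), batches[0], batches[1], batches[-1])
# 10 (0, 100) (100, 200) (900, 1000)

print(list(batch_ranges(250, 100)))  # [(0, 100), (100, 200), (200, 250)]
```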

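Advantages of using a batch size smaller than the total number of samples:

· It requires less memory. Since the network is trained on fewer samples at a time, the overall training procedure needs less memory. This matters especially when the whole dataset does not fit in your machine's memory.

· Networks typically train faster with mini-batches, because the weights are updated after each propagation.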

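Disadvantages of using a batch size smaller than the total number of samples:

· The smaller the batch, the less accurate the estimate of the gradient.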
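The effect of batch size on gradient accuracy can be simulated. The sketch below treats per-sample gradients as draws from a fixed normal distribution (an assumption made purely for illustration) and measures how far the mini-batch average typically lands from the full-data gradient. The spread shrinks roughly as 1/√(batch size).

```python
import numpy as np

rng = np.random.default_rng(42)
# hypothetical per-sample gradients for one weight, over 60,000 training samples
per_sample_grads = rng.normal(loc=0.5, scale=2.0, size=60_000)
full_grad = per_sample_grads.mean()  # the full-batch gradient

def minibatch_grad_spread(batch_size, n_trials=1000):
    """RMS deviation of mini-batch gradient estimates from the full-batch gradient."""
    # sample indices with replacement for simplicity
    idx = rng.integers(0, per_sample_grads.size, size=(n_trials, batch_size))
    estimates = per_sample_grads[idx].mean(axis=1)
    return float(np.sqrt(np.mean((estimates - full_grad) ** 2)))

for b in [8, 128, 2048]:
    print(b, round(minibatch_grad_spread(b), 4))
```

A 16-fold larger batch cuts the error of the gradient estimate by about 4 times, but also costs 16 times as much computation per update, which is the trade-off behind choosing a batch size.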

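The number of epochs is a hyperparameter that defines how many times the learning algorithm works through the entire training dataset.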

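One epoch means that every sample in the training dataset has had an opportunity to update the internal model parameters. An epoch consists of one or more batches.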
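Since one epoch consists of ceil(N / batch_size) batches, the number of weight updates follows directly from the batch size. The sketch below computes it for the 60,000 MNIST training images and the batch sizes used in this article: 7 in the learning-rate example, 196 in the exponential-decay example, and 784 in this step.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of mini-batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train = 60_000
for batch_size in [7, 196, 784]:
    per_epoch = updates_per_epoch(n_train, batch_size)
    print(batch_size, per_epoch, per_epoch * 60)  # per epoch, and over 60 epochs
```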

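There are no hard-and-fast rules for choosing the batch size or the number of epochs, and more epochs do not necessarily give better results than fewer.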

```python
%%time
batch_size = input_dim
epochs = 60
model_history = model.fit(x_train, y_train,
                          batch_size=batch_size,
                          epochs=epochs,
                          verbose=1,
                          validation_data=(x_test, y_test))
```

```python
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot(np.sqrt(model_history.history['acc']), 'r', label='train_acc')
ax.plot(np.sqrt(model_history.history['val_acc']), 'b', label='val_acc')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Accuracy', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)

fig, ax = plt.subplots(1, 1, figsize=(10,6))
ax.plot(np.sqrt(model_history.history['loss']), 'r', label='train')
ax.plot(np.sqrt(model_history.history['val_loss']), 'b', label='val')
ax.set_xlabel(r'Epoch', fontsize=20)
ax.set_ylabel(r'Loss', fontsize=20)
ax.legend()
ax.tick_params(labelsize=20)
```

Step 5: Random restarts

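This method does not seem to be implemented in Keras, but it can easily be done by modifying keras.callbacks.LearningRateScheduler. I leave it as an exercise for the reader; essentially, it resets the learning rate after a finite number of epochs.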
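One possible approach, sketched here under our own assumptions (a cycle length of 10 epochs and exponential decay within each cycle, both hypothetical choices), is a schedule function that restarts the decay at the beginning of every cycle. Depending on the Keras version, LearningRateScheduler calls its schedule with the epoch index, or with the epoch index and the current learning rate; the function below accepts and ignores the second argument so that it works either way.

```python
import math

LR0 = 0.1          # learning rate at the start of every cycle (assumed)
K_DECAY = 0.1      # decay constant within a cycle (assumed)
CYCLE_LENGTH = 10  # epochs per cycle before the restart (assumed)

def restart_schedule(epoch, lr=None):
    """Exponential decay within each cycle, reset to LR0 every CYCLE_LENGTH epochs.

    The current learning rate `lr` is accepted but ignored.
    """
    epoch_in_cycle = epoch % CYCLE_LENGTH
    return LR0 * math.exp(-K_DECAY * epoch_in_cycle)

print([round(restart_schedule(e), 4) for e in range(0, 25, 5)])
```

It could then be used as LearningRateScheduler(restart_schedule) in place of exp_decay in the earlier example.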

Tuning hyperparameters with cross-validation

Instead of trying different values by hand, we can use Scikit-Learn's GridSearchCV to try several values for each hyperparameter and compare the results.

To use cross-validation with Keras, we use the wrappers for the Scikit-Learn API. They let Sequential Keras models (single input only) be used as part of a Scikit-Learn workflow.

There are two wrappers:

keras.wrappers.scikit_learn.KerasClassifier(build_fn=None, **sk_params), which implements the Scikit-Learn classifier interface.

keras.wrappers.scikit_learn.KerasRegressor(build_fn=None, **sk_params), which implements the Scikit-Learn regressor interface.

```python
import numpy
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
```

Trying different weight initializations

The first hyperparameter we will try to optimize with cross-validation is the weight initialization.

```python
# let's create a function that creates the model (required for KerasClassifier)
# while accepting the hyperparameters we want to tune
# we also pass some default values such as optimizer='rmsprop'
def create_model(init_mode='uniform'):
    # define model
    model = Sequential()
    model.add(Dense(64, kernel_initializer=init_mode, activation=tf.nn.relu, input_dim=784))
    model.add(Dropout(0.1))
    model.add(Dense(64, kernel_initializer=init_mode, activation=tf.nn.relu))
    model.add(Dense(10, kernel_initializer=init_mode, activation=tf.nn.softmax))
    # compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(),
                  metrics=['accuracy'])
    return model
```

```python
%%time
seed = 7
numpy.random.seed(seed)
batch_size = 128
epochs = 10

model_CV = KerasClassifier(build_fn=create_model, epochs=epochs,
                           batch_size=batch_size, verbose=1)
# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero',
             'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator=model_CV, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(x_train, y_train)
```

```python
# print results
print(f'Best Accuracy for {grid_result.best_score_} using {grid_result.best_params_}')
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f' mean={mean:.4}, std={stdev:.4} using {param}')
```

[Output: GridSearch results]

The best results come from the models initialized with lecun_uniform or glorot_uniform, which reach nearly 97% accuracy.
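For reference, these two schemes differ only in the bound of the uniform distribution they draw from: glorot_uniform uses limit = √(6 / (fan_in + fan_out)) and lecun_uniform uses limit = √(3 / fan_in), as described in the Keras initializer documentation. The NumPy sketch below draws weights for the first 784 → 64 layer under each scheme; it illustrates the formulas and is not the Keras implementation.

```python
import numpy as np

def uniform_init(fan_in, fan_out, limit, rng):
    """Draw a (fan_in, fan_out) weight matrix from U(-limit, limit)."""
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

fan_in, fan_out = 784, 64
limits = {
    "glorot_uniform": np.sqrt(6.0 / (fan_in + fan_out)),
    "lecun_uniform": np.sqrt(3.0 / fan_in),
}
rng = np.random.default_rng(0)
for name, limit in limits.items():
    W = uniform_init(fan_in, fan_out, limit, rng)
    print(f"{name}: limit={limit:.4f}, sample std={W.std():.4f}")
```

Both bounds shrink as the layer gets wider, which keeps the variance of each unit's input from growing with the number of connections; the "zero" option, by contrast, gives every unit identical weights, so units in the same layer cannot learn different features.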

Saving a neural network model to JSON

The Hierarchical Data Format (HDF5) is designed to store large arrays of data, such as the weight values of a neural network.

You can install the HDF5 Python module with: pip install h5py

Keras can describe any model's architecture in JSON format; the weights are saved separately, to an HDF5 file.

```python
from keras.models import model_from_json

# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# save weights to HDF5
model.save_weights("model.h5")
print("Model saved")

# when you want to retrieve the model: load json and create model
# (the with-block closes the file, as good practice)
with open('model.json', 'r') as json_file:
    saved_model = json_file.read()
# use a new name so the imported model_from_json function is not shadowed
loaded_model = model_from_json(saved_model)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Model loaded")
```
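The split between architecture (JSON text) and weights (a separate binary file) can be illustrated without Keras. In the sketch below, the architecture dictionary is a simplified, hypothetical stand-in (it is not the schema that model.to_json() produces), and the weights are saved with NumPy's .npz format instead of HDF5 to avoid an h5py dependency.

```python
import json
import os
import tempfile
import numpy as np

# hypothetical, simplified description of the 784-64-64-10 network used above
architecture = {"layers": [
    {"type": "Dense", "units": 64, "activation": "relu", "input_dim": 784},
    {"type": "Dropout", "rate": 0.1},
    {"type": "Dense", "units": 64, "activation": "relu"},
    {"type": "Dense", "units": 10, "activation": "softmax"},
]}
rng = np.random.default_rng(0)
weights = {"dense_1_kernel": rng.normal(size=(784, 64)), "dense_1_bias": np.zeros(64)}

with tempfile.TemporaryDirectory() as tmp:
    arch_path = os.path.join(tmp, "model.json")
    weights_path = os.path.join(tmp, "weights.npz")
    with open(arch_path, "w") as f:
        json.dump(architecture, f)     # architecture only: human-readable text
    np.savez(weights_path, **weights)  # weights only: binary arrays

    with open(arch_path) as f:
        restored_arch = json.load(f)
    with np.load(weights_path) as data:
        restored_weights = {name: data[name] for name in data.files}

print(restored_arch == architecture)
print(all(np.array_equal(weights[k], restored_weights[k]) for k in weights))
```

Keeping the two apart means the architecture file stays small and readable, while the bulky numeric arrays go into a format built for them.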

Cross-validation with multiple hyperparameters

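Usually we are less interested in how a single parameter behaves than in how changes to several parameters together affect the results. We can cross-validate several parameters at once, trying out their combinations.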

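Note: cross-validation with neural networks is computationally expensive, so think before you experiment! Multiply the number of values of each factor you want to validate to see how many combinations there are. Each combination is evaluated with k-fold cross-validation (k is a parameter we choose), so the number of models to train is k times the number of combinations.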
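It is worth doing this arithmetic before launching a search. The sketch below counts the combinations for the grid used in the next example (two initializers, two batch sizes, two epoch counts) and multiplies by the number of folds to get the number of models that have to be trained.

```python
import itertools

param_grid = {
    "init": ["glorot_uniform", "uniform"],
    "batch_size": [128, 512],
    "epochs": [10, 20],
}
k_folds = 3

combinations = list(itertools.product(*param_grid.values()))
n_fits = len(combinations) * k_folds
print(len(combinations), n_fits)  # 8 24
```

With its default refit=True, GridSearchCV also retrains the best combination once more on the full training set, so the real total here is one fit higher.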

For example, we could search over different values of:

· batch size

· number of epochs

· initialization mode

The options are specified in a dictionary and passed to GridSearchCV.

Now run GridSearch over combinations of batch size, number of epochs, and initializer.

```python
# repeat some of the initial values here so we make sure they were not changed
input_dim = x_train.shape[1]
num_classes = 10

# let's create a function that creates the model (required for KerasClassifier)
# while accepting the hyperparameters we want to tune
# we also pass some default values such as optimizer='rmsprop'
def create_model_2(optimizer='rmsprop', init='glorot_uniform'):
    model = Sequential()
    model.add(Dense(64, input_dim=input_dim, kernel_initializer=init, activation='relu'))
    model.add(Dropout(0.1))
    model.add(Dense(64, kernel_initializer=init, activation=tf.nn.relu))
    model.add(Dense(num_classes, kernel_initializer=init, activation=tf.nn.softmax))
    # compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
```

```python
%%time
# fix random seed for reproducibility (this might work or might not work
# depending on each library's implementation)
seed = 7
numpy.random.seed(seed)

# create the sklearn model for the network
model_init_batch_epoch_CV = KerasClassifier(build_fn=create_model_2, verbose=1)

# we choose the initializers that came at the top in our previous cross-validation!!
init_mode = ['glorot_uniform', 'uniform']
batches = [128, 512]
epochs = [10, 20]

# grid search for initializer, batch size and number of epochs
param_grid = dict(epochs=epochs, batch_size=batches, init=init_mode)
grid = GridSearchCV(estimator=model_init_batch_epoch_CV,
                    param_grid=param_grid,
                    cv=3)
grid_result = grid.fit(x_train, y_train)
```

```python
# print results
print(f'Best Accuracy for {grid_result.best_score_:.4} using {grid_result.best_params_}')
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f'mean={mean:.4}, std={stdev:.4} using {param}')
```

One last question: what if the number of parameters, and of values per parameter, that GridSearchCV has to loop over is very large?

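This can be a tricky problem. Imagine 5 parameters, with 10 possible values chosen for each. The number of possible combinations is 10⁵, which means training an enormous number of networks. Clearly that would be crazy, so randomized search (RandomizedSearchCV in scikit-learn) is typically used instead.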

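Randomized search lets you specify all the possible parameter values. Instead of trying every combination, it samples a fixed number of random combinations (set by its n_iter parameter) and evaluates each of them with cross-validation. Finally, you can pick the best parameter set found and use it as an approximate solution.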
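The idea can be sketched without any framework: instead of enumerating every combination, draw a fixed number of them at random and keep the best. Below, the 5-parameter, 10-value search space from the text is used, and score_fn is a made-up stand-in for a cross-validated accuracy. scikit-learn's RandomizedSearchCV applies the same idea with real cross-validation.

```python
import random

def random_search(param_grid, score_fn, n_iter, seed=0):
    """Evaluate n_iter distinct, randomly chosen combinations and return the best."""
    names = list(param_grid)
    sizes = [len(param_grid[name]) for name in names]
    total = 1
    for size in sizes:
        total *= size
    rng = random.Random(seed)
    best_score, best_combo = None, None
    for index in rng.sample(range(total), n_iter):  # no full enumeration needed
        combo = {}
        for name, size in zip(names, sizes):        # decode index in mixed radix
            index, position = divmod(index, size)
            combo[name] = param_grid[name][position]
        score = score_fn(combo)
        if best_score is None or score > best_score:
            best_score, best_combo = score, combo
    return best_score, best_combo, total

# hypothetical search space: 5 parameters with 10 values each
grid = {f"p{i}": list(range(10)) for i in range(5)}
# hypothetical score standing in for cross-validated accuracy (best when every value is 7)
score_fn = lambda combo: -sum((v - 7) ** 2 for v in combo.values())

best_score, best_combo, total = random_search(grid, score_fn, n_iter=50)
print(total)                   # 100000 possible combinations
print(best_score, best_combo)  # best of only 50 sampled ones
```

Only 50 of the 100,000 combinations are ever trained, so the result is an approximation whose quality grows with n_iter.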