Dropout has been a very popular regularization technique over the past few years and is effective at preventing overfitting. Judging from the direction deep learning has taken, however, Batch Normalization (BN for short) is gradually replacing Dropout, especially in convolutional layers. This post first introduces the principle and implementation of Dropout, then looks at how Dropout is used (or not) in modern deep models, and compares it experimentally with BN, arguing from both theory and measurement that Dropout is becoming a thing of the past and that you should use BN wherever possible.
According to the Wikipedia definition, dropout means discarding some hidden or visible units in a neural network. Concretely, during training, at each iteration a random subset of units is temporarily ignored, meaning those units take part in neither the forward pass nor backpropagation.
The figure above shows a standard neural network; after dropout is applied, it becomes the network shown in the figure below:
通常來講,咱們在可能發生過擬合的狀況下才會使用dropout等正則化技術。那何時可能會發生呢?好比神經網絡過深,或訓練時間過長,或沒有足夠多的數據時。那爲何dropout能有效防止過擬合呢?能夠理解爲,咱們每次訓練迭代時,隨機選擇一批單元不參與訓練,這使得每一個單元不會依賴於特定的前綴單元,所以具備必定的獨立性;一樣能夠當作咱們拿一樣的數據在訓練不一樣的網絡,每一個網絡都有可能過擬合,但迭代屢次後,這種過擬合會被抵消掉。網絡
Note that dropout only takes effect during training. Once training is done, we consider all units trained, and during validation or testing we use the full network.
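To make the train/test distinction concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant modern frameworks implement; dropout_forward is a hypothetical helper written for illustration, not Keras's actual code:

import numpy as np

def dropout_forward(x, rate, training):
    # At validation/test time the full network is used, unchanged.
    if not training or rate == 0.0:
        return x
    # During training, zero out each entry with probability `rate`...
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    # ...and scale the survivors by 1/(1 - rate) so that the expected
    # activation is the same at training and at inference time.
    return x * mask / (1.0 - rate)

x = np.ones((2, 4), dtype=np.float32)
print(dropout_forward(x, 0.5, training=True))   # roughly half zeros, survivors become 2.0
print(dropout_forward(x, 0.5, training=False))  # identical to x

Because of the rescaling, no extra correction is needed when the full network is used at validation or test time.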
Taking Keras as an example, the backend function is keras.backend.dropout(x, level, noise_shape=None, seed=None), where x is the input tensor and level is the drop rate, i.e., the probability that an entry will be set to 0 (note that it is the drop probability, not the keep probability).
import tensorflow.keras.backend as K

# A 3x3 matrix of uniform random values in [0, 1]
input = K.random_uniform_variable(shape=(3, 3), low=0, high=1)

print("dropout with rate 0.5:", K.eval(K.dropout(input, 0.5)))
print("dropout with rate 0.2:", K.eval(K.dropout(input, 0.2)))
print("dropout with rate 0.8:", K.eval(K.dropout(input, 0.8)))
The output looks like this:
dropout with rate 0.5: [[1.190095   0.         1.2999489]
 [0.         0.3164637  0.       ]
 [0.         0.         0.       ]]
dropout with rate 0.2: [[0.74380934 0.67237484 0.81246805]
 [0.8819132  0.19778982 1.2349881 ]
 [1.0369372  0.5945368  0.        ]]
dropout with rate 0.8: [[0.         0.         0.       ]
 [0.         0.         4.9399524]
 [4.147749   2.3781471  0.       ]]
As you can see, the larger the level value, the more likely each entry is to be set to 0.
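Note also that K.dropout does not leave the surviving entries unchanged: they are rescaled by 1/(1 - level), which is why the printed values are larger than the original uniform samples in [0, 1] (with a rate of 0.5 the survivors are doubled). Here is a quick sketch to check both effects on a larger matrix; the variable names are illustrative:

import numpy as np
import tensorflow.keras.backend as K

level = 0.5
x = K.random_uniform_variable(shape=(1000, 1000), low=0, high=1)
y = K.eval(K.dropout(x, level))

# The fraction of zeroed entries should be close to `level`...
print("zero fraction:", np.mean(y == 0))
# ...and the overall mean should be roughly preserved, because the
# survivors are scaled up by 1/(1 - level).
print("mean before:", K.eval(x).mean(), "mean after:", y.mean())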
In actual Keras models, Dropout is usually placed after the activation, for example:
model = keras.models.Sequential()
model.add(keras.layers.Dense(150, activation="relu"))
model.add(keras.layers.Dropout(0.5))
As deep learning has evolved, Dropout in modern convolutional architectures has gradually been replaced by BN (for an introduction to BN, see my earlier post 深度學習基礎系列(七)| Batch Normalization; I won't repeat it here), and BN provides a regularization effect that is no weaker than Dropout's.
"We presented an algorithm for constructing, training, and performing inference with batch-normalized networks. The resulting networks can be trained with saturating nonlinearities, are more tolerant to increased training rates, and often do not require Dropout for regularization." - Ioffe and Szegedy, 2015
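In practice this means the basic building block of modern convolutional networks is Conv, then BN, then ReLU, with no Dropout in the convolutional stack. Below is a minimal Keras sketch of such a block; the layer sizes are arbitrary, and dropping the convolution's bias is a common but optional choice, since BN's beta parameter plays that role:

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

block = Sequential()
block.add(Conv2D(32, (3, 3), padding='same', use_bias=False, input_shape=(32, 32, 3)))
block.add(BatchNormalization())   # normalize each channel over the mini-batch
block.add(Activation('relu'))     # nonlinearity comes after the normalization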
So why has Dropout fallen out of favor?
In fact, we can get a sense of Dropout's current status by looking at the modern classic models implemented in Keras. Open the Keras applications repository: https://github.com/keras-team/keras-applications
Across VGG, ResNet, Inception, MobileNetV2 and the rest, Dropout is nowhere to be found. Only in MobileNetV1 can Dropout still be spotted, and even there not in the convolutional layers; moreover, from MobileNetV2 onward there is no fully connected layer at all, it has been replaced by a global average pooling layer, as shown below:
Other models are similar, having dropped both Dropout and the fully connected layers.
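To illustrate that architectural shift, compare a classic classification head (Flatten, a large fully connected layer, Dropout) with a global-average-pooling head of the kind MobileNetV2 uses; the feature-map shape and layer sizes below are made up for illustration:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, GlobalAveragePooling2D

num_classes = 10

# Classic head: Flatten + large fully connected layer + Dropout
classic_head = Sequential([
    Flatten(input_shape=(7, 7, 1280)),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax'),
])

# Modern head: global average pooling feeding the classifier directly,
# with neither a large fully connected layer nor Dropout
modern_head = Sequential([
    GlobalAveragePooling2D(input_shape=(7, 7, 1280)),
    Dense(num_classes, activation='softmax'),
])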
To verify the above claims, we run a simple experiment with five test configurations: a baseline with neither BN nor Dropout; Dropout with rates of 0.2, 0.5, and 0.8 respectively; and a model with BN.
The code is as follows:
import keras
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from matplotlib import pyplot as plt
import numpy as np

# Use the same random seed so the comparison is fair
np.random.seed(7)

batch_size = 32
num_classes = 10
epochs = 40
data_augmentation = True

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

def model(bn=False, dropout=False, level=0.5):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=x_train.shape[1:]))
    if bn:
        model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(32, (3, 3)))
    if bn:
        model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    if dropout:
        model.add(Dropout(level))

    model.add(Conv2D(64, (3, 3), padding='same'))
    if bn:
        model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3)))
    if bn:
        model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    if dropout:
        model.add(Dropout(level))

    model.add(Flatten())
    model.add(Dense(512))
    if bn:
        model.add(BatchNormalization())
    model.add(Activation('relu'))
    if dropout:
        model.add(Dropout(level))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    if bn:
        opt = keras.optimizers.rmsprop(lr=0.001, decay=1e-6)
    else:
        opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)

    model.compile(loss='categorical_crossentropy',
                  optimizer=opt,
                  metrics=['accuracy'])

    # Use data augmentation to get more training data
    datagen = ImageDataGenerator(width_shift_range=0.1,
                                 height_shift_range=0.1,
                                 horizontal_flip=True)
    datagen.fit(x_train)
    history = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                                  epochs=epochs,
                                  validation_data=(x_test, y_test),
                                  workers=4)
    return history

no_dropout_bn_history = model(False, False)
dropout_low_history = model(False, True, 0.2)
dropout_medium_history = model(False, True, 0.5)
dropout_high_history = model(False, True, 0.8)
bn_history = model(True, False)

# Compare the validation accuracy of the models
plt.plot(no_dropout_bn_history.history['val_acc'])
plt.plot(dropout_low_history.history['val_acc'])
plt.plot(dropout_medium_history.history['val_acc'])
plt.plot(dropout_high_history.history['val_acc'])
plt.plot(bn_history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.legend(['No bn and dropout', 'Dropout with 0.2', 'Dropout with 0.5', 'Dropout with 0.8', 'BN'], loc='lower right')
plt.grid(True)
plt.show()

# Compare the validation loss of the models
plt.plot(no_dropout_bn_history.history['val_loss'])
plt.plot(dropout_low_history.history['val_loss'])
plt.plot(dropout_medium_history.history['val_loss'])
plt.plot(dropout_high_history.history['val_loss'])
plt.plot(bn_history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['No bn and dropout', 'Dropout with 0.2', 'Dropout with 0.5', 'Dropout with 0.8', 'BN'], loc='upper right')
plt.grid(True)
plt.show()
The validation accuracy of each model is shown below:
The validation loss of each model is shown below:
As the plots show, Dropout's behavior varies considerably with the rate. Among the Dropout runs, the rate-0.2 model comes closest to the "No bn and dropout" baseline (which can be viewed as Dropout with a rate of 0, i.e., nothing dropped). Overall, BN outperforms Dropout on both accuracy and loss: BN reaches about 85% validation accuracy, whereas Dropout tops out at around 79%.
Whether we look at the theoretical analysis, the evolution of modern deep models, or the experimental results, BN has shown a regularization effect superior to Dropout's. It is time to let go of Dropout and embrace BN.