Original article: https://blog.csdn.net/weixin_40759186/article/details/87547795
---------------------------------------------------------------------------------------------------------------
Things to watch out for when using dropout and BN in PyTorch
Dropout in PyTorch:
Dropout should be active during training and disabled at test time.
In PyTorch, net.eval() switches the whole network into evaluation mode: dropout is turned off and the BN statistics are frozen, so the forward pass no longer updates them. In principle you should call net.eval() before evaluating on any validation set.
net.train() switches the network back into training mode, so dropout and BN behave as they do during training.
```python
net_dropped = torch.nn.Sequential(
    torch.nn.Linear(1, N_HIDDEN),
    torch.nn.Dropout(0.5),      # drop 50% of the neurons
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, N_HIDDEN),
    torch.nn.Dropout(0.5),      # drop 50% of the neurons
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, 1),
)
```
```python
for t in range(500):
    pred_drop = net_dropped(x)
    loss_drop = loss_func(pred_drop, y)
    optimizer_drop.zero_grad()
    loss_drop.backward()
    optimizer_drop.step()

    if t % 10 == 0:
        # change to eval mode in order to fix the dropout effect
        net_dropped.eval()      # parameters for dropout differ from train mode
        test_pred_drop = net_dropped(test_x)
        # change back to train mode
        net_dropped.train()
```
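To see the difference concretely, here is a minimal standalone sketch (written for this note, not from the original post) showing that Dropout zeroes and rescales activations in train mode but acts as the identity in eval mode:

```python
import torch

drop = torch.nn.Dropout(0.5)
x = torch.ones(1, 8)

drop.train()        # training mode: ~50% of activations are zeroed, the rest scaled by 1/(1-p) = 2
print(drop(x))      # random zeros, surviving entries equal 2.0

drop.eval()         # eval mode: dropout is the identity
print(drop(x))      # all ones, deterministic
```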
Batch Normalization in PyTorch:
net.eval() fixes the behaviour of the whole network and, in particular, freezes the BN statistics moving_mean and moving_var; if this is unclear, see the figure below:
```python
if self.do_bn:
    bn = nn.BatchNorm1d(10, momentum=0.5)
    setattr(self, 'bn%i' % i, bn)   # IMPORTANT: set layer to the Module
    self.bns.append(bn)

for epoch in range(EPOCH):
    print('Epoch: ', epoch)
    for net, l in zip(nets, losses):
        net.eval()      # set eval mode to fix moving_mean and moving_var
        pred, layer_input, pre_act = net(test_x)
        net.train()     # free moving_mean and moving_var
    plot_histogram(*layer_inputs, *pre_acts)
```
[Figure: moving_mean and moving_var]
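PyTorch exposes these statistics as the running_mean and running_var buffers of the BN module. A minimal sketch (written for this note, not from the original post) showing that they only change in train mode:

```python
import torch

bn = torch.nn.BatchNorm1d(10, momentum=0.5)
batch = torch.randn(32, 10) * 3 + 1

bn.train()
bn(batch)                       # forward pass updates running_mean and running_var
print(bn.running_mean)          # moved toward the batch mean

bn.eval()
before = bn.running_mean.clone()
bn(batch)                       # forward pass uses, but does not touch, the running statistics
print(torch.equal(before, bn.running_mean))   # True: statistics are frozen
```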
Things to watch out for when using dropout and BN in TensorFlow
Both dropout and BN take a training argument that tells the layer whether it is running in train or test mode. In test mode, dropout does nothing at all, and BN uses its frozen statistics.
```python
tf_is_training = tf.placeholder(tf.bool, None)  # to control dropout when training and testing

# dropout net
d1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
d1 = tf.layers.dropout(d1, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d2 = tf.layers.dense(d1, N_HIDDEN, tf.nn.relu)
d2 = tf.layers.dropout(d2, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d_out = tf.layers.dense(d2, 1)
```
```python
for t in range(500):
    sess.run([o_train, d_train], {tf_x: x, tf_y: y, tf_is_training: True})   # train, set is_training=True

    if t % 10 == 0:
        # plotting
        plt.cla()
        o_loss_, d_loss_, o_out_, d_out_ = sess.run(
            [o_loss, d_loss, o_out, d_out],
            {tf_x: test_x, tf_y: test_y, tf_is_training: False}   # test, set is_training=False
        )
```
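To confirm the pass-through behaviour of tf.layers.dropout, here is a minimal, self-contained TF 1.x sketch (the placeholder names are illustrative and not from the original code): with training=False the layer returns its input unchanged.

```python
import numpy as np
import tensorflow as tf   # assumes TensorFlow 1.x

tf_is_training = tf.placeholder(tf.bool, None)
inp = tf.placeholder(tf.float32, [None, 4])
out = tf.layers.dropout(inp, rate=0.5, training=tf_is_training)

x = np.ones((2, 4), dtype=np.float32)
with tf.Session() as sess:
    train_out = sess.run(out, {inp: x, tf_is_training: True})    # random zeros, survivors scaled by 1/(1-rate)
    test_out = sess.run(out, {inp: x, tf_is_training: False})    # identical to the input
    print(train_out)
    print(test_out)
```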
```python
def add_layer(self, x, out_size, ac=None):
    x = tf.layers.dense(x, out_size, kernel_initializer=self.w_init, bias_initializer=B_INIT)
    self.pre_activation.append(x)
    # the momentum plays an important role; the default 0.99 is too high in this case!
    if self.is_bn:
        x = tf.layers.batch_normalization(x, momentum=0.4, training=tf_is_train)   # when BN is enabled
    out = x if ac is None else ac(x)
    return out
```
When BN's training argument is True, it only means the BN statistics are allowed to change; it does not mean BN will update moving_mean and moving_var by itself, because those updates are separate ops created during the forward pass. Before running the train op you must make sure moving_mean and moving_var actually get updated; the update ops are collected in tf.GraphKeys.UPDATE_OPS.
```python
# !! IMPORTANT !! the moving_mean and moving_variance need to be updated,
# pass the update_ops with control_dependencies to the train_op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    self.train = tf.train.AdamOptimizer(LR).minimize(self.loss)
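As a sanity check, the following minimal TF 1.x sketch (illustrative names, not from the original code) shows that moving_mean only moves when the collected update ops actually run, here triggered through control_dependencies on the train op:

```python
import numpy as np
import tensorflow as tf   # assumes TensorFlow 1.x

x = tf.placeholder(tf.float32, [None, 3])
is_train = tf.placeholder(tf.bool, None)
h = tf.layers.batch_normalization(x, momentum=0.4, training=is_train)
loss = tf.reduce_mean(tf.square(h))

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)   # the moving_mean / moving_variance assign ops
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

moving_mean = [v for v in tf.global_variables() if 'moving_mean' in v.name][0]

data = np.random.randn(16, 3).astype(np.float32) + 5
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(moving_mean))                       # starts at zeros
    sess.run(train_op, {x: data, is_train: True})      # update ops run via control_dependencies
    print(sess.run(moving_mean))                       # moved toward the batch mean
```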