CS231N Assignment 4: Two-Layer Net
This post walks through the fourth assignment of the CS231N course series: writing and training a two-layer neural network.
Course homepage: the CS231N course series on NetEase Cloud Classroom
Language: Python 3.6
1 Neural Networks
A neural network is fairly easy to understand: take a linear classifier and add a nonlinear activation function so it can represent nonlinear relationships; stacking more such layers gives a multi-layer neural network. As the course figure shows, the input X passes through the first layer to produce W1·X, the hidden-layer activation max(0, s) is applied to give the hidden-layer output, and that output is multiplied by W2 in the output layer to produce the final class scores.
In the course figure, the leftmost layer has 3072 units because each image has 3072 features. The first layer maps these to the middle layer, called the hidden layer, which has 100 features; the second layer then maps the hidden layer to the output layer, producing scores for the 10 classes. This network is called a two-layer neural network (it contains W1 and W2), or equivalently a network with one hidden layer.
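For reference, these are the three formulas that the loss code later in this post implements (the names Z1, S1 and score are taken from that code; the original figure is not reproduced here):

Z1 = X·W1 + b1         (first, fully-connected layer)
S1 = max(0, Z1)        (hidden-layer ReLU activation)
score = S1·W2 + b2     (second layer: the class scores)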
Note that the activation function is applied only when computing the hidden layer.
There are many possible activation functions (a few common ones are sketched below); in this assignment we use ReLU.
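The original figure listing the activation functions is not reproduced here; as a rough substitute, here is a minimal NumPy sketch of a few common choices (the function names are mine, and only ReLU is actually used in this assignment):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1)

def relu(x):
    return np.maximum(0, x)               # max(0, x), used in this assignment

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope for negative inputs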
2 Writing a Two-Layer Neural Network
As with the SVM classifier we wrote earlier, any trainer needs the following parts (the parameter initialization itself is sketched right after this list):
1. The loss function (forward pass) and gradient (backward pass) computation
2. The training function
3. The prediction function
4. Parameter training
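The constructor of the TwoLayerNet class is not shown in this post; here is a minimal sketch of what it presumably looks like, assuming the usual small-random-weight initialization from the CS231N starter code (the names input_size, hidden_size, output_size and std are assumptions on my part):

import numpy as np

class TwoLayerNet(object):
    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        # small random weights, zero biases
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)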
2.1 The loss function
The loss is computed with the softmax loss.
1. First run the forward pass to compute the scores; this is just the three formulas given above.
##############################
# Compute the class scores of the input
##############################
Z1 = X.dot(W1) + b1      # first layer (affine)
S1 = np.maximum(0, Z1)   # hidden-layer ReLU activation
score = S1.dot(W2) + b2  # output layer scores
2. After computing the scores, add an early return: when no Y is passed in, just return the scores. The predict function relies on this, since it only needs the scores.
if Y is None:
    return score

loss = None
3. Then compute the softmax loss; for the details of the softmax computation, see my post on assignment 3.
###############################
# TODO: forward pass
# compute the loss of the net
################################
exp_scores = np.exp(score)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
# data loss
data_loss = -1.0 / N * np.log(probs[np.arange(N), Y]).sum()
# regularization loss
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
# total loss
loss = data_loss + reg_loss
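One caveat not handled in the code above: np.exp(score) can overflow when the scores are large. A common fix, left here as an optional tweak, is to subtract the per-row maximum before exponentiating, which does not change the resulting probabilities:

# numerically stable variant: shift each row of scores so its max is 0
shifted = score - np.max(score, axis=1, keepdims=True)
exp_scores = np.exp(shifted)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)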
4. Finally compute the gradients by backpropagation. For the softmax loss, the gradient with respect to the scores is simply probs with 1 subtracted at the correct class (averaged over the batch); this is then propagated back through W2, the ReLU, and W1.
################################
# TODO: backward pass
# compute the gradients
################################
grads = {}
dscores = probs
dscores[np.arange(N), Y] -= 1
dscores /= N
# gradients of the second layer (W2, b2)
grads['W2'] = S1.T.dot(dscores) + reg * W2
grads['b2'] = np.sum(dscores, axis=0)
# backprop into the hidden layer, through the ReLU, then into W1, b1
dhidden = dscores.dot(W2.T)
dhidden[S1 <= 0] = 0
grads['W1'] = X.T.dot(dhidden) + reg * W1
grads['b1'] = np.sum(dhidden, axis=0)
Putting it together, the full loss function is:
def loss(self, X, Y=None, reg=0.0):
    '''Compute the loss and gradients of the two-layer network.'''
    W1, b1 = self.params['W1'], self.params['b1']
    W2, b2 = self.params['W2'], self.params['b2']
    N, D = X.shape

    ##############################
    # Compute the class scores of the input
    ##############################
    Z1 = X.dot(W1) + b1      # first layer (affine)
    S1 = np.maximum(0, Z1)   # hidden-layer ReLU activation
    score = S1.dot(W2) + b2  # output layer scores

    if Y is None:
        return score

    loss = None
    ###############################
    # TODO: forward pass
    # compute the loss of the net
    ################################
    exp_scores = np.exp(score)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # data loss
    data_loss = -1.0 / N * np.log(probs[np.arange(N), Y]).sum()
    # regularization loss
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
    # total loss
    loss = data_loss + reg_loss

    ################################
    # TODO: backward pass
    # compute the gradients
    ################################
    grads = {}
    dscores = probs
    dscores[np.arange(N), Y] -= 1
    dscores /= N
    # gradients of the second layer (W2, b2)
    grads['W2'] = S1.T.dot(dscores) + reg * W2
    grads['b2'] = np.sum(dscores, axis=0)
    # backprop into the hidden layer, through the ReLU, then into W1, b1
    dhidden = dscores.dot(W2.T)
    dhidden[S1 <= 0] = 0
    grads['W1'] = X.T.dot(dhidden) + reg * W1
    grads['b1'] = np.sum(dhidden, axis=0)

    return loss, grads
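Not part of the original post, but a quick way to sanity-check the backward pass is to compare the analytic gradient against a centered-difference numerical gradient on a tiny random network (this sketch assumes the hypothetical constructor shown earlier):

import numpy as np

def rel_error(x, y):
    # maximum relative error between two arrays
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

np.random.seed(0)
net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X_tiny = np.random.randn(5, 4)
Y_tiny = np.random.randint(3, size=5)

loss, grads = net.loss(X_tiny, Y_tiny, reg=0.05)

# numerical gradient w.r.t. W1 (the same idea works for b1, W2, b2)
h = 1e-5
W1 = net.params['W1']
num_grad = np.zeros_like(W1)
it = np.nditer(W1, flags=['multi_index'], op_flags=['readwrite'])
while not it.finished:
    ix = it.multi_index
    old = W1[ix]
    W1[ix] = old + h
    loss_plus, _ = net.loss(X_tiny, Y_tiny, reg=0.05)
    W1[ix] = old - h
    loss_minus, _ = net.loss(X_tiny, Y_tiny, reg=0.05)
    W1[ix] = old
    num_grad[ix] = (loss_plus - loss_minus) / (2 * h)
    it.iternext()

print('W1 relative error:', rel_error(grads['W1'], num_grad))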
2.2 The train function
The training hyperparameters are the same as before:
learning rate learning_rate
regularization strength reg
number of iterations num_iters
number of samples per batch batch_size
1. Inside the training loop, first sample a minibatch: batch_inx = np.random.choice(num_train, batch_size)
draws batch_size random numbers from 0 to num_train - 1, which serve as the indices of the sampled examples;
X_batch = X[batch_inx, :] then fetches the rows at those indices.
for it in range(num_iters):
    X_batch = None
    y_batch = None
    #########################################################################
    # TODO: Create a random minibatch of training data and labels, storing  #
    # them in X_batch and y_batch respectively.                             #
    #########################################################################
    batch_inx = np.random.choice(num_train, batch_size)
    X_batch = X[batch_inx, :]
    y_batch = y[batch_inx]
2. With the minibatch sampled, compute the loss and the gradients.
# Compute loss and gradients using the current minibatch
loss, grads = self.loss(X_batch, Y=y_batch, reg=reg)
loss_history.append(loss)
3. After the loss is computed, use the gradients to update the parameters W1, W2, b1 and b2. The gradient points in the direction of steepest increase of the loss, so we step in the opposite direction: each parameter is decreased by the learning rate times its gradient.
#########################################################################
# TODO: Use the gradients in the grads dictionary to update the         #
# parameters of the network (stored in the dictionary self.params)      #
# using stochastic gradient descent. You'll need to use the gradients   #
# stored in the grads dictionary defined above.                         #
#########################################################################
self.params['W1'] -= learning_rate * grads['W1']
self.params['b1'] -= learning_rate * grads['b1']
self.params['W2'] -= learning_rate * grads['W2']
self.params['b2'] -= learning_rate * grads['b2']
4. Monitoring during training. Once per epoch we check how well the network's predictions match the true labels on the current training batch and on the validation set, and record both accuracies so the curves can be plotted afterwards; the learning rate is also decayed at the same time.
# Every epoch, check train and val accuracy and decay learning rate.
if it % iterations_per_epoch == 0:
    # Check accuracy
    train_acc = (self.predict(X_batch) == y_batch).mean()
    val_acc = (self.predict(X_val) == y_val).mean()
    train_acc_history.append(train_acc)
    val_acc_history.append(val_acc)

    # Decay learning rate
    learning_rate *= learning_rate_decay
The complete train function is shown below:
def train(self, X, y, X_val, y_val,
          learning_rate=1e-3, learning_rate_decay=0.95,
          reg=1e-5, num_iters=100,
          batch_size=200, verbose=False):
    """
    Train this neural network using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) giving training data.
    - y: A numpy array of shape (N,) giving training labels; y[i] = c means that
      X[i] has label c, where 0 <= c < C.
    - X_val: A numpy array of shape (N_val, D) giving validation data.
    - y_val: A numpy array of shape (N_val,) giving validation labels.
    - learning_rate: Scalar giving learning rate for optimization.
    - learning_rate_decay: Scalar giving factor used to decay the learning rate
      after each epoch.
    - reg: Scalar giving regularization strength.
    - num_iters: Number of steps to take when optimizing.
    - batch_size: Number of training examples to use per step.
    - verbose: boolean; if true print progress during optimization.
    """
    self.hyper_params = {}
    self.hyper_params['learning_rate'] = learning_rate
    self.hyper_params['reg'] = reg
    self.hyper_params['batch_size'] = batch_size
    self.hyper_params['hidden_size'] = self.params['W1'].shape[1]
    self.hyper_params['num_iter'] = num_iters

    num_train = X.shape[0]
    iterations_per_epoch = max(num_train / batch_size, 1)

    # Use SGD to optimize the parameters in self.model
    loss_history = []
    train_acc_history = []
    val_acc_history = []

    for it in range(num_iters):
        X_batch = None
        y_batch = None
        #########################################################################
        # TODO: Create a random minibatch of training data and labels, storing  #
        # them in X_batch and y_batch respectively.                             #
        #########################################################################
        batch_inx = np.random.choice(num_train, batch_size)
        X_batch = X[batch_inx, :]
        y_batch = y[batch_inx]
        #########################################################################
        #                             END OF YOUR CODE                          #
        #########################################################################

        # Compute loss and gradients using the current minibatch
        loss, grads = self.loss(X_batch, Y=y_batch, reg=reg)
        loss_history.append(loss)

        #########################################################################
        # TODO: Use the gradients in the grads dictionary to update the         #
        # parameters of the network (stored in the dictionary self.params)      #
        # using stochastic gradient descent. You'll need to use the gradients   #
        # stored in the grads dictionary defined above.                         #
        #########################################################################
        self.params['W1'] -= learning_rate * grads['W1']
        self.params['b1'] -= learning_rate * grads['b1']
        self.params['W2'] -= learning_rate * grads['W2']
        self.params['b2'] -= learning_rate * grads['b2']
        #########################################################################
        #                             END OF YOUR CODE                          #
        #########################################################################

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        # Every epoch, check train and val accuracy and decay learning rate.
        if it % iterations_per_epoch == 0:
            # Check accuracy
            train_acc = (self.predict(X_batch) == y_batch).mean()
            val_acc = (self.predict(X_val) == y_val).mean()
            train_acc_history.append(train_acc)
            val_acc_history.append(val_acc)

            # Decay learning rate
            learning_rate *= learning_rate_decay

    return {
        'loss_history': loss_history,
        'train_acc_history': train_acc_history,
        'val_acc_history': val_acc_history,
    }
Training may take a little while; with verbose=True the loss printed every 100 iterations should decrease steadily as training progresses.
2.3 The predict function
Prediction is similar to before: feed the data through the loss function without labels to get the scores, then take the class with the largest score for each sample.
def predict(self, X):
    y_pred = None
    scores = self.loss(X)
    y_pred = np.argmax(scores, axis=1)
    return y_pred
The prediction accuracy on the validation set is computed in the test script below.
2.4 Visualizing the results
After training we can visualize the process: plot the loss recorded at every iteration, along with the training and validation accuracies recorded once per epoch.
The test script is as follows:
import numpy as np
import matplotlib.pyplot as plt
# load_CIFAR10 and TwoLayerNet come from the author's own modules (imports not shown in the original post)

# step 1: data trimming
# The full dataset is large, so we carve out training, validation, test and dev subsets.
num_training = 49000    # number of training samples
num_validation = 1000   # number of validation samples
num_test = 1000         # number of test samples
num_dev = 500

Data = load_CIFAR10()
CIFAR10_Data = './'
X_train, Y_train, X_test, Y_test = Data.load_CIFAR10(CIFAR10_Data)  # load the data

# take a slice of the training set as the validation set
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
Y_val = Y_train[mask]
# keep the first num_training samples as the training set
mask = range(num_training)
X_train = X_train[mask]
Y_train = Y_train[mask]
# the training set is large, so sample a small development set for experiments
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
Y_dev = Y_train[mask]
# shrink the test set as well
mask = range(num_test)
X_test = X_test[mask]
Y_test = Y_test[mask]

# step 2: preprocessing
# flatten every image into a row vector so each set becomes a 2-D array
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
print('Training data shape', X_train.shape)
print('Validation data shape', X_val.shape)
print('Test data shape', X_test.shape)
print('Dev data shape', X_dev.shape)

# step 3: train the network
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)
sta = net.train(X_train, Y_train, X_val, Y_val,
                num_iters=1000, batch_size=200,
                learning_rate=4e-4, learning_rate_decay=0.95,
                reg=0.7, verbose=True)

# step 4: evaluate on the validation set
val = (net.predict(X_val) == Y_val).mean()
print(val)

# step 5: visualize the training curves
plt.subplot(2, 1, 1)
plt.plot(sta['loss_history'])
plt.ylabel('loss')
plt.xlabel('Iteration')
plt.title('Loss_History')
plt.subplot(2, 1, 2)
plt.plot(sta['train_acc_history'], label='train')
plt.plot(sta['val_acc_history'], label='val')
plt.xlabel('epoch')
plt.ylabel('Classification accuracy')
plt.legend()  # show the train/val labels
plt.show()