Lecture videos: https://www.youtube.com/watch?v=ogZi5oIo4fI
Youdao Cloud Notes: http://note.youdao.com/noteshare?id=d86bd8fc60cb4fe87005a2d2e2d5b70d&sub=6911732F9FA44C68AD53A09072155ED3
Part 1: design your model as a class; you need to write a forward function.
```python
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate two nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred


# our model
model = Model()
```
Part 2: construct the loss function and the optimizer used to learn the parameters.
```python
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
# criterion is mainly used to compute the loss
criterion = torch.nn.MSELoss(size_average=False)
# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```
Part 3: training, i.e. forward -> backward -> update parameters.
```python
# Training loop
for epoch in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    # initialize the gradients
    optimizer.zero_grad()
    # backward pass
    loss.backward()
    # update the weights held by the optimizer, i.e. model.parameters()
    optimizer.step()
```
Part 4: testing.
```python
# After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print("predict (after training)", 4, model(hour_var).data[0][0])
```
To summarize, the basic training framework:
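A minimal sketch of that framework, reusing the same pieces as above (plain tensors are used here instead of `Variable`, which recent PyTorch versions no longer require):

```python
import torch

# 1. Design your model as a class with a forward() method
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = MyModel()

# 2. Construct the loss function and the optimizer
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3. Training loop: forward -> loss -> zero_grad -> backward -> step
x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[2.0], [4.0], [6.0]])
for epoch in range(1000):
    y_pred = model(x_data)            # forward pass
    loss = criterion(y_pred, y_data)  # compute loss
    optimizer.zero_grad()             # clear old gradients
    loss.backward()                   # backward pass
    optimizer.step()                  # update parameters

# 4. Test the trained model
print(model(torch.Tensor([[4.0]])))
```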
Homework: try other optimizers:
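Only the optimizer line needs to change. A sketch of a few alternatives available in `torch.optim` (the learning rates here are arbitrary):

```python
# Any one of these can replace the SGD line above; keep only one active at a time.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer = torch.optim.Adamax(model.parameters(), lr=0.01)
```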
The original model:
```
graph LR
x --> Linear
Linear --> y
```
$$\hat{y} = x \cdot w + b$$

$$loss = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$$
Activation functions:
Using the sigmoid function:
```
graph LR
x --> Linear
Linear --> Sigmoid
Sigmoid --> y
```
The output now lies in [0, 1]; squashing it into a bounded range makes it usable as a probability and easier to work with.
$$\sigma(z) = \frac{1}{1+e^{-z}}$$

$$\hat{y} = \sigma(x \cdot w + b)$$

$$loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log \hat{y}_n + (1-y_n)\log(1-\hat{y}_n)\right]$$
Code:
```python
import torch
from torch.autograd import Variable
import torch.nn.functional as F

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0], [5.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.], [1.]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data.
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(400):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training
hour_var = Variable(torch.Tensor([[0.0]]))
print("predict 1 hour ", 0.0, model(hour_var).data[0][0] > 0.5)
hour_var = Variable(torch.Tensor([[7.0]]))
print("predict 7 hours", 7.0, model(hour_var).data[0][0] > 0.5)
```
The newly added activation function:
y_pred = F.sigmoid(self.linear(x))
Change the loss to:
criterion = torch.nn.BCELoss(size_average=True)
Homework: try other activation functions (short descriptions below, with a sketch of swapping them in after the list):
ReLU (the Rectified Linear Unit) has been used very widely in deep learning in recent years. It helps with the vanishing-gradient problem because its derivative is either 1 or 0. Compared with the sigmoid and tanh activations, the gradient of ReLU is very cheap and simple to compute, which can greatly speed up the convergence of stochastic gradient descent (ReLU is piecewise linear, while sigmoid and tanh involve exponentials). Its drawback is fragility: as training proceeds, neurons can die. For example, after a very large gradient flows through a ReLU unit, the weight update may leave it in a state where no data point can ever activate it again; from that point on, the gradient flowing through that neuron is always 0. In other words, the ReLU neuron has died irreversibly during training.
ELU: in the positive range its value is x itself, which alleviates vanishing gradients (the derivative is 1 everywhere for x > 0), similar to ReLU and Leaky ReLU. In the negative range, ELU saturates softly as the input becomes more negative, which improves robustness to noise.
Leaky ReLU is mainly designed to avoid vanishing gradients: when the neuron is not activated it still allows a small non-zero gradient, so the gradient does not vanish and convergence stays fast. Its pros and cons are otherwise similar to ReLU's.
tanh squashes the input into the range -1 to 1. Like sigmoid, it also suffers from vanishing or saturating gradients.
Sigmoid: probably the most frequently used activation function in classical neural networks. It squashes a real number into the range 0 to 1: very large inputs give results close to 1, and very large negative inputs give results close to 0. It was used heavily in early neural networks because it maps nicely onto whether a stimulated neuron is activated and passes its signal on (0: barely activated, 1: fully activated). In recent years it appears less often in deep learning, because it easily leads to vanishing or saturating gradients. When a network has many layers and every layer uses sigmoid, backpropagation keeps multiplying by its derivative, so the gradient keeps shrinking and eventually vanishes. If the input is very large or very small (for example, an input of 100 gives a sigmoid output close to 1 and a gradient close to 0), the unit saturates and the neuron ends up in a nearly dead state.
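For the homework, only the activation call in `forward` needs to change. A small sketch comparing the functions mentioned above on a made-up input (all of these exist in `torch` / `torch.nn.functional`; note that for the BCELoss model the final output must still lie in [0, 1], so ReLU-style activations belong in hidden layers):

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print("sigmoid   ", torch.sigmoid(z))   # squashes into (0, 1)
print("tanh      ", torch.tanh(z))      # squashes into (-1, 1)
print("relu      ", F.relu(z))          # max(0, z); derivative is 0 or 1
print("leaky_relu", F.leaky_relu(z))    # small negative slope instead of 0
print("elu       ", F.elu(z))           # smooth saturation for z < 0
```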
```
graph LR
a --> Linear
b --> Linear
Linear --> Sigmoid
Sigmoid --> y
```
Multi-dimensional inputs and deeper networks; the changes happen mainly in the "Design your model using class" step.
```python
import torch
from torch.autograd import Variable
import numpy as np

xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = Variable(torch.from_numpy(xy[:, 0:-1]))
y_data = Variable(torch.from_numpy(xy[:, [-1]]))

print(x_data.data.shape)
print(y_data.data.shape)


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate two nn.Linear module
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
# criterion = torch.nn.BCELoss(size_average=True)
criterion = torch.nn.BCELoss(reduction='elementwise_mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for epoch in range(1200000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Homework:
Constructing a Dataset mainly involves three steps:
Inherit from `Dataset`.
Instantiate a dataset and use it in a `DataLoader`:
train_loader = DataLoader(dataset=dataset, batch_size=1, shuffle=True, num_workers=1)
Code:
```python
# References
# https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py
# http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
import torch
import numpy as np
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader


class DiabetesDataset(Dataset):
    """ Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('./data/diabetes.csv.gz',
                        delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len


dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=1,
                          shuffle=True,
                          num_workers=1)

for epoch in range(2):
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # Run your training process
        print(epoch, i, "inputs", inputs.data, "labels", labels.data)
```
Homework:
Use another dataset, e.g. MNIST; I referred to the official example code:
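A minimal sketch of that homework using torchvision's built-in MNIST dataset (the `./mnist_data/` path and the batch size are arbitrary choices; `datasets.MNIST` already implements `__getitem__` and `__len__`, so no custom Dataset class is needed):

```python
import torch
from torchvision import datasets, transforms

train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=32,
                                           shuffle=True)

for images, labels in train_loader:
    # one mini-batch of images and digit labels
    print(images.shape, labels.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])
    break
```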
To summarize the training workflow:
MNIST softmax
before:
```
graph LR
x{x} --> Linear
Linear --> Activation
Activation --> ...
... --> Linear2
Linear2 --> Activation2
Activation2 --> h{y}
```
now:
```
graph LR
x{x} --> Linear
Linear --> Activation
Activation --> ...
... --> Linear2
Linear2 --> Activation2
Activation2 --> P_y=0
Activation2 --> P_y=1
Activation2 --> ....
Activation2 --> P_y=9
```
What is softmax?
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j = 1, 2, \dots, K$$
Softmax turns the network outputs into probabilities.
What is cross entropy?
$$loss = \frac{1}{N}\sum_i D(\mathrm{Softmax}(w x_i + b), Y_i)$$

$$D(\hat{Y}, Y) = -Y \log \hat{Y}$$
The whole pipeline:
```
graph LR
x -- LinearModel --> Z
Z -- Softmax --> y'
y' -- Cross_Entropy --> Y
```
The implementation in PyTorch:
loss = torch.nn.CrossEntropyLoss()
This loss already includes both the softmax and the cross-entropy computation, so it takes raw logits as input.
```
graph LR
X -- Softmax --> y'
y' -- Cross_Entropy --> Y
```
Code:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Cross entropy example
import numpy as np

# One hot
# 0: 1 0 0
# 1: 0 1 0
# 2: 0 0 1
Y = np.array([1, 0, 0])

Y_pred1 = np.array([0.7, 0.2, 0.1])
Y_pred2 = np.array([0.1, 0.3, 0.6])
print("loss1 = ", np.sum(-Y * np.log(Y_pred1)))
print("loss2 = ", np.sum(-Y * np.log(Y_pred2)))

################################################################################
# Softmax + CrossEntropy (logSoftmax + NLLLoss)
loss = nn.CrossEntropyLoss()

# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([0]), requires_grad=False)

# input is of size nBatch x nClasses = 1 x 4
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.5, 2.0, 0.3]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("PyTorch Loss1 = ", l1.data, "\nPyTorch Loss2=", l2.data)
print("Y_pred1=", torch.max(Y_pred1.data, 1)[1])
print("Y_pred2=", torch.max(Y_pred2.data, 1)[1])

################################################################################
"""Batch loss"""
# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)

# input is of size nBatch x nClasses = 2 x 4
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],
                                 [1.1, 0.1, 0.2],
                                 [0.2, 2.1, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],
                                 [0.2, 0.3, 0.5],
                                 [0.2, 0.2, 0.5]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)
```
Homework: CrossEntropyLoss vs. NLLLoss?
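A sketch for this homework: `nn.CrossEntropyLoss` applied to raw logits gives the same value as `nn.NLLLoss` applied to the log-softmax of those logits (reusing the logits from the example above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])   # raw scores, not softmaxed
target = torch.tensor([0])                 # class index, not one-hot

ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()

loss_ce = ce(logits, target)                            # softmax + log + NLL in one call
loss_nll = nll(F.log_softmax(logits, dim=1), target)    # log-softmax applied manually

print(loss_ce.item(), loss_nll.item())  # the two values should match
```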
```
graph LR
inputLayer -.-> HiddenLayer
HiddenLayer -.-> OutputLayer
```
Code:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 16

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28) -> (n, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)


model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += criterion(output, target).data[0]
        # get the index of the max
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Homework:
Use DataLoader
Simple convolution layer
For example:
```
graph LR
3*3*1_image --> 2*2*1_filter_W
3*3*1_image --> 1*1_Stride
3*3*1_image --> NoPadding
NoPadding --> 2*2_featureMap
2*2*1_filter_W --> 2*2_featureMap
1*1_Stride --> 2*2_featureMap
```
How do we compute the convolution for multi-channel images?
Each output value is $w^T x + b$: the dot product of the filter weights with an input patch, plus a bias.
Result: a 28 × 28 × 1 feature map per filter, i.e. N feature maps in total (N = the number of filters used).
The output-size formula:
$$OutputSize = \frac{InputSize + 2 \times PaddingSize - FilterSize}{Stride} + 1$$
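A small helper (not from the lecture) to sanity-check the formula against the numbers used in this note:

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """OutputSize = (InputSize + 2*Padding - FilterSize) // Stride + 1"""
    return (input_size + 2 * padding - filter_size) // stride + 1

print(conv_output_size(3, 2))        # 2   (3x3 image, 2x2 filter, no padding, stride 1)
print(conv_output_size(28, 5))       # 24  (MNIST 28x28 through a 5x5 conv)
print(conv_output_size(24 // 2, 5))  # 8   (after 2x2 max pooling, then another 5x5 conv)
```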
A few parameters worth explaining:
CONV
The convolution layer; it should be used together with an activation function.
Given the filter size, padding, and stride, use the formula above to compute the output size.
torch.nn.Conv2d(in_channels,out_channels,kernel_size)
self.conv1=nn.Conv2d(1,10,kernel_size=5)
Activation functions
Max Pooling
Take the maximum value inside each n × m window as the pooling result.
There is also the analogous average pooling.
nn.MaxPool2d(kernel_size)
self.mp = nn.MaxPool2d(2)
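As a quick check of what `nn.MaxPool2d(2)` does, here is a tiny made-up example: each non-overlapping 2 × 2 window is reduced to its maximum, halving the spatial size:

```python
import torch
import torch.nn as nn

mp = nn.MaxPool2d(2)
x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [0., 1., 2., 3.],
                    [1., 9., 4., 4.]]]])  # shape (1, 1, 4, 4)

print(mp(x))        # tensor([[[[4., 8.], [9., 4.]]]])
print(mp(x).shape)  # torch.Size([1, 1, 2, 2])
```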
Fully connected layer
self.fc = nn.Linear(320,10)
Neurons in a CNN are not connected to every pixel, whereas neurons in a fully connected network are connected to every input pixel.
```
graph TB
ConvolutionalLayer1 --> PoolingLayer1
PoolingLayer1 --> ConvolutionalLayer2
ConvolutionalLayer2 --> PoolingLayer2
PoolingLayer2 --> Fully-ConnectedLayer
```
Model:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(???, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)
```
How do we fill in the `???`? You can put an arbitrary number there first, run the program, and read the size-mismatch error to find the correct value (320 for this network, as shown earlier).
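Another way (a sketch, not from the lecture) is to push a dummy batch through the convolutional part and print the flattened shape before writing the Linear layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)
mp = nn.MaxPool2d(2)

dummy = torch.zeros(1, 1, 28, 28)   # one fake MNIST image
out = F.relu(mp(conv1(dummy)))
out = F.relu(mp(conv2(out)))
print(out.view(1, -1).shape)        # torch.Size([1, 320]) -> nn.Linear(320, 10)
```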
Homework: try a deeper network with more (and deeper) fully connected layers.
Why 1×1 convolutions?
Using 32 1×1 filters turns a 64-channel feature map into a 32-channel one.
Putting 1×1 filters in front of larger convolutions can significantly reduce the amount of computation.
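A rough multiply count makes the saving concrete. The 64 → 32 channel numbers come from the text; the 28 × 28 spatial size and the 16-channel bottleneck are assumptions for illustration:

```python
# Multiplications for producing 32 output channels from a 64-channel 28x28 map
# with a 5x5 convolution, directly vs. through a 1x1 bottleneck to 16 channels.
H = W = 28

direct = H * W * 32 * (5 * 5 * 64)                                   # 5x5 conv: 64 -> 32
bottleneck = H * W * 16 * (1 * 1 * 64) + H * W * 32 * (5 * 5 * 16)   # 1x1 then 5x5

print(direct)      # 40,140,800 multiplications
print(bottleneck)  # 10,838,016 multiplications, roughly 3.7x fewer
```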
```
graph LR
Filter_concat_in --> 1*1Conv0_16
Filter_concat_in --> 1*1Conv1_16
Filter_concat_in --> 1*1Conv2_16
Filter_concat_in --> AvgPooling
AvgPooling --> 1*1Conv3_16
1*1Conv0_16 --> 3*3Conv0_24
3*3Conv0_24 --> 3*3Conv1_24
3*3Conv1_24 --> Filter_Concat_out
1*1Conv1_16 --> 5*5Conv_24
5*5Conv_24 --> Filter_Concat_out
1*1Conv3_16 --> Filter_Concat_out
1*1Conv2_16 --> Filter_Concat_out
```
Implementation:
```python
self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

branch1x1 = self.branch1x1(x)
```
```python
self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
branch_pool = self.branch_pool(branch_pool)
```
```python
self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

branch5x5 = self.branch5x5_1(x)
branch5x5 = self.branch5x5_2(branch5x5)
```
```python
self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

branch3x3 = self.branch3x3_1(x)
branch3x3 = self.branch3x3_2(branch3x3)
branch3x3 = self.branch3x3_3(branch3x3)
```
```python
outputs = [branch1x1, branch_pool, branch5x5, branch3x3]
return torch.cat(outputs, 1)  # concatenate along the channel dimension
```
Full code:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class InceptionA(nn.Module):
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)

        self.incept1 = InceptionA(in_channels=10)
        self.incept2 = InceptionA(in_channels=20)

        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incept1(x)
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incept2(x)
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)


model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += F.nll_loss(output, target, size_average=False).data[0]
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Recurrent NN
```
graph LR
X1 --> A1
A1 --> h1
X2 --> A2
A2 --> h2
X3 --> A3
A3 --> h3
X4 --> A4
A4 --> h4
A1 --> A2
A2 --> A3
A3 --> A4
```
PyTorch provides ready-made RNN modules that can be used directly.
Different recurrent cell implementations:
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
cell = nn.GRU(input_size=4, hidden_size=2, batch_first=True)
cell = nn.LSTM(input_size=4, hidden_size=2, batch_first=True)
```
How to use an RNN?
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

inputs = ...    # (batch_size, seq_len, input_size)
hidden = (...)  # (num_layers, batch_size, hidden_size)

out, hidden = cell(inputs, hidden)
```
There are two return values: the output at every time step, and the final hidden state.
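A small sketch (sizes borrowed from the snippet above) showing the shapes of those two return values:

```python
import torch
import torch.nn as nn

cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

inputs = torch.randn(1, 3, 4)   # (batch_size, seq_len, input_size)
hidden = torch.zeros(1, 1, 2)   # (num_layers, batch_size, hidden_size)

out, hidden = cell(inputs, hidden)
print(out.shape)     # torch.Size([1, 3, 2])  -> output at every time step
print(hidden.shape)  # torch.Size([1, 1, 2])  -> hidden state after the last step
```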
```python
# Lab 12 RNN
import sys
import torch
import torch.nn as nn
from torch.autograd import Variable

torch.manual_seed(777)  # reproducibility

#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]         # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]         # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))

num_classes = 5
input_size = 5       # one-hot size
hidden_size = 5      # output from the RNN. 5 to directly predict one-hot
batch_size = 1       # one sentence
sequence_length = 1  # One by one
num_layers = 1       # one-layer rnn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize hidden and cell states
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))


# Instantiate RNN model
model = Model()
print(model)

# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")
    for input, label in zip(inputs, labels):
        # print(input.size(), label.size())
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.data[0]])
        loss += criterion(output, label)

    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.data[0]))

    loss.backward()
    optimizer.step()

print("Learning finished!")
```