ResNet was proposed by Kaiming He et al. in 2015 and won that year's ImageNet competition. Residual networks are a milestone that opened up a new direction for subsequent network design.
GoogLeNet's idea was to make each layer wider; ResNet's idea is to make the network deeper.
Paper: https://arxiv.org/abs/1512.03385
The paper points out that simply making a network deeper does not keep improving its performance; this is the so-called degradation problem. Note that this is not overfitting: the deeper network has a higher training error, not just test error, than the shallower one.
This suggests that the deeper network fails to learn reasonable parameters.
To address this, the authors proposed the residual structure, also called a shortcut connection:
Without the shortcut, a stack of layers has to learn the desired mapping H(x) directly (input x, output H(x)); with the shortcut, the stacked layers only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x. So why is the residual easier to learn than the original mapping?
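For reference, this is Eq. (1) in the paper, where $F$ is the residual mapping learned by the stacked layers, $\sigma$ is ReLU, and biases are omitted:

$$ y = F(x, \{W_i\}) + x, \qquad F(x) = W_2\,\sigma(W_1 x) \ \text{for a two-layer block} $$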
For an analysis of why residual networks work well, see: https://zhuanlan.zhihu.com/p/80226180
There is no single agreed-upon conclusion; I lean toward the ensemble interpretation.
A residual network can then be viewed as an ensemble assembled from a collection of paths, where different paths pass through different subsets of the network's layers. Andreas Veit et al. ran several lesion-study experiments: at test time they deleted some layers of a trained residual network (dropping a portion of the paths), or swapped the order of certain modules (changing the structure, dropping some paths while introducing new ones). The results show that performance varies smoothly with the number of valid paths (no dramatic collapse as paths change), which indicates that the unrolled paths are somewhat independent and redundant, so the residual network behaves like an ensemble.
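A quick way to see the "collection of paths" picture (following the unrolling in Veit et al.): for two residual blocks with residual functions $f_1$ and $f_2$,

$$ y_2 = y_1 + f_2(y_1) = x + f_1(x) + f_2\big(x + f_1(x)\big), $$

so the output is a sum over paths that either pass through or skip each block; $n$ blocks unroll into $2^n$ such paths, and deleting one block only removes the paths that go through it.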
We may not be able to keep up with the authors' reasoning, and honestly, who cares exactly why it works; deep learning has no shortage of poorly understood "black magic", so let's leave that question to the researchers. We use deep learning to build products and actually improve productivity.
Now let's look at the ResNet model structure.
We implement the 34-layer variant described in the paper.
Looking carefully at the two figures above: the residual blocks use 3x3 convolutions (kernel_size=3). The feature map width and height are halved at conv3_1, conv4_1, and conv5_1. conv2_x downsamples with a stride-2 max pool; the other stages downsample with a stride-2 conv2d.
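A quick shape check of the two downsampling mechanisms (a minimal sketch; the tensor sizes assume a 224x224 input, as in the paper's table):

```python
import torch
from torch import nn

# conv2_x downsamples via max pooling with stride=2: 112x112 -> 56x56
pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
print(pool(torch.randn(1, 64, 112, 112)).shape)   # torch.Size([1, 64, 56, 56])

# conv3_1 (and likewise conv4_1, conv5_1) downsamples via a stride-2 conv: 56x56 -> 28x28
conv = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 64, 56, 56)).shape)     # torch.Size([1, 128, 28, 28])
```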
ResNet uses a stacking scheme similar to VGG, except that VGG stacks consecutive convolutions while ResNet stacks consecutive residual blocks. As in VGG, each later stage doubles the number of channels relative to the previous one and halves h and w.
The code was not written in one go; to see how it was built up step by step, check the commit history on github.
Two points to note in the residual block implementation: when the conv path changes the shape of the input (channels or spatial size), a 1x1 convolution with the same stride is applied on the shortcut so the two tensors can be added; and the final ReLU is applied after the addition.
```python
import torch
from torch import nn

class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(Residual, self).__init__()
        self.stride = stride
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # If the conv path changes the shape of x, e.g. [1,64,56,56] --> [1,128,28,28],
        # a 1x1 conv with the same stride is needed to transform x for the shortcut.
        if in_channels != out_channels:
            self.conv1x1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv1x1 = None

    def forward(self, x):
        o1 = self.relu(self.bn1(self.conv1(x)))
        o2 = self.bn2(self.conv2(o1))
        if self.conv1x1:
            x = self.conv1x1(x)
        out = self.relu(o2 + x)   # add the shortcut, then apply ReLU
        return out
```
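A quick sanity check of the block's output shape (a minimal sketch):

```python
x = torch.randn(1, 64, 56, 56)
block = Residual(64, 128, stride=2)
print(block(x).shape)   # torch.Size([1, 128, 28, 28]); the 1x1 conv reshapes the shortcut to match
```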
After the convolutional stages finish feature extraction, each image yields 512 feature maps of size 7x7 (for a 224x224 input). Global average pooling reduces these to 512 features, which are passed to a fully connected layer whose linear combination produces num_classes outputs.
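In tensor terms (a minimal sketch, again assuming a 224x224 input so the final feature map is 7x7):

```python
feat = torch.randn(1, 512, 7, 7)         # output of conv5_x
pooled = nn.AdaptiveAvgPool2d(1)(feat)   # [1, 512, 1, 1] -- global average pooling
flat = pooled.view(1, -1)                # [1, 512]
logits = nn.Linear(512, 10)(flat)        # [1, num_classes], here num_classes = 10
print(logits.shape)                      # torch.Size([1, 10])
```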
Now let's implement the 34-layer ResNet.
```python
class ResNet(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(ResNet, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        self.conv2 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            Residual(64, 64),
            Residual(64, 64),
            Residual(64, 64),
        )
        self.conv3 = nn.Sequential(
            Residual(64, 128, stride=2),
            Residual(128, 128),
            Residual(128, 128),
            Residual(128, 128),
        )
        self.conv4 = nn.Sequential(
            Residual(128, 256, stride=2),
            Residual(256, 256),
            Residual(256, 256),
            Residual(256, 256),
            Residual(256, 256),
            Residual(256, 256),
        )
        self.conv5 = nn.Sequential(
            Residual(256, 512, stride=2),
            Residual(512, 512),
            Residual(512, 512),
        )
        # self.avg_pool = nn.AvgPool2d(kernel_size=7)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # replaces AvgPool2d so inputs of different sizes work
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.conv5(out)
        out = self.avg_pool(out)
        out = out.view((x.shape[0], -1))
        out = self.fc(out)
        return out
```
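A quick check that the network runs end to end (a minimal sketch); thanks to the adaptive average pooling it also accepts the 48x48 inputs used below:

```python
net = ResNet(in_channels=1, num_classes=10)
print(net(torch.randn(2, 1, 224, 224)).shape)   # torch.Size([2, 10])
print(net(torch.randn(2, 1, 48, 48)).shape)     # torch.Size([2, 10])
```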
What follows is the familiar routine: load the data, build the model, define the loss and optimizer, then train.
```python
import time
import learntorch_utils

batch_size, num_workers = 32, 2
train_iter, test_iter = learntorch_utils.load_data(batch_size, num_workers, resize=48)
print('load data done,batch_size:%d' % batch_size)
```
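load_data comes from the author's learntorch_utils helper module (see the repo). As a rough, hypothetical sketch of what such a helper might look like, assuming FashionMNIST (grayscale, 10 classes, which matches ResNet(1, 10) below):

```python
# Hypothetical sketch only; the real helper lives in the author's repo.
import torch
import torchvision
from torchvision import transforms

def load_data(batch_size, num_workers, resize=None):
    trans = [transforms.Resize(resize)] if resize else []
    trans.append(transforms.ToTensor())
    transform = transforms.Compose(trans)
    train_set = torchvision.datasets.FashionMNIST('./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.FashionMNIST('./data', train=False, download=True, transform=transform)
    train_iter = torch.utils.data.DataLoader(train_set, batch_size, shuffle=True, num_workers=num_workers)
    test_iter = torch.utils.data.DataLoader(test_set, batch_size, shuffle=False, num_workers=num_workers)
    return train_iter, test_iter
```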
```python
net = ResNet(1, 10).cuda()
l = nn.CrossEntropyLoss()
opt = torch.optim.Adam(net.parameters(), lr=0.01)
```
```python
num_epochs = 5

def test():
    acc_sum = 0
    batch = 0
    for X, y in test_iter:
        X, y = X.cuda(), y.cuda()
        y_hat = net(X)
        acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
        batch += 1
    test_acc = acc_sum / (batch * batch_size)
    return test_acc
```
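As written, test() leaves the network in training mode (so BatchNorm keeps using batch statistics) and still builds the autograd graph. A variant that switches to eval mode and disables gradients (not what produced the logs below, just a common alternative) might look like:

```python
def evaluate(net, data_iter):
    net.eval()                  # use running BatchNorm statistics
    acc_sum, n = 0.0, 0
    with torch.no_grad():       # no autograd graph during evaluation
        for X, y in data_iter:
            X, y = X.cuda(), y.cuda()
            acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
            n += y.shape[0]
    net.train()                 # switch back to training mode
    return acc_sum / n
```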
```python
def train():
    for epoch in range(num_epochs):
        train_l_sum, batch, train_acc_sum = 0, 1, 0
        start = time.time()
        for X, y in train_iter:
            X, y = X.cuda(), y.cuda()   # move tensors to GPU memory
            y_hat = net(X)              # forward pass
            loss = l(y_hat, y)          # compute loss; nn.CrossEntropyLoss applies softmax internally
            opt.zero_grad()             # clear gradients
            loss.backward()             # backward pass, compute gradients
            opt.step()                  # update parameters from the gradients

            # statistics
            train_l_sum += loss.item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).float().sum().item()
            train_loss = train_l_sum / (batch * batch_size)
            train_acc = train_acc_sum / (batch * batch_size)
            if batch % 100 == 0:    # print training stats every 100 batches
                print('epoch %d,batch %d,train_loss %.3f,train_acc:%.3f' % (epoch, batch, train_loss, train_acc))
            if batch % 300 == 0:    # evaluate on the test set every 300 batches
                test_acc = test()
                print('epoch %d,batch %d,test_acc:%.3f' % (epoch, batch, test_acc))
            batch += 1
        end = time.time()
        time_per_epoch = end - start
        print('epoch %d,batch_size %d,train_loss %f,time %f' % (epoch + 1, batch_size, train_l_sum / (batch * batch_size), time_per_epoch))
        test()

train()
```
The output is as follows:
```
load data done,batch_size:32
epoch 0,batch 100,train_loss 0.082,train_acc:0.185
epoch 0,batch 200,train_loss 0.065,train_acc:0.297
epoch 0,batch 300,train_loss 0.053,train_acc:0.411
epoch 0,batch 300,test_acc:0.684
epoch 0,batch 400,train_loss 0.046,train_acc:0.487
epoch 0,batch 500,train_loss 0.041,train_acc:0.539
epoch 0,batch 600,train_loss 0.038,train_acc:0.578
epoch 0,batch 600,test_acc:0.763
epoch 0,batch 700,train_loss 0.035,train_acc:0.604
epoch 0,batch 800,train_loss 0.033,train_acc:0.628
epoch 0,batch 900,train_loss 0.031,train_acc:0.647
epoch 0,batch 900,test_acc:0.729
epoch 0,batch 1000,train_loss 0.030,train_acc:0.661
epoch 0,batch 1100,train_loss 0.029,train_acc:0.674
epoch 0,batch 1200,train_loss 0.028,train_acc:0.686
epoch 0,batch 1200,test_acc:0.802
epoch 0,batch 1300,train_loss 0.027,train_acc:0.696
```