1. Overview
SIFT Flow is an annotated semantic segmentation dataset with two kinds of labels: semantic classes (33 categories) and geometric/scene labels (3 categories).
Semantic and geometric segmentation classes for scenes.
Semantic: 0 is void and 1–33 are classes:
01 awning, 02 balcony, 03 bird, 04 boat, 05 bridge, 06 building, 07 bus, 08 car, 09 cow, 10 crosswalk, 11 desert, 12 door, 13 fence, 14 field, 15 grass, 16 moon, 17 mountain, 18 person, 19 plant, 20 pole, 21 river, 22 road, 23 rock, 24 sand, 25 sea, 26 sidewalk, 27 sign, 28 sky, 29 staircase, 30 streetlight, 31 sun, 32 tree, 33 window.
Geometric: -1 is void and 1–3 are classes:
01 sky, 02 horizontal, 03 vertical.
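For convenience when decoding the network's argmax outputs later (see infer.py below), the class lists above can be written down as Python lookup tables. This is just a transcription of the lists, not part of the original FCN code:

# Class-index-to-name tables, transcribed from the lists above.
# Semantic index 0 and geometric index -1 are void.
SEMANTIC_CLASSES = {
    1: 'awning', 2: 'balcony', 3: 'bird', 4: 'boat', 5: 'bridge',
    6: 'building', 7: 'bus', 8: 'car', 9: 'cow', 10: 'crosswalk',
    11: 'desert', 12: 'door', 13: 'fence', 14: 'field', 15: 'grass',
    16: 'moon', 17: 'mountain', 18: 'person', 19: 'plant', 20: 'pole',
    21: 'river', 22: 'road', 23: 'rock', 24: 'sand', 25: 'sea',
    26: 'sidewalk', 27: 'sign', 28: 'sky', 29: 'staircase',
    30: 'streetlight', 31: 'sun', 32: 'tree', 33: 'window',
}

GEOMETRIC_CLASSES = {1: 'sky', 2: 'horizontal', 3: 'vertical'}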
2. Model training
Step 1: Download the source code
git clone git@github.com:shelhamer/fcn.berkeleyvision.org.git
Step 2: Prepare the data
Download the annotated dataset SiftFlowDataset.zip from http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip
Extract the archive into the data/sift-flow folder.
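If you would rather script this step than download and unzip by hand, here is a minimal sketch using only the Python 3 standard library; it assumes the URL above is reachable and that you run it from the repository root:

import os
import urllib.request
import zipfile

# Optional helper: fetch and unpack SiftFlowDataset.zip into data/sift-flow.
URL = 'http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip'
DEST_DIR = 'data/sift-flow'

os.makedirs(DEST_DIR, exist_ok=True)
zip_path = os.path.join(DEST_DIR, 'SiftFlowDataset.zip')

# Download the archive.
urllib.request.urlretrieve(URL, zip_path)

# Unpack it where the FCN data layer expects to find it.
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(DEST_DIR)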
Step 3: Modify the code
git clone git@github.com:litingpan/fcn.git
Alternatively, download it from https://github.com/litingpan/fcn and replace the entire siftflow-fcn32s folder with it.
In it, solve.py is modified as follows:
import caffe
import surgery, score

import numpy as np
import os
import sys

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except:
    pass

# weights = '../ilsvrc-nets/vgg16-fcn.caffemodel'
vgg_weights = '../ilsvrc-nets/VGG_ILSVRC_16_layers.caffemodel'
vgg_proto = '../ilsvrc-nets/VGG_ILSVRC_16_layers_deploy.prototxt'

# init
# caffe.set_device(int(sys.argv[1]))
caffe.set_device(0)
caffe.set_mode_gpu()

# solver = caffe.SGDSolver('solver.prototxt')
# solver.net.copy_from(weights)
solver = caffe.SGDSolver('solver.prototxt')
vgg_net = caffe.Net(vgg_proto, vgg_weights, caffe.TRAIN)
surgery.transplant(solver.net, vgg_net)
del vgg_net

# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)

for _ in range(50):
    solver.step(2000)
    # N.B. metrics on the semantic labels are off b.c. of missing classes;
    # score manually from the histogram instead for proper evaluation
    score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
    score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
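For reference (this describes the FCN repository's surgery.py, not a change made here): surgery.transplant copies the pretrained VGG-16 weights into the FCN net layer by layer, coercing shapes where the two architectures differ, and surgery.interp initializes the 'up*' deconvolution layers to bilinear interpolation so upsampling starts from a sensible kernel. The loop runs 50 × 2000 = 100,000 SGD iterations in total, evaluating both the semantic (score_sem) and geometric (score_geo) heads on the test split every 2000 steps.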
Step 4: Download the pretrained model
ILSVRC-2014 model (VGG team) with 16 weight layers: https://gist.github.com/ksimonyan/211839e770f7b538e2d8/revisions
Download both VGG_ILSVRC_16_layers.caffemodel and VGG_ILSVRC_16_layers_deploy.prototxt and place them in the ilsvrc-nets directory.
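As an optional sanity check (my addition, not part of the original walkthrough), you can verify that the two files load together in pycaffe before starting the long training run; the paths assume you run this from the siftflow-fcn32s directory:

import caffe

# Load the VGG-16 deploy definition together with its weights on the CPU.
caffe.set_mode_cpu()
vgg_net = caffe.Net('../ilsvrc-nets/VGG_ILSVRC_16_layers_deploy.prototxt',
                    '../ilsvrc-nets/VGG_ILSVRC_16_layers.caffemodel',
                    caffe.TEST)

# Print the learnable layers; you should see conv1_1 through fc8.
print(sorted(vgg_net.params.keys()))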
Step 5: Train
python solve.py
When training finishes, train_iter_100000.caffemodel in the snapshot directory is the trained model.
3. Prediction
Step 1: Prepare the model
You can use the model we trained above; if you do not want to train one yourself, you can directly download a trained model from http://dl.caffe.berkeleyvision.org/siftflow-fcn32s-heavy.caffemodel
Step 2: deploy.prototxt
It is adapted from test.prototxt, with three main changes:
(1) The input layer
layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    # These dimensions are purely for sake of example;
    # see infer.py for how to reshape the net to the given input size.
    shape { dim: 1 dim: 3 dim: 256 dim: 256 }
  }
}
Note that the shape in the Input layer should match the size of the image to be tested (here 256 × 256, the size of all SIFT Flow images; infer.py additionally reshapes the data blob to the actual input shape before the forward pass).
(2) The dropout layers were removed.
(3) The loss layer and the layers related to it were removed. (A quick way to confirm that the edited deploy.prototxt still parses is sketched below.)
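The following small check is my addition, not part of the original post: it loads deploy.prototxt without weights and prints the blob shapes, which is a cheap way to catch prototxt mistakes after the three edits above.

import caffe

# Parse deploy.prototxt without weights; this fails loudly if the prototxt is malformed.
caffe.set_mode_cpu()
net = caffe.Net('deploy.prototxt', caffe.TEST)

# Inspect blob shapes; 'data' should be 1 x 3 x 256 x 256 and the two score
# heads (score_sem, score_geo) should be present.
for name, blob in net.blobs.items():
    print(name, blob.data.shape)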
Step 3: infer.py
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import sys

import caffe

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('coast_bea14.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('deploy.prototxt', 'snapshot/train_iter_100000.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()

# semantic segmentation head
sem_out = net.blobs['score_sem'].data[0].argmax(axis=0)
plt.imshow(sem_out)
plt.axis('off')
plt.savefig('coast_bea14_sem_out.png')
sem_out_img = Image.fromarray(sem_out.astype('uint8')).convert('RGB')
sem_out_img.save('coast_bea14_sem_img_out.png')

# geometric (scene) segmentation head
geo_out = net.blobs['score_geo'].data[0].argmax(axis=0)
plt.imshow(geo_out)
plt.axis('off')
plt.savefig('coast_bea14_geo_out.png')
geo_out_img = Image.fromarray(geo_out.astype('uint8')).convert('RGB')
geo_out_img.save('coast_bea14_geo_img_out.png')
Here, sem_out_img stores the semantic segmentation result and geo_out_img stores the geometric (scene) labeling result.
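Because the images saved via Image.fromarray(...).convert('RGB') contain raw class indices (small values), they appear nearly black. If you want a more readable visualization, the following optional sketch (my addition, using an arbitrary fixed random palette) colorizes a label map with PIL:

import numpy as np
from PIL import Image

def colorize_labels(label_map, num_classes=34, seed=0):
    """Map integer class indices to a fixed random RGB palette (index 0 stays black)."""
    rng = np.random.RandomState(seed)
    palette = rng.randint(0, 256, size=(num_classes, 3), dtype=np.uint8)
    palette[0] = 0  # keep the void class black
    return Image.fromarray(palette[label_map])

# Example: colorize the semantic prediction computed in infer.py.
# colorize_labels(sem_out).save('coast_bea14_sem_color.png')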
Step 4: Test
python infer.py
The images in SIFT Flow are all 256 × 256 × 3 color images.
Inside the extracted dataset, the images folder holds the image data, semanticlabels holds the semantic segmentation labels (33 classes, plus an extra void class in the annotations), and geolabels holds the scene/geometric labels (3 classes, plus an extra void class in the annotations).
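To peek at a single label file, something like the sketch below should work. It assumes the labels are stored as .mat files whose array sits under the 'S' key, which is how the FCN repo's siftflow_layers.py reads them; treat the exact subdirectory layout as an assumption and adjust the path to match your extracted archive:

import numpy as np
import scipy.io

# Assumed path layout; point this at one of the .mat files in your extracted label folder.
mat_path = 'data/sift-flow/SemanticLabels/spatial_envelope_256x256_static_8outdoorcategories/coast_bea14.mat'

label = scipy.io.loadmat(mat_path)['S']   # 'S' is the key siftflow_layers.py uses
print(label.shape)                        # expected: (256, 256)
print(np.unique(label))                   # class indices present in this image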
So, in effect, two classifiers are trained that share the same front layers: in this implementation they live in a single network with two output heads, score_sem and score_geo (see solve.py above).
Here coast_bea14_sem_out.png is the semantic segmentation result and coast_bea14_geo_out.png is the geometric (scene) labeling result.
[Figures: original image, semantic segmentation result, geometric (scene) labeling result]