siftflow-fcn32s: Training and Inference

1、Overview

SIFT Flow is an annotated semantic segmentation dataset with two kinds of labels: semantic classes (33 classes) and geometric/scene labels (3 classes).

Semantic and geometric segmentation classes for scenes.

Semantic: 0 is void and 1-33 are the classes.

01 awning
02 balcony
03 bird
04 boat
05 bridge
06 building
07 bus
08 car
09 cow
10 crosswalk
11 desert
12 door
13 fence
14 field
15 grass
16 moon
17 mountain
18 person
19 plant
20 pole
21 river
22 road
23 rock
24 sand
25 sea
26 sidewalk
27 sign
28 sky
29 staircase
30 streetlight
31 sun
32 tree
33 window

Geometric: -1 is void and 1-3 are the classes.

01 sky
02 horizontal
03 vertical

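For convenience, the class list above can be captured as a small lookup table. This is a sketch for illustration, not part of the original repository code:

```python
# Semantic class names for SIFT Flow; index 0 is the void label.
SIFTFLOW_SEMANTIC_CLASSES = [
    "void", "awning", "balcony", "bird", "boat", "bridge", "building",
    "bus", "car", "cow", "crosswalk", "desert", "door", "fence", "field",
    "grass", "moon", "mountain", "person", "plant", "pole", "river",
    "road", "rock", "sand", "sea", "sidewalk", "sign", "sky", "staircase",
    "streetlight", "sun", "tree", "window",
]

def semantic_name(index):
    """Map a predicted semantic label index (0-33) to its class name."""
    return SIFTFLOW_SEMANTIC_CLASSES[index]
```

This makes it easy to print human-readable names for the argmax indices produced during inference.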

2、Training the Model

一、Download the source code

git clone git@github.com:shelhamer/fcn.berkeleyvision.org.git

二、Prepare the data

Download the annotated SiftFlowDataset.zip dataset from http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/SiftFlowDataset.zip

Unzip the archive into the data/sift-flow folder.
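The unzip step can also be scripted. A minimal sketch, assuming the archive name and destination path described above:

```python
import os
import zipfile

def extract_siftflow(zip_path="SiftFlowDataset.zip", dest="data/sift-flow"):
    """Extract the SIFT Flow dataset archive into the expected folder."""
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Return the top-level entries so the layout can be eyeballed.
    return sorted(os.listdir(dest))
```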

三、Modify the code

git clone git@github.com:litingpan/fcn.git

or download from https://github.com/litingpan/fcn and replace the entire siftflow-fcn32s folder.

The modified solve.py is as follows:

import caffe
import surgery, score

import numpy as np
import os
import sys

try:
    import setproctitle
    setproctitle.setproctitle(os.path.basename(os.getcwd()))
except ImportError:
    pass

# weights = '../ilsvrc-nets/vgg16-fcn.caffemodel'
vgg_weights = '../ilsvrc-nets/VGG_ILSVRC_16_layers.caffemodel'
vgg_proto = '../ilsvrc-nets/VGG_ILSVRC_16_layers_deploy.prototxt'

# init
# caffe.set_device(int(sys.argv[1]))
caffe.set_device(0)
caffe.set_mode_gpu()

# solver = caffe.SGDSolver('solver.prototxt')
# solver.net.copy_from(weights)
solver = caffe.SGDSolver('solver.prototxt')
vgg_net = caffe.Net(vgg_proto, vgg_weights, caffe.TRAIN)
surgery.transplant(solver.net, vgg_net)
del vgg_net

# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

# scoring
test = np.loadtxt('../data/sift-flow/test.txt', dtype=str)

for _ in range(50):
    solver.step(2000)
    # N.B. metrics on the semantic labels are off b.c. of missing classes;
    # score manually from the histogram instead for proper evaluation
    score.seg_tests(solver, False, test, layer='score_sem', gt='sem')
    score.seg_tests(solver, False, test, layer='score_geo', gt='geo')
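In the script above, surgery.interp initializes the deconvolution ("up") layers with fixed bilinear upsampling weights. The kernel it builds can be sketched as follows; this is my restatement of the standard FCN bilinear filler, assumed to match the repository's surgery.py:

```python
import numpy as np

def bilinear_upsample_kernel(size):
    """2-D bilinear interpolation kernel of shape (size, size),
    the standard initializer for FCN deconvolution filters."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))
```

Because the weights are a fixed interpolation kernel rather than random values, the upsampling layers produce sensible output from the first iteration.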

四、Download the pretrained model

ILSVRC-2014 model (VGG team) with 16 weight layers: https://gist.github.com/ksimonyan/211839e770f7b538e2d8/revisions

Download both VGG_ILSVRC_16_layers.caffemodel and VGG_ILSVRC_16_layers_deploy.prototxt and place them in the ilsvrc-nets directory.

五、Train

python solve.py

After training completes, snapshot/train_iter_100000.caffemodel is the trained model.


3、Inference

一、Prepare the model

You can use the model trained above; if you would rather not train it yourself, download a ready-made model from http://dl.caffe.berkeleyvision.org/siftflow-fcn32s-heavy.caffemodel

二、deploy.prototxt

It is adapted from test.prototxt, with three main changes:

(1) The input layer

layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    # These dimensions are purely for sake of example;
    # see infer.py for how to reshape the net to the given input size.
    shape { dim: 1 dim: 3 dim: 256 dim: 256 }
  }
}

Note that the Input shape must match the dimensions of the image being tested.

(2) The dropout layers are removed.

(3) The loss layers and the layers that feed only into them are removed.

三、infer.py

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt 
import sys   
import caffe

# the demo image is "coast_bea14.jpg" from the SIFT Flow dataset

# load image, switch to BGR, subtract mean, and make dims C x H x W for Caffe
im = Image.open('coast_bea14.jpg')
in_ = np.array(im, dtype=np.float32)
in_ = in_[:,:,::-1]
in_ -= np.array((104.00698793,116.66876762,122.67891434))
in_ = in_.transpose((2,0,1))

# load net
net = caffe.Net('deploy.prototxt', 'snapshot/train_iter_100000.caffemodel', caffe.TEST)
# shape for input (data blob is N x C x H x W), set data
net.blobs['data'].reshape(1, *in_.shape)
net.blobs['data'].data[...] = in_
# run net and take argmax for prediction
net.forward()
sem_out = net.blobs['score_sem'].data[0].argmax(axis=0)
   
# plt.imshow(out,cmap='gray');
plt.imshow(sem_out)
plt.axis('off')
plt.savefig('coast_bea14_sem_out.png')
sem_out_img = Image.fromarray(sem_out.astype('uint8')).convert('RGB')
sem_out_img.save('coast_bea14_sem_img_out.png')

geo_out = net.blobs['score_geo'].data[0].argmax(axis=0)
plt.imshow(geo_out)
plt.axis('off')
plt.savefig('coast_bea14_geo_out.png')
geo_out_img = Image.fromarray(geo_out.astype('uint8')).convert('RGB')
geo_out_img.save('coast_bea14_geo_img_out.png')

Here sem_out_img stores the semantic segmentation result and geo_out_img stores the geometric (scene) labeling result.
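Note that the images saved via Image.fromarray are raw index maps, so they look almost black when opened directly. A small helper (hypothetical, not part of the original script) can map label indices to distinct colors for viewing:

```python
import numpy as np

def colorize(label_map, num_classes=34, seed=0):
    """Map each label index in an H x W array to a fixed RGB color."""
    rng = np.random.RandomState(seed)  # fixed seed -> stable palette
    palette = rng.randint(0, 256, size=(num_classes, 3), dtype=np.uint8)
    return palette[label_map]          # H x W x 3 uint8 image
```

Passing the result to Image.fromarray then yields a directly viewable color segmentation map.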

四、Test

python infer.py

All images in SIFT Flow are 256×256×3 color images.

The images folder holds the data; SemanticLabels holds the semantic segmentation labels, 33 classes in total (the annotations include one extra void class); GeoLabels holds the geometric labels, 3 classes in total (again with one extra void class).
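The label files ship as MATLAB .mat files and can be read with scipy, as the repository's siftflow_layers.py does. A sketch, where the variable key "S" follows that loader and should be treated as an assumption:

```python
import scipy.io

def load_label(mat_path, key="S"):
    """Load a SIFT Flow label map (H x W integer array) from a .mat file."""
    return scipy.io.loadmat(mat_path)[key].astype("int64")
```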

So the model effectively trains two prediction heads at once: they share the same trunk (the first seven layer blocks) and differ only in their scoring layers.

Here coast_bea14_sem_out.png is the semantic segmentation result and coast_bea14_geo_out.png is the geometric labeling result.

[Figures: original image (coast_bea14), semantic segmentation result, geometric labeling result]




