Caffe is probably one of the most widely used deep learning frameworks today, especially in computer vision. Most people who use Caffe work with classification networks, but sometimes a vision application calls for regression instead. A look at the official Caffe site turns up no ready-made example for this, so this post walks through a small example of doing regression with Caffe and convolutional neural networks (CNN: Convolutional Neural Networks).
The classic CNN structure is a few convolutional layers followed by fully connected (FC: Fully Connected) layers, ending with a Softmax layer that outputs predicted class probabilities. If you view the image matrix as a vector, then whether through convolution or FC layers, a CNN just keeps transforming one vector into another (in fact, for a single filter/feature channel, Caffe's basic convolution is implemented as a matrix-vector multiplication: Convolution in Caffe: a memo), and the final output is a probability vector whose dimension is the number of specified classes. Since a neural network is essentially a black-box learner, the natural idea is to take the values of the final output vector directly for regression, so that the objective being optimized is no longer cross entropy or the like, but an error defined directly on real values.
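As a toy illustration of that last point (a numpy sketch for intuition, not Caffe's actual code), a single-channel convolution can be written as a matrix-vector product by unrolling the input patches, im2col style:

import numpy as np

# 5x5 input, one 3x3 filter, stride 1, no padding -> 3x3 output
img = np.random.rand(5, 5)
k = np.random.rand(3, 3)

# unroll every 3x3 patch into a row -> a (9, 9) patch matrix
patches = np.array([img[i:i+3, j:j+3].ravel()
                    for i in range(3) for j in range(3)])
out = patches.dot(k.ravel()).reshape(3, 3)

# reference: direct sliding-window cross-correlation
ref = np.array([[(img[i:i+3, j:j+3] * k).sum() for j in range(3)]
                for i in range(3)])
print(np.allclose(out, ref))  # True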
Caffe's built-in EuclideanLossLayer is one way to handle the real-valued regression described above. EuclideanLossLayer computes the following error:
\begin{align}\notag \frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2\end{align}
So it is straightforward: feed the labeled values and the network's outputs into EuclideanLossLayer and let it measure the difference.
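In plain numpy, what this layer computes looks like the following sketch (for intuition only; the numbers are made up):

import numpy as np

def euclidean_loss(pred, target):
    # 1/(2N) times the sum of squared L2 distances, as in the formula above
    n = pred.shape[0]
    return np.sum((pred - target) ** 2) / (2.0 * n)

pred = np.array([0.30, 0.82, 0.57, 0.44])    # network outputs for a batch
target = np.array([0.30, 0.87, 0.57, 0.45])  # labeled scores
print(euclidean_loss(pred, target))  # 0.000325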
Let's use a simple example, scoring how disordered an image is, to show how to do regression with Caffe and EuclideanLossLayer.
The images here are generated by simulating the Ising model, a classic of statistical physics and probably one of its most studied models. This post is not about physics, so the details are skipped; suffice it to say that simulating this model produces images like the following:
In each image the first field is an index, and the second field is a score that can roughly be read as how ordered the image is, in the range 0~1. The goal of this example is to train a CNN to learn and predict that degree of order.
The Python script for generating the images is based on Monte Carlo Simulation of the Ising Model using Python, a Metropolis-algorithm simulation of the Ising model, modified to run in parallel and to emit images at random: each run picks a random stopping time (between 1e3 and 1e7 steps) and writes the lattice out as an image. The code is as follows:
import os
import sys
import datetime
from multiprocessing import Process

import numpy as np
from matplotlib import pyplot

LATTICE_SIZE = 100
SAMPLE_SIZE = 12000
STEP_ORDER_RANGE = [3, 7]
SAMPLE_FOLDER = 'samples'

#----------------------------------------------------------------------#
#   Check periodic boundary conditions
#----------------------------------------------------------------------#
def bc(i):
    if i+1 > LATTICE_SIZE-1:
        return 0
    if i-1 < 0:
        return LATTICE_SIZE - 1
    else:
        return i

#----------------------------------------------------------------------#
#   Calculate internal energy
#----------------------------------------------------------------------#
def energy(system, N, M):
    return -1 * system[N,M] * (system[bc(N-1), M] \
                               + system[bc(N+1), M] \
                               + system[N, bc(M-1)] \
                               + system[N, bc(M+1)])

#----------------------------------------------------------------------#
#   Build the system
#----------------------------------------------------------------------#
def build_system():
    system = np.random.random_integers(0, 1, (LATTICE_SIZE, LATTICE_SIZE))
    system[system==0] = -1
    return system

#----------------------------------------------------------------------#
#   The main Monte Carlo loop
#----------------------------------------------------------------------#
def main(T, index):
    score = np.random.random()
    order = score*(STEP_ORDER_RANGE[1]-STEP_ORDER_RANGE[0]) + STEP_ORDER_RANGE[0]
    stop = np.int(np.round(np.power(10.0, order)))
    print('Running sample: {}, stop @ {}'.format(index, stop))
    sys.stdout.flush()

    system = build_system()

    for step in range(stop):
        M = np.random.randint(0, LATTICE_SIZE)
        N = np.random.randint(0, LATTICE_SIZE)

        E = -2. * energy(system, N, M)

        if E <= 0.:
            system[N,M] *= -1
        elif np.exp(-1./T*E) > np.random.rand():
            system[N,M] *= -1

        #if step % 100000 == 0:
        #    print('.'),
        #    sys.stdout.flush()

    filename = '{}/'.format(SAMPLE_FOLDER) + '{:0>5d}'.format(index) + '_{}.jpg'.format(score)
    pyplot.imsave(filename, system, cmap='gray')
    print('Saved to {}!\n'.format(filename))
    sys.stdout.flush()

#----------------------------------------------------------------------#
#   Run the menu for the monte carlo simulation
#----------------------------------------------------------------------#
def run_main(index, length):
    np.random.seed(datetime.datetime.now().microsecond)
    for i in xrange(index, index+length):
        main(0.1, i)

def run():
    cmd = 'mkdir -p {}'.format(SAMPLE_FOLDER)
    os.system(cmd)

    n_processes = 8
    length = int(SAMPLE_SIZE/n_processes)
    processes = [Process(target=run_main, args=(x, length)) for x in np.arange(n_processes)*length]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

if __name__ == '__main__':
    run()
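For reference, the flip rule inside the loop is the standard Metropolis criterion: a proposed spin flip with energy change $\Delta E$ (the variable E in the code) is always accepted when it lowers the energy, and otherwise accepted with probability

\begin{align}\notag p_{\text{accept}} = \exp\left(-\Delta E / T\right)\end{align}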
In this example 12000 grayscale images of 100x100 are generated in total, named as [index]_[order score].jpg. As for why the order score is a random number between 0 and 1 rather than the number of simulation steps: although in theory a three-layer neural network can approximate any function, in practice it still pays to preprocess the data, especially when the objective takes the form of an L2 norm; keeping the data uniformly distributed improves the model's convergence and reliability. The 0-to-1 range also matches the Sigmoid output of the last layer and makes it easy to estimate the model's error. One more thing to note: since the images themselves come from Monte Carlo simulation, two images with the same order score can still differ visibly in how ordered they look, both subjectively and objectively.
The 12000 images from the Ising simulation are split into three parts: 10k for training, 1k for validation, and the remaining 1k for testing. The following Python code generates the training and validation lists:
import os

filename2score = lambda x: x[:x.rfind('.')].split('_')[-1]

img_files = sorted(os.listdir('samples'))

with open('train.txt', 'w') as train_txt:
    for f in img_files[:10000]:
        score = filename2score(f)
        line = 'samples/{} {}\n'.format(f, score)
        train_txt.write(line)

with open('val.txt', 'w') as val_txt:
    for f in img_files[10000:11000]:
        score = filename2score(f)
        line = 'samples/{} {}\n'.format(f, score)
        val_txt.write(line)

with open('test.txt', 'w') as test_txt:
    for f in img_files[11000:]:
        line = 'samples/{}\n'.format(f)
        test_txt.write(line)
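train.txt and val.txt then contain lines in the form samples/[index]_[score].jpg [score], for example (values illustrative, not from the actual run):

samples/00000_0.73049106596.jpg 0.73049106596
samples/00001_0.0583811231.jpg 0.0583811231

test.txt lists only the image paths, without scores.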
lmdb is fast and space-efficient, but Caffe's default tool for building lmdb (convert_imageset) does not support floating-point data; the Datum definition in caffe.proto seems to allow it, but the required code changes are non-trivial. HDF5, by comparison, is slower and takes more space, but it is simple and easy to use; for anything short of massive data it is a fine choice. So HDF5 is used here to store the regression training and validation data. Below is a script that generates the HDF5 files and the file lists Caffe reads:
import sys
import numpy
from matplotlib import pyplot
import h5py

IMAGE_SIZE = (100, 100)
MEAN_VALUE = 128

filename = sys.argv[1]
setname, ext = filename.split('.')

with open(filename, 'r') as f:
    lines = f.readlines()

numpy.random.shuffle(lines)

sample_size = len(lines)
imgs = numpy.zeros((sample_size, 1,) + IMAGE_SIZE, dtype=numpy.float32)
scores = numpy.zeros(sample_size, dtype=numpy.float32)

h5_filename = '{}.h5'.format(setname)
with h5py.File(h5_filename, 'w') as h:
    for i, line in enumerate(lines):
        image_name, score = line[:-1].split()
        img = pyplot.imread(image_name)[:, :, 0].astype(numpy.float32)
        img = img.reshape((1, )+img.shape)
        img -= MEAN_VALUE
        imgs[i] = img
        scores[i] = float(score)
        if (i+1) % 1000 == 0:
            print('processed {} images!'.format(i+1))
    h.create_dataset('data', data=imgs)
    h.create_dataset('score', data=scores)

with open('{}_h5.txt'.format(setname), 'w') as f:
    f.write(h5_filename)
Note that Caffe's HDF5 DataLayer does not support transforms, so the mean is subtracted before the data is stored. Save the script as gen_hdf.py and run the following commands to generate the training and validation sets:
python gen_hdf.py train.txt
python gen_hdf.py val.txt
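Optionally, the generated file can be sanity-checked with h5py; the shapes below follow from gen_hdf.py and the 10000-line train.txt:

import h5py

# quick check of the layout Caffe's HDF5Data layer will read
with h5py.File('train.h5', 'r') as h:
    print(h['data'].shape)   # (10000, 1, 100, 100)
    print(h['score'].shape)  # (10000,)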
A simple small network is used to train this regression model:
The network structure, train_val.prototxt, is as follows:
name: "RegressionExample" layer { name: "data" type: "HDF5Data" top: "data" top: "score" include { phase: TRAIN } hdf5_data_param { source: "train_h5.txt" batch_size: 64 } } layer { name: "data" type: "HDF5Data" top: "data" top: "score" include { phase: TEST } hdf5_data_param { source: "val_h5.txt" batch_size: 64 } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 5 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } convolution_param { num_output: 96 pad: 2 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" } layer { name: "pool3" type: "Pooling" bottom: "conv3" top: "pool3" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fc4" type: "InnerProduct" bottom: "pool3" top: "fc4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } inner_product_param { num_output: 192 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu4" type: "ReLU" bottom: "fc4" top: "fc4" } layer { name: "drop4" type: "Dropout" bottom: "fc4" top: "fc4" dropout_param { dropout_ratio: 0.35 } } layer { name: "fc5" type: "InnerProduct" bottom: "fc4" top: "fc5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 1 decay_mult: 0 } inner_product_param { num_output: 1 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0 } } } layer { name: "sigmoid5" type: "Sigmoid" bottom: "fc5" top: "pred" } layer { name: "loss" type: "EuclideanLoss" bottom: "pred" bottom: "score" top: "loss" }
The regression part is implemented by the EuclideanLossLayer, which compares the last layer's output with the scores from train.txt/val.txt and uses the difference as the objective function. It's worth mentioning that for real-valued regression with a squared-error objective like this, SGD's performance and stability are generally not great, a point Caffe's documentation also mentions. For this particular case in Caffe, though, if it works, it works. The solver.prototxt is as follows:
net: "./train_val.prototxt" test_iter: 2000 test_interval: 500 base_lr: 0.01 lr_policy: "step" gamma: 0.1 stepsize: 50000 display: 50 max_iter: 10000 momentum: 0.85 weight_decay: 0.0005 snapshot: 1000 snapshot_prefix: "./example_ising" solver_mode: GPU type: "Nesterov"
Then train:
/path/to/caffe/build/tools/caffe train -solver solver.prototxt
Training was run for 10000 iterations without much tuning; in any case, it converged.
For deployment, just replace the two data layers in train_val.prototxt with an input_shape declaration and remove the final EuclideanLoss layer. The input_shape is defined as follows:
input: "data" input_shape { dim: 1 dim: 1 dim: 100 dim: 100 }
Save the edited file as deploy.prototxt, then take the trained model and test it on the test set. pycaffe provides a very convenient interface; the following script prints the prediction for every file in a file list:
import sys
import numpy
sys.path.append('/opt/caffe/python')
import caffe

WEIGHTS_FILE = 'example_ising_iter_10000.caffemodel'
DEPLOY_FILE = 'deploy.prototxt'
IMAGE_SIZE = (100, 100)
MEAN_VALUE = 128

caffe.set_mode_cpu()
net = caffe.Net(DEPLOY_FILE, WEIGHTS_FILE, caffe.TEST)
net.blobs['data'].reshape(1, 1, *IMAGE_SIZE)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1))
transformer.set_mean('data', numpy.array([MEAN_VALUE]))
transformer.set_raw_scale('data', 255)

image_list = sys.argv[1]

with open(image_list, 'r') as f:
    for line in f.readlines():
        filename = line[:-1]
        image = caffe.io.load_image(filename, False)
        transformed_image = transformer.preprocess('data', image)
        net.blobs['data'].data[...] = transformed_image
        output = net.forward()
        score = output['pred'][0][0]
        print('The predicted score for {} is {}'.format(filename, score))
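Saving the script as, say, predict.py (the filename is assumed here, not given above), it can be run against the test list:

python predict.py test.txt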
After running it on test.txt, the results for the first 21 files are:
The predicted score for samples/11000_0.30434289374.jpg is 0.296356916428
The predicted score for samples/11001_0.865486910668.jpg is 0.823452055454
The predicted score for samples/11002_0.566940975024.jpg is 0.566108822823
The predicted score for samples/11003_0.447787648857.jpg is 0.443993896246
The predicted score for samples/11004_0.688095649282.jpg is 0.714970111847
The predicted score for samples/11005_0.0834013155212.jpg is 0.0675165131688
The predicted score for samples/11006_0.421206628337.jpg is 0.419887691736
The predicted score for samples/11007_0.579389741639.jpg is 0.58779758215
The predicted score for samples/11008_0.428772434501.jpg is 0.422569811344
The predicted score for samples/11009_0.188864264594.jpg is 0.18296033144
The predicted score for samples/11010_0.328103100948.jpg is 0.325099766254
The predicted score for samples/11011_0.131306426901.jpg is 0.119059860706
The predicted score for samples/11012_0.627027363247.jpg is 0.622474730015
The predicted score for samples/11013_0.0857273267817.jpg is 0.0735778361559
The predicted score for samples/11014_0.870007364446.jpg is 0.883266746998
The predicted score for samples/11015_0.0515036691772.jpg is 0.0575885437429
The predicted score for samples/11016_0.799989222638.jpg is 0.750781834126
The predicted score for samples/11017_0.22049410733.jpg is 0.208014890552
The predicted score for samples/11018_0.882973794598.jpg is 0.891137182713
The predicted score for samples/11019_0.686353385772.jpg is 0.671325206757
The predicted score for samples/11020_0.385639405472.jpg is 0.385150641203
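Since each test filename carries its ground-truth score, a quick way to quantify the error is to parse the printed lines above. A small helper sketch (hypothetical, assuming exactly the output format shown):

import re
import sys

# reads lines like "The predicted score for samples/11000_0.304....jpg is 0.296..."
# from stdin and prints the mean absolute error
errors = []
for line in sys.stdin:
    m = re.match(r'The predicted score for .*_([0-9.]+)\.jpg is ([0-9.]+)', line)
    if m:
        errors.append(abs(float(m.group(1)) - float(m.group(2))))
print('MAE over {} samples: {}'.format(len(errors), sum(errors) / len(errors)))

For example: python predict.py test.txt | python mae.py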
The results look decent. Let's pick a few images and take a look:
Next, let's dump the first layer's convolution kernels and take a look:
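There is no single built-in call for this; below is a minimal pycaffe/matplotlib sketch, reusing the net object from the prediction script above (per the prototxt, the conv1 weights have shape (96, 1, 5, 5)):

import numpy as np
from matplotlib import pyplot

filters = net.params['conv1'][0].data  # (96, 1, 5, 5)
rows, cols, k, gap = 8, 12, 5, 1
grid = np.ones((rows * (k + gap), cols * (k + gap)))
for i in range(filters.shape[0]):
    r, c = divmod(i, cols)
    f = filters[i, 0]
    f = (f - f.min()) / (f.max() - f.min() + 1e-8)  # normalize to [0, 1] for display
    grid[r*(k+gap):r*(k+gap)+k, c*(k+gap):c*(k+gap)+k] = f
pyplot.imsave('conv1_kernels.png', grid, cmap='gray')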
You can see that the first layer's kernels have successfully learned both high-frequency and low-frequency components, which is the key to judging the degree of order in this example: high-frequency images look disordered, while low-frequency ones look relatively ordered. Although the Ising spin images are binary, the trained model can also be tried on arbitrary other images:
Hmm... qualitatively, it is about right.