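While studying CNNs earlier, I came across Yoon Kim's (2014) paper on using a CNN for text classification. The network is structurally simple and performs well, but the paper gives no concrete training times, which is worth a closer look.
Yoon Kim's code: https://github.com/yoonkim/CNN_sentence
I worked through the source code provided by the author. When training on my own machine, the average training time for one CV fold was as follows (vertical axis: min/CV, for reference only):
Machine configuration: Intel(R) Core(TM) i3-4150 CPU @ 3.50GHz, 32G, x64
Clearly, training is very slow. On the CPU, 10 CV folds take more than 10 hours. A friend emailed Yoon Kim to check, and he confirmed that it really is that slow, which may explain why the paper reports no training times.
One way to improve this is to parallelize with multiple threads, since the convolution layer can be parallelized, but that code is not easy to write. So I decided to use GPU acceleration instead.
Procedure: 1. install the NVIDIA driver; 2. install and configure CUDA; 3. modify the program to run on the GPU.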
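1. Install the NVIDIA driver
(0) Check whether you have a suitable graphics card: lspci | grep -i nvidia (see the reference tutorial)
(1) Download the NVIDIA driver for your card: http://www.nvidia.com/Download/index.aspx?lang=en-us
My GPU is a GeForce GTX 660 Ti, and the matching driver I downloaded is NVIDIA-Linux-x86_64-352.63.run
(2) Make the installer executable: sudo chmod +x NVIDIA-Linux-x86_64-352.63.run
(3) Stop the X server: sudo service lightdm stop, then switch to tty1 with Ctrl+Alt+F1
(4) Install the driver: sudo ./NVIDIA-Linux-x86_64-352.63.run. Follow the prompts; you may need to set compat32-libdir.
(5) Restart the X server: sudo service lightdm start
(6) Verify that the driver was installed successfully: cat /proc/driver/nvidia/version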
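The check in step (6) can also be scripted. The following is a minimal sketch, assuming the first line of /proc/driver/nvidia/version contains the words "Kernel Module" followed by the version number. The function names and the sample string are hypothetical and were not captured from the machine described here.

```python
import re


def parse_nvidia_driver_version(text):
    """Return the driver version string (e.g. '352.63') found in text, or None."""
    # Assumption: the version number follows the words "Kernel Module".
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def read_driver_version(path="/proc/driver/nvidia/version"):
    """Read the proc file; return None when it cannot be opened."""
    try:
        with open(path) as handle:
            return parse_nvidia_driver_version(handle.read())
    except IOError:  # the file is absent when the kernel module is not loaded
        return None


if __name__ == "__main__":
    # Illustrative sample text only.
    sample = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.63  (build date omitted)\n"
    print(parse_nvidia_driver_version(sample))
    print(read_driver_version())
```

If read_driver_version() returns None, the driver module is probably not loaded, and the installation steps above are worth repeating.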
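2. Install and configure CUDA
(1) Installation guide: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#ubuntu-installation
(2) Download the CUDA toolkit: https://developer.nvidia.com/cuda-downloads. Pick the package that matches your system; I used cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
(3) The installation commands differ between systems. The commands below are for Ubuntu 14.04; the guide above should resolve any problems.
    sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
    sudo apt-get update
    sudo apt-get install cuda
(4) Verify that the toolkit was installed successfully: nvcc -V
(5) Configure the paths: vim .bashrc, then add the lines below. The directory must match the installed version; the 7.5 package above installs to /usr/local/cuda-7.5.
    PATH=$PATH:/usr/local/cuda-7.5/bin
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-7.5/lib64
    export PATH
    export LD_LIBRARY_PATH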
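A wrong entry in these two variables is easy to miss, so a quick check in a new shell can help. The sketch below is one possible way to do it using only the standard library. The helper names find_on_path and cuda_dirs_in are hypothetical, and the check only looks for directory names containing "cuda"; it does not verify that the libraries actually load.

```python
import os


def find_on_path(program, path_value):
    """Return the first directory in path_value that contains program, else None."""
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, program)):
            return directory
    return None


def cuda_dirs_in(path_value):
    """List the entries of a PATH-style string whose name mentions 'cuda'."""
    return [d for d in path_value.split(os.pathsep) if "cuda" in d.lower()]


if __name__ == "__main__":
    print("nvcc found in:", find_on_path("nvcc", os.environ.get("PATH", "")))
    print("CUDA library dirs:", cuda_dirs_in(os.environ.get("LD_LIBRARY_PATH", "")))
```

If nvcc is reported as None or the library list is empty, the .bashrc edit has probably not taken effect in the current shell.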
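3. Modify the program to run on the GPU
This follows the official Theano documentation: http://deeplearning.net/software/theano/tutorial/using_gpu.html
First, use the code below to test whether CUDA is configured correctly and whether the GPU can be used.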
    from theano import function, config, shared, sandbox
    import theano.tensor as T
    import numpy
    import time

    vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
    iters = 1000

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    print(f.maker.fgraph.toposort())
    t0 = time.time()
    for i in xrange(iters):
        r = f()
    t1 = time.time()
    print("Looping %d times took %f seconds" % (iters, t1 - t0))
    print("Result is %s" % (r,))
    if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')
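Save the code above as check1.py and test it with the commands below. The output shows whether the GPU can be used; if an error occurs, the path configuration above is a likely cause.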
    $ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
    [Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
    Looping 1000 times took 3.06635117531 seconds
    Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761 1.62323284]
    Used the cpu

    $ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
    Using gpu device 0: GeForce GTX 580
    [GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
    Looping 1000 times took 0.638810873032 seconds
    Result is [ 1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]
    Used the gpu
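The two timing lines in this output can be compared directly. As a small sketch, the snippet below extracts the elapsed seconds from each "Looping ... took ... seconds" line and computes the ratio. The helper name loop_seconds is hypothetical, and the two strings are copied from the output shown above.

```python
import re


def loop_seconds(output):
    """Extract the elapsed seconds from a 'Looping N times took S seconds' line."""
    match = re.search(r"Looping \d+ times took ([0-9.]+) seconds", output)
    if match is None:
        raise ValueError("no timing line found")
    return float(match.group(1))


cpu_output = "Looping 1000 times took 3.06635117531 seconds"
gpu_output = "Looping 1000 times took 0.638810873032 seconds"

speedup = loop_seconds(cpu_output) / loop_seconds(gpu_output)
print("GPU speedup on the check script: %.2fx" % speedup)
```

For these numbers the ratio is about 4.8x. This is the speedup of the small elementwise test only; it says nothing about the speedup of the full CNN training run.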
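Because Nvidia GPUs are currently optimized mainly for 32-bit floating-point computation, the data and variables in the code need to be set to float32.
The specific code changes are as follows:
(1) process_data.py
    line 55: W = np.zeros(shape=(vocab_size+1, k), dtype='float32')
    line 56: W[0] = np.zeros(k, dtype='float32')
After this change, run the command below to produce the word vector (float32) for each word.
    python process_data.py GoogleNews-vectors-negative300.bin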
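To see what the dtype change does, here is a minimal NumPy sketch modeled on the two modified lines. The function build_embedding_matrix, the toy vocabulary, and the vector size of 4 are hypothetical stand-ins and are not the repository's own code. Row 0 is kept as the all-zero padding vector, as in the lines above.

```python
import numpy as np


def build_embedding_matrix(word_vecs, k):
    """Stack word vectors into a float32 matrix; row 0 is the zero padding vector."""
    W = np.zeros(shape=(len(word_vecs) + 1, k), dtype="float32")
    word_idx_map = {}
    for i, word in enumerate(sorted(word_vecs), start=1):
        W[i] = word_vecs[word]  # float64 input is cast to float32 on assignment
        word_idx_map[word] = i
    return W, word_idx_map


# Toy vectors; np.random.rand returns float64 by default.
toy_vecs = {"good": np.random.rand(4), "bad": np.random.rand(4)}
W, word_idx_map = build_embedding_matrix(toy_vecs, k=4)
print(W.dtype, W.shape, W.nbytes)
```

Because the target array is allocated as float32, assigning float64 vectors into it casts them on the way in, and the matrix takes half the memory of a float64 one.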
(2) conv_net_sentence.py
Add allow_input_downcast=True to the theano.function calls, so that float64 values passed into the compiled functions are downcast to float32.
    line 82:
        set_zero = theano.function([zero_vec_tensor],
            updates=[(Words, T.set_subtensor(Words[0,:], zero_vec_tensor))],
            allow_input_downcast=True)
    line 131:
        val_model = theano.function([index], classifier.errors(y),
            givens={
                x: val_set_x[index * batch_size: (index + 1) * batch_size],
                y: val_set_y[index * batch_size: (index + 1) * batch_size]},
            allow_input_downcast=True)
    line 137:
        test_model = theano.function([index], classifier.errors(y),
            givens={
                x: train_set_x[index * batch_size: (index + 1) * batch_size],
                y: train_set_y[index * batch_size: (index + 1) * batch_size]},
            allow_input_downcast=True)
    line 141:
        train_model = theano.function([index], cost, updates=grad_updates,
            givens={
                x: train_set_x[index*batch_size:(index+1)*batch_size],
                y: train_set_y[index*batch_size:(index+1)*batch_size]},
            allow_input_downcast=True)
    line 155:
        test_model_all = theano.function([x,y], test_error, allow_input_downcast=True)
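The slice expressions in the givens dictionaries select one minibatch per value of index. As a plain-Python illustration of that indexing, the sketch below uses an ordinary list as a hypothetical stand-in for train_set_x; no Theano is involved.

```python
def minibatch(data, index, batch_size):
    """Return the index-th minibatch: data[index*batch_size:(index+1)*batch_size]."""
    return data[index * batch_size:(index + 1) * batch_size]


samples = list(range(10))  # hypothetical stand-in for train_set_x
batch_size = 4
n_batches = len(samples) // batch_size  # only full batches are counted here

batches = [minibatch(samples, i, batch_size) for i in range(n_batches)]
print(batches)
```

With 10 samples and a batch size of 4, this yields two full batches and leaves the last two samples out, which is why the multiplication signs in the slice bounds matter.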
(3) Run the program
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -static -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -rand
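(4) The result is striking: training is about 20x faster than on the CPU.
This was my first time running on a GPU. If I have overlooked anything in the steps above, corrections are welcome.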
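The three commands in step (3) differ only in their last two arguments, so they can be generated from one place. The sketch below builds the THEANO_FLAGS string and the argument lists without launching anything. The helper names theano_flags and run_configs are hypothetical, and the flag values are the ones used in the commands above.

```python
import os


def theano_flags(device="gpu0", floatX="float32", mode="FAST_RUN", warn_float64="raise"):
    """Join the settings into the comma-separated string used in THEANO_FLAGS."""
    return "mode=%s,device=%s,floatX=%s,warn_float64=%s" % (mode, device, floatX, warn_float64)


def run_configs(script="conv_net_sentence.py"):
    """Return (environment, argv) pairs for the three runs of step (3)."""
    env = dict(os.environ, THEANO_FLAGS=theano_flags())
    variants = [("-static", "-word2vec"), ("-nonstatic", "-word2vec"), ("-nonstatic", "-rand")]
    return [(env, ["python", script] + list(variant)) for variant in variants]


if __name__ == "__main__":
    for env, argv in run_configs():
        print("THEANO_FLAGS=%s %s" % (env["THEANO_FLAGS"], " ".join(argv)))
        # To actually launch a run: subprocess.check_call(argv, env=env)
```

Passing device="cpu" to theano_flags gives the matching CPU run, which is one way to reproduce the CPU versus GPU comparison on the same machine.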
References:
1. Theano configuration: http://deeplearning.net/software/theano/library/config.html
2. Installing Theano + CUDA on Ubuntu: http://www.linuxidc.com/Linux/2014-10/107503.htm