(There is no feedback while downloading, so you cannot tell whether it is still running, has hit an error, or is simply stuck; it just sits there with no response.)
Before downloading, run as administrator with sudo su, then: python examples/finetune_flickr_style/assemble_data.py --workers=1 --images=2000 --seed 831486
Alternatively, prefix the command with sudo.
Reference: http://blog.csdn.net/lujiandong1/article/details/50495454
While following this tutorial, I ran into two main problems:
1. The data would not download.
- python examples/finetune_flickr_style/assemble_data.py --workers=1 --images=2000 --seed 831486
When running the command above, the program inexplicably stalls: nothing downloads, yet the process does not die either, as if it were deadlocked.
Looking at the source, assemble_data.py relies heavily on multithreading and multiprocessing. My solution was to modify the script so that downloads no longer go through a process pool, and to add a timeout: any download taking longer than 6 s is treated as failed and skipped.
====================================================================================================
The code in assemble_data.py that uses the multiprocessing pool looks like this:
- pool = multiprocessing.Pool(processes=num_workers)
- map_args = zip(df['image_url'], df['image_filename'])
- results = pool.map(download_image, map_args)
===================================================================================================
My modified version of the script:
- #!/usr/bin/env python3
- import os
- import urllib.request
- import hashlib
- import argparse
- import numpy as np
- import pandas as pd
- from skimage import io
- import multiprocessing
- import socket
-
-
- # SHA-1 of Flickr's "photo unavailable" placeholder image.
- MISSING_IMAGE_SHA1 = '6a92790b1c2a301c6e7ddef645dca1f53ea97ac2'
-
- example_dirname = os.path.abspath(os.path.dirname(__file__))
- caffe_dirname = os.path.abspath(os.path.join(example_dirname, '../..'))
- training_dirname = os.path.join(caffe_dirname, 'data/flickr_style')
-
-
- def mydownload_image(args_tuple):
-     "Download a single image; returns True on success, False on any failure."
-     try:
-         url, filename = args_tuple
-         if not os.path.exists(filename):
-             urllib.request.urlretrieve(url, filename)
-         # Hash in binary mode and reject Flickr's placeholder image.
-         with open(filename, 'rb') as f:
-             assert hashlib.sha1(f.read()).hexdigest() != MISSING_IMAGE_SHA1
-         test_read_image = io.imread(filename)
-         return True
-     except KeyboardInterrupt:
-         raise
-     except Exception:
-         return False
-
-
- if __name__ == '__main__':
-     parser = argparse.ArgumentParser(
-         description='Download a subset of Flickr Style to a directory')
-     parser.add_argument(
-         '-s', '--seed', type=int, default=0,
-         help="random seed")
-     parser.add_argument(
-         '-i', '--images', type=int, default=-1,
-         help="number of images to use (-1 for all [default])")
-     parser.add_argument(
-         '-w', '--workers', type=int, default=-1,
-         help="num workers used to download images. -x uses (all - x) cores [-1 default].")
-     parser.add_argument(
-         '-l', '--labels', type=int, default=0,
-         help="if set to a positive value, only sample images from the first number of labels.")
-
-     args = parser.parse_args()
-     np.random.seed(args.seed)
-
-     csv_filename = os.path.join(example_dirname, 'flickr_style.csv.gz')
-     df = pd.read_csv(csv_filename, index_col=0, compression='gzip')
-     df = df.iloc[np.random.permutation(df.shape[0])]
-     if args.labels > 0:
-         df = df.loc[df['label'] < args.labels]
-     if args.images > 0 and args.images < df.shape[0]:
-         df = df.iloc[:args.images]
-
-     images_dirname = os.path.join(training_dirname, 'images')
-     if not os.path.exists(images_dirname):
-         os.makedirs(images_dirname)
-     df['image_filename'] = [
-         os.path.join(images_dirname, _.split('/')[-1]) for _ in df['image_url']
-     ]
-
-     num_workers = args.workers
-     if num_workers <= 0:
-         num_workers = multiprocessing.cpu_count() + num_workers
-     print('Downloading {} images with {} workers...'.format(
-         df.shape[0], num_workers))
-
-     map_args = zip(df['image_url'], df['image_filename'])
-
-     # Download serially with a 6-second socket timeout instead of a pool.
-     socket.setdefaulttimeout(6)
-     results = []
-     for item in map_args:
-         value = mydownload_image(item)
-         results.append(value)
-         if not value:
-             print('False')
-         else:
-             print('1')
-
-     print(len(results))
-     df = df[results]
-     for split in ['train', 'test']:
-         split_df = df[df['_split'] == split]
-         filename = os.path.join(training_dirname, '{}.txt'.format(split))
-         split_df[['image_filename', 'label']].to_csv(
-             filename, sep=' ', header=None, index=None)
-     print('Writing train/val for {} successfully downloaded images.'.format(
-         df.shape[0]))
The main changes are as follows:
1. Use Python 3 (#!/usr/bin/env python3).
2.
- map_args = zip(df['image_url'], df['image_filename'])
-
- socket.setdefaulttimeout(6)
- results = []
- for item in map_args:
-     value = mydownload_image(item)
-     results.append(value)
-     if not value:
-         print('False')
-     else:
-         print('1')
-
- print(len(results))
Download with a single thread only, using neither multithreading nor multiprocessing, and set the connection timeout to 6 s with socket.setdefaulttimeout(6).
With these changes, the data downloads successfully.
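As an alternative to the global socket.setdefaulttimeout(6), a per-request timeout can be passed directly to urllib.request.urlopen. A minimal sketch (the helper name fetch_with_timeout is my own, not part of assemble_data.py):

```python
import socket
import urllib.request


def fetch_with_timeout(url, filename, timeout=6):
    """Download url to filename; return False on timeout or any I/O error."""
    try:
        # urlopen accepts a per-request timeout in seconds.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        with open(filename, 'wb') as f:
            f.write(data)
        return True
    except (socket.timeout, OSError):
        return False
```

This keeps the timeout local to each request instead of changing the default for every socket in the process, which is what setdefaulttimeout does.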
===================================================================================================
2.
When running the command:
- ./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
I hit the error:
Failed to parse NetParameter file: models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
The cause is that the bvlc_reference_caffenet.caffemodel file we passed in is not actually binary.
Why: downloading with wget directly on the server was too slow, so I downloaded bvlc_reference_caffenet.caffemodel under Windows 7 and copied it to the server with winSCP. During the copy, winSCP mangled the file's format, so bvlc_reference_caffenet.caffemodel was no longer valid binary.
Solution: set winSCP's transfer mode to binary, and the problem goes away.
Details in this blog post: http://blog.chinaunix.net/uid-20332519-id-5585964.html
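A quick way to confirm that a transfer did not alter a large file like the caffemodel is to compare a checksum computed on both machines. A minimal sketch (the helper name sha1_of is my own):

```python
import hashlib


def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA-1 digest of a file, reading in binary mode in chunks."""
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        # iter() with a sentinel reads until f.read() returns b''.
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return h.hexdigest()
```

If the digest computed on the server differs from the one computed on the Windows machine, the transfer corrupted the file and it must be re-sent in binary mode.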