Fine-tuning CaffeNet for Style Recognition on "Flickr Style" Data: Problems Encountered While Downloading the Data

Date: 2019-11-17


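(The download prints no progress, so there is no way to tell whether it is still running or has stalled on an error.) It simply stays unresponsive.

Before downloading, switch to the root user with sudo su, then run:

python examples/finetune_flickr_style/assemble_data.py --workers=1 --images=2000 --seed 831486

Alternatively, prefix the command with sudo.

Reference: http://blog.csdn.net/lujiandong1/article/details/50495454

While following this tutorial, I ran into two main problems:

1. The data would not download.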


python examples/finetune_flickr_style/assemble_data.py --workers=1 --images=2000 --seed 831486

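When this command runs, the program inexplicably stops making progress: no files are downloaded, yet the process does not exit, as if it had deadlocked.

Inspecting the source, assemble_data.py, shows that it downloads images through a multiprocessing pool of worker processes. My fix was to modify the source so that it no longer uses worker processes for downloading. I also imposed a download timeout: a request that takes longer than 6 s is treated as timed out, and that image is skipped.

====================================================================================================

The code in assemble_data.py that uses the process pool is as follows: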


pool = multiprocessing.Pool(processes=num_workers)
map_args = zip(df['image_url'], df['image_filename'])
results = pool.map(download_image, map_args)

===================================================================================================

My modified source is as follows:


#!/usr/bin/env python3
"""
Form a subset of the Flickr Style data, download images to dirname, and write
Caffe ImagesDataLayer training file.
"""
import os
import urllib.request
import hashlib
import argparse
import numpy as np
import pandas as pd
from skimage import io
import multiprocessing
import socket

# Flickr returns a special image if the request is unavailable.
MISSING_IMAGE_SHA1 = '6a92790b1c2a301c6e7ddef645dca1f53ea97ac2'

example_dirname = os.path.abspath(os.path.dirname(__file__))
caffe_dirname = os.path.abspath(os.path.join(example_dirname, '../..'))
training_dirname = os.path.join(caffe_dirname, 'data/flickr_style')


def download_image(args_tuple):
    "Original worker for multiprocessing map (no longer called). Returns False on failure."
    try:
        url, filename = args_tuple
        if not os.path.exists(filename):
            urllib.request.urlretrieve(url, filename)
        # Open in binary mode: hashlib requires bytes under Python 3.
        with open(filename, 'rb') as f:
            assert hashlib.sha1(f.read()).hexdigest() != MISSING_IMAGE_SHA1
        # Make sure the file actually decodes as an image.
        test_read_image = io.imread(filename)
        return True
    except KeyboardInterrupt:
        raise Exception()  # multiprocessing doesn't catch keyboard exceptions
    except Exception:
        return False


def mydownload_image(args_tuple):
    "Worker for the sequential loop below. Returns False on failure or timeout."
    try:
        url, filename = args_tuple
        if not os.path.exists(filename):
            # Subject to the global timeout set by socket.setdefaulttimeout().
            urllib.request.urlretrieve(url, filename)
        with open(filename, 'rb') as f:
            assert hashlib.sha1(f.read()).hexdigest() != MISSING_IMAGE_SHA1
        test_read_image = io.imread(filename)
        return True
    except KeyboardInterrupt:
        raise Exception()
    except Exception:
        return False


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Download a subset of Flickr Style to a directory')
    parser.add_argument(
        '-s', '--seed', type=int, default=0,
        help="random seed")
    parser.add_argument(
        '-i', '--images', type=int, default=-1,
        help="number of images to use (-1 for all [default])",
    )
    parser.add_argument(
        '-w', '--workers', type=int, default=-1,
        help="num workers used to download images. -x uses (all - x) cores [-1 default]."
    )
    parser.add_argument(
        '-l', '--labels', type=int, default=0,
        help="if set to a positive value, only sample images from the first number of labels."
    )
    args = parser.parse_args()
    np.random.seed(args.seed)

    # Read data, shuffle order, and subsample.
    csv_filename = os.path.join(example_dirname, 'flickr_style.csv.gz')
    df = pd.read_csv(csv_filename, index_col=0, compression='gzip')
    df = df.iloc[np.random.permutation(df.shape[0])]
    if args.labels > 0:
        df = df.loc[df['label'] < args.labels]
    if args.images > 0 and args.images < df.shape[0]:
        df = df.iloc[:args.images]

    # Make directory for images and get local filenames.
    if training_dirname is None:
        training_dirname = os.path.join(caffe_dirname, 'data/flickr_style')
    images_dirname = os.path.join(training_dirname, 'images')
    if not os.path.exists(images_dirname):
        os.makedirs(images_dirname)
    df['image_filename'] = [
        os.path.join(images_dirname, _.split('/')[-1]) for _ in df['image_url']
    ]

    # Download images.
    num_workers = args.workers
    if num_workers <= 0:
        num_workers = multiprocessing.cpu_count() + num_workers
    print('Downloading {} images with {} workers...'.format(
        df.shape[0], num_workers))
    # pool = multiprocessing.Pool(processes=num_workers)
    map_args = zip(df['image_url'], df['image_filename'])
    # results = pool.map(download_image, map_args)

    # Download one image at a time; give up on any connection after 6 s.
    socket.setdefaulttimeout(6)
    results = []
    for item in map_args:
        value = mydownload_image(item)
        results.append(value)
        if not value:
            print('False')
        else:
            print('1')

    # Only keep rows with valid images, and write out training file lists.
    print(len(results))
    df = df[results]
    for split in ['train', 'test']:
        split_df = df[df['_split'] == split]
        filename = os.path.join(training_dirname, '{}.txt'.format(split))
        split_df[['image_filename', 'label']].to_csv(
            filename, sep=' ', header=None, index=None)
    print('Writing train/val for {} successfully downloaded images.'.format(
        df.shape[0]))

The main changes are:

1. #!/usr/bin/env python3: the script is run with Python 3.

2. The download loop:


# pool = multiprocessing.Pool(processes=num_workers)
map_args = zip(df['image_url'], df['image_filename'])
# results = pool.map(download_image, map_args)
socket.setdefaulttimeout(6)
results = []
for item in map_args:
    value = mydownload_image(item)
    results.append(value)
    if not value:
        print('False')
    else:
        print('1')
# Only keep rows with valid images, and write out training file lists.
print(len(results))

The images are downloaded one at a time in a single thread, with no thread or process pool. The connection timeout is set to 6 s with socket.setdefaulttimeout(6).
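To see why the timeout matters, here is a minimal sketch of what socket.setdefaulttimeout changes. It does not touch Flickr: a local socket that never sends anything stands in for a stalled remote host, and the names recv_with_default_timeout and silent_server are made up for this illustration. Without a default timeout the recv call would block indefinitely, which matches the "frozen" symptom described earlier.

```python
import socket


def recv_with_default_timeout(address, timeout_seconds):
    """Connect to address and wait for one byte under the global default timeout.

    Returns "timed out" if nothing arrives within timeout_seconds.
    """
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout_seconds)  # applies to sockets created from now on
    try:
        conn = socket.create_connection(address)
        try:
            conn.recv(1)  # blocks until data arrives or the timeout fires
            return "returned"
        except socket.timeout:
            return "timed out"
        finally:
            conn.close()
    finally:
        socket.setdefaulttimeout(previous)  # restore the earlier setting


# A local listener that lets clients connect but never sends any data
# (hypothetical stand-in for a remote server that has stalled).
silent_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent_server.bind(("127.0.0.1", 0))
silent_server.listen(1)

outcome = recv_with_default_timeout(silent_server.getsockname(), 0.5)
print(outcome)
silent_server.close()
```

Because the timeout is a process-wide default, it also covers sockets that urllib opens internally, which is presumably why a single call before the loop is enough in the modified script.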

With these changes, the data downloads successfully.
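Since the original complaint was that the script gives no sign of life, the sequential loop can also report progress per image. The following is only a sketch of that idea, not part of assemble_data.py: download_sequentially and fake_fetch are hypothetical names, and fake_fetch replaces the real downloader so the example runs without network access. In the real script, mydownload_image could be passed as fetch.

```python
def download_sequentially(url_filename_pairs, fetch):
    """Call fetch(pair) for each (url, filename) pair in order, printing progress.

    fetch is any callable that returns True on success and False on failure.
    Returns the list of boolean results, one per pair.
    """
    pairs = list(url_filename_pairs)
    results = []
    for index, pair in enumerate(pairs, start=1):
        ok = bool(fetch(pair))
        results.append(ok)
        status = "ok" if ok else "failed"
        print("[{}/{}] {} {}".format(index, len(pairs), status, pair[0]))
    print("{} of {} downloads succeeded".format(sum(results), len(pairs)))
    return results


# Stand-in downloader (assumption for the demo): any URL containing
# "missing" is treated as a failed download.
def fake_fetch(pair):
    url, _filename = pair
    return "missing" not in url


demo_pairs = [
    ("http://example.com/a.jpg", "a.jpg"),
    ("http://example.com/missing.jpg", "missing.jpg"),
    ("http://example.com/c.jpg", "c.jpg"),
]
demo_results = download_sequentially(demo_pairs, fake_fetch)
```

The returned list has the same shape as results in the modified script, so it could be used for the df[results] filtering step in the same way.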

===================================================================================================

2. The pretrained weights could not be parsed.

Running the command:


./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

produced the error:

Failed to parse NetParameter file: models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

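The error occurs because the bvlc_reference_caffenet.caffemodel file passed in is no longer a valid binary file.

Cause: downloading directly on the server with wget was too slow, so I downloaded bvlc_reference_caffenet.caffemodel on Windows 7 and copied it to the server with WinSCP. During the transfer, WinSCP altered the contents of the file, so the copy on the server was no longer the original binary.

Solution: set the WinSCP transfer mode to binary, which resolves the problem.

For details, see this blog post: http://blog.chinaunix.net/uid-20332519-id-5585964.html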
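One way to confirm that a transferred file arrived intact is to compare its SHA-1 digest on both machines. The sketch below assumes the corruption came from line-ending conversion (LF to CRLF), which is what text-mode transfers typically do; the post itself does not say exactly what WinSCP changed. file_sha1 is a hypothetical helper and the bytes are made up for the demonstration.

```python
import hashlib
import os
import tempfile


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in binary chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Made-up binary content that happens to contain newline bytes.
original_bytes = b"\x08\x01\n\x12\x04data\n"
# What a text-mode transfer would produce if it rewrote LF as CRLF (assumption).
converted_bytes = original_bytes.replace(b"\n", b"\r\n")

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "original.bin")
    bad_path = os.path.join(tmp, "text_mode_copy.bin")
    with open(good_path, "wb") as f:
        f.write(original_bytes)
    with open(bad_path, "wb") as f:
        f.write(converted_bytes)
    good_sha1 = file_sha1(good_path)
    bad_sha1 = file_sha1(bad_path)

# The digests differ, so the altered copy is detectable.
print(good_sha1 == bad_sha1)
```

In practice, file_sha1 would be run on the .caffemodel file on the Windows machine and again on the server; matching digests indicate the transfer did not modify the file.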