"C:\Program Files\Python36\python.exe" C:/Users/Administrator.SC-201612181954/PycharmProjects/untitled2/test1 http://www2.bingfeng.tw/data/attachment/forum/201601/21/150057zygjy5rf2y5spf2y.png http://i-3.yxdown.com/2016/5/19/b24c1344-5524-4f35-96e2-cd1db694d563.jpg http://i-3.yxdown.com/2016/5/19/b43738d9-5523-4659-a8fe-19b838650af8.jpg http://attach10.92wy.com/images/2016/0111/1452497908993e6d86.jpg http://www2.bingfeng.tw/data/attachment/forum/201601/21/150234lsabgyz2yg00ji00.jpg http://attach10.92wy.com/images/2016/0111/14524963748832cf37.jpg http://img.qqzhi.com/upload/img_2_2950581147D1797566349_23.jpg Traceback (most recent call last): File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 387, in _make_request six.raise_from(e, None) File "<string>", line 2, in raise_from File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 383, in _make_request httplib_response = conn.getresponse() File "C:\Program Files\Python36\lib\http\client.py", line 1331, in getresponse response.begin() File "C:\Program Files\Python36\lib\http\client.py", line 297, in begin version, status, reason = self._read_status() File "C:\Program Files\Python36\lib\http\client.py", line 258, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") File "C:\Program Files\Python36\lib\socket.py", line 586, in readinto return self._sock.recv_into(b) socket.timeout: timed out During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\Program Files\Python36\lib\requests\adapters.py", line 440, in send timeout=timeout File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 639, in urlopen _stacktrace=sys.exc_info()[2]) File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\util\retry.py", line 357, in increment raise six.reraise(type(error), error, 
_stacktrace) File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\packages\six.py", line 686, in reraise raise value File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 601, in urlopen chunked=chunked) File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 389, in _make_request self._raise_timeout(err=e, url=url, timeout_value=read_timeout) File "C:\Program Files\Python36\lib\site-packages\urllib3-1.22-py3.6.egg\urllib3\connectionpool.py", line 309, in _raise_timeout raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value) urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='img.qqzhi.com', port=80): Read timed out. (read timeout=10) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:/Users/Administrator.SC-201612181954/PycharmProjects/untitled2/test1", line 12, in <module> pic= requests.get(each, timeout=10) File "C:\Program Files\Python36\lib\requests\api.py", line 72, in get return request('get', url, params=params, **kwargs) File "C:\Program Files\Python36\lib\requests\api.py", line 58, in request return session.request(method=method, url=url, **kwargs) File "C:\Program Files\Python36\lib\requests\sessions.py", line 502, in request resp = self.send(prep, **send_kwargs) File "C:\Program Files\Python36\lib\requests\sessions.py", line 612, in send r = adapter.send(request, **kwargs) File "C:\Program Files\Python36\lib\requests\adapters.py", line 516, in send raise ReadTimeout(e, request=request) requests.exceptions.ReadTimeout: HTTPConnectionPool(host='img.qqzhi.com', port=80): Read timed out. (read timeout=10)
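I ran into this problem while learning to write a web scraper. Following a tutorial I found online, the image download hung partway through and then crashed. Here is the code: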
import re
import requests

#url = 'http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1460997499750_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E5%B0%8F%E9%BB%84%E4%BA%BA'
url = 'http://image.baidu.com/search/flip?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=result&fr=&sf=1&fmq=1501470791167_R&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&ctd=1501470791180%5E00_1899X935&word=%E5%9B%A2%E5%AD%90%E5%B0%91%E5%A5%B3'
html = requests.get(url).text
# Pull the original image URLs out of the search result page
pic_url = re.findall('"objURL":"(.*?)",', html, re.S)
i = 0
for each in pic_url:
    print(each)
    try:
        pic = requests.get(each, timeout=10)
    except requests.exceptions.ConnectionError:
        print('[Error] This image cannot be downloaded')
        continue
    # Save the image as picture1\<index>.jpg
    string = 'picture1\\' + str(i) + '.jpg'
    fp = open(string, 'wb')
    fp.write(pic.content)
    fp.close()
    i += 1
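After checking, I found that Baidu keeps a cached copy of the image on its own servers, so it still shows up as a thumbnail on the results page, but the original link is dead.
Looking closely at the error message, the exception actually raised is: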
requests.exceptions.ReadTimeout:
rather than the one caught in the code:
except requests.exceptions.ConnectionError:
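I looked into the difference between the two. ReadTimeout is raised when the server accepts the connection but does not send a response within the timeout, typically because the page is very slow to load. ConnectionError is raised when the connection itself fails. In requests, ReadTimeout is not a subclass of ConnectionError, so the except clause above does not catch it and the script crashes. Most people catch both when checking links like this. (A word to the original author of that program: please don't cut corners. A blog tutorial that teaches only half the job is a real trap.)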
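As a rough sketch of catching both errors, the download loop could be rewritten along the following lines. The function name `download_images` and its `fetch` parameter are my own additions for illustration and are not part of the original tutorial. `requests.exceptions.Timeout` is the parent class of both `ReadTimeout` and `ConnectTimeout`, so catching it together with `ConnectionError` should cover the case in the traceback.

```python
import os

import requests


def download_images(urls, out_dir="picture1", timeout=10, fetch=requests.get):
    """Download each URL into out_dir, skipping links that fail or time out.

    `fetch` is a hypothetical hook (it defaults to requests.get) so that the
    loop can be exercised without real network access.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        try:
            pic = fetch(url, timeout=timeout)
        except (requests.exceptions.ConnectionError,
                requests.exceptions.Timeout) as err:
            # Timeout covers both ConnectTimeout and ReadTimeout.
            print("[Error] Skipping {}: {}".format(url, type(err).__name__))
            continue
        # Number the files by how many images were saved so far.
        path = os.path.join(out_dir, "{}.jpg".format(len(saved)))
        with open(path, "wb") as fp:
            fp.write(pic.content)
        saved.append(path)
    return saved
```

With this in place, the original `for each in pic_url:` loop could be replaced by a single call such as `download_images(pic_url)`. Catching `requests.exceptions.RequestException` instead would be broader still, at the cost of hiding unrelated request errors.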