1. Asynchronous image scraping with asyncio, aiohttp, and aiofiles

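After struggling with this on and off for many days, I'll skip the preamble: here is the code first, with the analysis afterwards: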

import aiohttp
import asyncio
import aiofiles

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
          'Referer': 'https://www.mzitu.com/',
          'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
          'Accept-Encoding': 'gzip',
          }

async def fetch(session, url):
    # Read the body while the response is still open, and return the bytes
    async with session.get(url, proxy='http://59.62.164.252:9999') as response:
        return await response.read()

async def main():
    # Headers are passed once, when the session is created
    async with aiohttp.ClientSession(headers=header) as session:
        content = await fetch(session, 'https://i.meizitu.net/thumbs/2019/03/174061_01e35_236.jpg')
        print(content)
        async with aiofiles.open('D:/a.jpg', 'wb') as f:
            # write() on an aiofiles handle is a coroutine and must be awaited
            await f.write(content)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

 

Here is how I got there:

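1. I read the coroutine chapter of Liao Xuefeng's Python tutorial, the coroutine chapter of "Fluent Python", and assorted material I found online along the way. No matter how I changed the code, it still raised errors, and I was close to losing my temper.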

In the end I searched Google for ClientSession, found the aiohttp Client Reference, and copied its sample code to try it:

import aiohttp
import asyncio

async def fetch(client):
    async with client.get('http://python.org') as resp:
        assert resp.status == 200
        return await resp.text()

async def main():
    async with aiohttp.ClientSession() as client:
        html = await fetch(client)
        print(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

It ran successfully.

2. I changed it to download an image, and wondered whether the fetch function could return the response directly.

import aiohttp
import asyncio
import aiofiles

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1',
          'Referer': 'https://www.mzitu.com/',
          'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
          'Accept-Encoding': 'gzip',
          }

async def fetch(session, url):
    async with session.get(url) as response:
        return response

async def main():
    async with aiohttp.ClientSession() as session:
        response = await fetch(session, 'https://i.meizitu.net/thumbs/2019/03/174061_01e35_236.jpg')
        print(response.read())
        with open('D:/a.jpg', 'wb') as f:
            f.write(response.read())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

Running this version fails immediately with an error.

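So it seems the fetch function cannot return the response? I racked my brain and could not work out why, so I'll leave the problem here and solve it later.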
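One plausible explanation, which the original post does not confirm: leaving the `async with session.get(...)` block releases the response, so a body that has not been read yet can no longer be read afterwards. In addition, `response.read()` is a coroutine, so calling it without `await` only creates a coroutine object instead of returning bytes. The sketch below illustrates the first point with a hypothetical `FakeResponse` class (a stand-in written for this example, not part of aiohttp) so that it runs without a network connection.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; not an aiohttp class."""

    def __init__(self, body: bytes):
        self._body = body
        self._closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the `async with` block releases the connection.
        self._closed = True

    async def read(self) -> bytes:
        if self._closed:
            raise RuntimeError("connection closed")
        await asyncio.sleep(0)  # pretend to wait for the network
        return self._body


async def fetch_response():
    # Mirrors the failing attempt: the response escapes the block unread.
    async with FakeResponse(b"image-bytes") as response:
        return response


async def fetch_body():
    # Mirrors the working version: the body is read inside the block.
    async with FakeResponse(b"image-bytes") as response:
        return await response.read()


async def demo():
    response = await fetch_response()
    try:
        await response.read()
        late_read = "ok"
    except RuntimeError as exc:
        late_read = f"failed: {exc}"
    body = await fetch_body()
    return late_read, body


late_read, body = asyncio.run(demo())
print(late_read)
print(body)
```

Under this reading, the practical fix is the one used in the working code at the top of the post: `await response.read()` inside fetch and return the bytes, rather than returning the response object itself.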

3. According to the ClientSession documentation mentioned above:

The request headers should be passed in when the ClientSession is instantiated.

4. aiohttp supports HTTP/HTTPS proxies

However, it does not actually support https proxies at all.

For reference, see the article "Python3 異步代理爬蟲池" (Python 3 asynchronous proxy crawler pool).

 

This is giving me a headache, so I'll stop here for now.

On my last attempt, the proxy IP seemed to be causing problems again. Ugh.
