Reposted from: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
In this post I’d like to test the limits of Python aiohttp and check its performance in terms of requests per minute. Everyone knows that asynchronous code performs better when applied to network operations, but it’s still interesting to check this assumption and understand how exactly it is better and why. I’m going to check it by trying to make 1 million requests with the aiohttp client. How many requests per minute will aiohttp make? What kinds of exceptions and crashes can you expect when you try to make such a volume of requests with very primitive scripts? What are the main gotchas that you need to think about when making that many requests?
Async programming is not easy. It’s not easy because using callbacks and thinking in terms of events and event handlers requires more effort than usual synchronous programming. But it is also difficult because asyncio is still relatively new and there are few blog posts and tutorials about it. The official docs are very terse and contain only basic examples. There are some Stack Overflow questions, but not that many: only 410 as of the time of writing (compare with 2,585 questions tagged "twisted"). There are a couple of nice blog posts and articles about asyncio out there, such as this, that, that, or perhaps even this or this.
To make it easier, let’s start with the basics: a simple HTTP hello world, just making a GET request and fetching one single HTTP response.
In the synchronous world you just do:
import requests

def hello():
    return requests.get("http://httpbin.org/get")

print(hello())
How does that look in aiohttp?
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession

async def hello(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello("http://httpbin.org/headers"))
Hmm, looks like I had to write lots of code for such a basic task… There is "async def" and "async with" and two "awaits" here. It seems really confusing at first sight, so let’s try to explain it.
You make your function asynchronous by using the async keyword before the function definition and by using the await keyword. There are actually two asynchronous operations that our hello() function performs: first it fetches the response asynchronously, then it reads the response body asynchronously.
Aiohttp recommends using ClientSession as the primary interface for making requests. ClientSession allows you to store cookies between requests and keeps objects that are common for all requests (event loop, connections, and other things). The session needs to be closed after use, and closing the session is another asynchronous operation; this is why you need async with every time you deal with sessions.
After you open a client session you can use it to make requests. This is where another asynchronous operation starts: downloading the response. Just as with client sessions, responses must be closed explicitly, and the context manager’s with statement ensures they will be closed properly in all circumstances.
To start your program you need to run it in the event loop, so you need to create an instance of the asyncio loop and put the task into this loop.
It all sounds a bit difficult, but it’s not that complex, and it looks logical if you spend some time trying to understand it.
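The same event-loop plumbing can be seen without aiohttp at all. A minimal sketch (the hello() coroutine and the short sleep here are stand-ins for real network I/O; new_event_loop() is used instead of the article’s get_event_loop() so the snippet also runs cleanly on current Pythons):

```python
import asyncio

async def hello(name):
    # asyncio.sleep stands in for a real network wait
    await asyncio.sleep(0.01)
    return "Hello, {}!".format(name)

# create a loop and run the coroutine to completion
loop = asyncio.new_event_loop()
result = loop.run_until_complete(hello("world"))
loop.close()
print(result)  # Hello, world!
```

On Python 3.7+ the boilerplate shrinks to `asyncio.run(hello("world"))`, but the pattern above is what was available when this article was written.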
Now let’s try to do something more interesting: fetching multiple URLs one after another. With synchronous code you would just do:
for url in urls:
    print(requests.get(url).text)
This is really quick and easy. Async will not be that easy, so you should always consider whether something more complex is actually necessary for your needs. If your app works fine with synchronous code, maybe there is no need to bother with async at all? If you do need async code, here’s how you do it. Our hello() async function stays the same, but we need to wrap it in an asyncio Future object and pass a whole list of Future objects as tasks to be executed in the loop.
loop = asyncio.get_event_loop()

tasks = []
# I'm using test server localhost, but you can use any url
url = "http://localhost:8080/{}"
for i in range(5):
    task = asyncio.ensure_future(hello(url.format(i)))
    tasks.append(task)

loop.run_until_complete(asyncio.wait(tasks))
Now let’s say we want to collect all responses in one list and do some postprocessing on them. At the moment we’re not keeping the response body anywhere; we just print it. Let’s return the response instead, keep it in a list, and print all responses at the end.
To collect a bunch of responses you probably need to write something along the lines of:
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def run(r):
    url = "http://localhost:8080/{}"
    tasks = []

    # Fetch all responses within one Client session,
    # keep connection alive for all requests.
    async with ClientSession() as session:
        for i in range(r):
            task = asyncio.ensure_future(fetch(url.format(i), session))
            tasks.append(task)

        responses = await asyncio.gather(*tasks)
        # you now have all response bodies in this variable
        print(responses)

loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(4))
loop.run_until_complete(future)
Notice the usage of asyncio.gather(): it collects a bunch of Future objects in one place and waits for all of them to finish.
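A useful property of gather() worth knowing: it returns results in the order the awaitables were submitted, not the order they finished. A small sketch demonstrating this (fake_fetch() is a hypothetical stand-in for the aiohttp fetch() above; later tasks are made to finish first on purpose):

```python
import asyncio

async def fake_fetch(i):
    # later tasks finish first, to show that gather() still
    # returns results in submission order, not completion order
    await asyncio.sleep((5 - i) * 0.01)
    return i

async def run():
    tasks = [asyncio.ensure_future(fake_fetch(i)) for i in range(4)]
    return await asyncio.gather(*tasks)

loop = asyncio.new_event_loop()
results = loop.run_until_complete(run())
loop.close()
print(results)  # [0, 1, 2, 3]
```

So even though task 3 completes first here, `results` still lines up with the order of the URLs you submitted.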
Now let’s simulate the real process of learning: let’s make a mistake in the above script and try to debug it. This should be really helpful for demonstration purposes.
This is how a sample broken async function looks:
# WARNING! BROKEN CODE DO NOT COPY PASTE
async def fetch(url):
    async with ClientSession() as session:
        async with session.get(url) as response:
            return response.read()
This code is broken, and it’s not that easy to figure out why if you don’t know much about asyncio. Even if you know Python well but don’t know asyncio or aiohttp, you’ll have trouble figuring out what happens. What is the output of the above function? It produces the following output:
pawel@pawel-VPCEH390X ~/p/l/benchmarker> ./bench.py
[<generator object ClientResponse.read at 0x7fa68d465728>,
 <generator object ClientResponse.read at 0x7fa68cdd9468>,
 <generator object ClientResponse.read at 0x7fa68d4656d0>,
 <generator object ClientResponse.read at 0x7fa68cdd9af0>]
What happens here? You expected to get response objects after all the processing was done, but here you actually get a bunch of generators. Why is that?
It happens because, as I mentioned earlier, response.read() is an async operation. This means that it does not return a result immediately; it just returns a generator. This generator still needs to be called and executed, and this does not happen by default. yield from in Python 3.4 and await in Python 3.5 were added exactly for this purpose: to actually iterate over the generator function. The fix for the above error is simply adding await before response.read().
# async operation must be preceded by await
return await response.read()  # NOT: return response.read()
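The same forgot-to-await failure can be reproduced with plain asyncio. (On Python 3.5+ calling an async def function returns a coroutine object rather than a generator, but the failure mode is identical: nothing runs until you await it.) read_body() here is a hypothetical stand-in for response.read():

```python
import asyncio

async def read_body():
    await asyncio.sleep(0.01)  # stands in for reading a response body
    return b"payload"

async def broken():
    return read_body()        # forgot await: returns a coroutine object

async def fixed():
    return await read_body()  # awaited: returns the actual bytes

loop = asyncio.new_event_loop()
pending = loop.run_until_complete(broken())
print(pending)                # <coroutine object read_body at 0x...>
pending.close()               # silence the "never awaited" warning
print(loop.run_until_complete(fixed()))  # b'payload'
loop.close()
```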
Let’s break our code in some other way.
# WARNING! BROKEN CODE DO NOT COPY PASTE
async def run(r):
    url = "http://localhost:8080/{}"
    tasks = []
    for i in range(r):
        task = asyncio.ensure_future(fetch(url.format(i)))
        tasks.append(task)
    responses = asyncio.gather(*tasks)
    print(responses)
Again, the above code is broken, but it’s not easy to figure out why if you’re just learning asyncio. It produces the following output:
pawel@pawel-VPCEH390X ~/p/l/benchmarker> ./bench.py
<_GatheringFuture pending>
Task was destroyed but it is pending!
task: <Task pending coro=<fetch() running at ./bench.py:7> wait_for=<Future pending cb=[Task._wakeup()]> cb=[gather.<locals>._done_callback(0)() at /usr/local/lib/python3.5/asyncio/tasks.py:602]>
Task was destroyed but it is pending!
task: <Task pending coro=<fetch() running at ./bench.py:7> wait_for=<Future pending cb=[Task._wakeup()]> cb=[gather.<locals>._done_callback(1)() at /usr/local/lib/python3.5/asyncio/tasks.py:602]>
Task was destroyed but it is pending!
task: <Task pending coro=<fetch() running at ./bench.py:7> wait_for=<Future pending cb=[Task._wakeup()]> cb=[gather.<locals>._done_callback(2)() at /usr/local/lib/python3.5/asyncio/tasks.py:602]>
Task was destroyed but it is pending!
task: <Task pending coro=<fetch() running at ./
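Those "Task was destroyed but it is pending!" warnings all point at the same root cause: asyncio.gather() returns a future, and in the broken run() it was never awaited, so the loop shuts down while the tasks are still pending. The fix mirrors the previous one, adding await in front of the gather. A runnable sketch of the corrected shape (fake_fetch() is a stand-in for the aiohttp fetch(), so the snippet needs no server):

```python
import asyncio

async def fake_fetch(i):
    await asyncio.sleep(0.01)  # stands in for a real HTTP request
    return "response {}".format(i)

async def run(r):
    tasks = [asyncio.ensure_future(fake_fetch(i)) for i in range(r)]
    # await drives the gathered tasks to completion; without it
    # we would just print "<_GatheringFuture pending>"
    responses = await asyncio.gather(*tasks)
    print(responses)
    return responses

loop = asyncio.new_event_loop()
results = loop.run_until_complete(run(3))
loop.close()
```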