asyncio and aiohttp in Python

Date: 2019-12-04

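The CPython interpreter itself is not thread-safe, so it has a Global Interpreter Lock (GIL) that allows only one thread at a time to execute Python bytecode. As a result, a single Python process usually cannot use multiple CPU cores at the same time. However, every function in the standard library that performs blocking I/O releases the GIL while it waits for the operating system to return a result. This means threads are still useful at the Python level, and I/O-bound Python programs benefit from them: while one Python thread waits for a network response, the blocking I/O function releases the GIL so another thread can run. The asyncio package takes a different route and implements concurrency with coroutines driven by an event loop. asyncio makes heavy use of the yield from expression, so it is not compatible with older versions of Python (yield from first appeared in Python 3.3).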
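Here is a minimal sketch of the point about blocking I/O releasing the GIL. It assumes time.sleep as a stand-in for a blocking network call, since time.sleep also releases the GIL while it waits. The function names are hypothetical. Three threads each block for 0.2 s, yet the total wall-clock time stays close to 0.2 s rather than 0.6 s, because the threads wait in parallel.

```python
import threading
import time


def blocking_io(results, index):
    # time.sleep stands in for a blocking network call; it releases the GIL
    time.sleep(0.2)
    results[index] = threading.current_thread().name


def run_threads(n=3):
    results = [None] * n
    threads = [threading.Thread(target=blocking_io, args=(results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish its "I/O"
    elapsed = time.perf_counter() - start
    return results, elapsed


names, elapsed = run_threads()
print(names, "took %.2f s" % elapsed)
```

With three sequential calls, the same work would take about 0.6 s.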
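The "coroutines" used by the asyncio package follow a stricter definition. A coroutine suitable for the asyncio API must use yield from, not yield, in its body. In addition, such a coroutine must be driven by its caller, which invokes it through yield from.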
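The "driven by the caller through yield from" rule can be illustrated without asyncio at all. The sketch below is plain generator code, and the names subtask, delegator and drive are hypothetical. delegator uses yield from to hand control to subtask. Every value the subgenerator yields passes straight through to whoever is driving, and the subgenerator's return value becomes the value of the yield from expression. asyncio's legacy coroutines relied on the same mechanism, with the event loop acting as the driver.

```python
def subtask(n):
    # Yield n "steps" to whoever drives us, then return a result
    for i in range(n):
        yield "step %d" % i
    return n * 10


def delegator():
    # yield from passes subtask's yields through to the driver
    # and evaluates to subtask's return value
    result = yield from subtask(3)
    return "delegator got %d" % result


def drive(gen):
    # A tiny "driver": keep calling next() until the generator finishes
    steps = []
    while True:
        try:
            steps.append(next(gen))
        except StopIteration as stop:
            return steps, stop.value


steps, value = drive(delegator())
print(steps)  # ['step 0', 'step 1', 'step 2']
print(value)  # delegator got 30
```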

Let's start with two examples:

import threading
import asyncio

# Legacy generator-based coroutines: @asyncio.coroutine was removed in
# Python 3.11, so this example only runs on Python 3.10 or earlier.
@asyncio.coroutine
def hello():
    print('Start Hello', threading.current_thread())
    yield from asyncio.sleep(5)
    print('End Hello', threading.current_thread())

@asyncio.coroutine
def world():
    print('Start World', threading.current_thread())
    yield from asyncio.sleep(3)
    print('End World', threading.current_thread())

# Get the event loop
loop = asyncio.get_event_loop()
tasks = [hello(), world()]
# Run both coroutines until they complete
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

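@asyncio.coroutine marks a generator function as a coroutine.
asyncio.sleep(3) creates a coroutine that completes after 3 seconds.
loop.run_until_complete(future) runs the loop until future is done. If the argument is a coroutine object, it is first wrapped in a Task, just as ensure_future() would do.
loop.close() closes the event loop.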
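On current Python (3.11 and later) the example above no longer runs, because @asyncio.coroutine has been removed. Below is a sketch of the same program in async/await form, assuming Python 3.7+ and shortening the delays to 0.5 s and 0.3 s so it runs quickly. asyncio.gather schedules both coroutines as tasks on the same event loop. asyncio.run creates and closes the loop, replacing get_event_loop(), run_until_complete() and close(). Both coroutines report the same thread, and the total time is roughly the longer delay, not the sum.

```python
import asyncio
import threading
import time


async def hello():
    print('Start Hello', threading.current_thread().name)
    await asyncio.sleep(0.5)
    print('End Hello', threading.current_thread().name)
    return threading.current_thread().name


async def world():
    print('Start World', threading.current_thread().name)
    await asyncio.sleep(0.3)
    print('End World', threading.current_thread().name)
    return threading.current_thread().name


async def main():
    # gather wraps each coroutine in a Task and waits for all of them
    return await asyncio.gather(hello(), world())


start = time.perf_counter()
thread_names = asyncio.run(main())  # creates, runs and closes the event loop
elapsed = time.perf_counter() - start
print(thread_names, 'took %.2f s' % elapsed)
```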

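import asyncio

# Legacy generator-based coroutines: @asyncio.coroutine was removed in
# Python 3.11, so this example only runs on Python 3.10 or earlier.
@asyncio.coroutine
def worker(text):
    """Print text with a counter every 0.1 s until cancelled."""
    i = 0
    while True:
        print(text, i)
        try:
            yield from asyncio.sleep(.1)
        except asyncio.CancelledError:
            break
        i += 1

@asyncio.coroutine
def client(text, io_used):
    work_fu = asyncio.ensure_future(worker(text))
    # Pretend to wait for I/O for io_used seconds
    yield from asyncio.sleep(io_used)
    # Stop the worker coroutine
    work_fu.cancel()
    # Let the worker handle the cancellation before returning
    yield from work_fu
    return 'done'

loop = asyncio.get_event_loop()
tasks = [client('xiaozhe', 3), client('zzz', 5)]
result = loop.run_until_complete(asyncio.wait(tasks))
loop.close()
print('Answer:', result)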

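asyncio.ensure_future(coro_or_future, *, loop=None): schedules the execution of a coroutine object and returns an asyncio.Task object.
work_fu.cancel(): requests cancellation of the task. A CancelledError is thrown into the wrapped coroutine at its next suspension point (here, inside asyncio.sleep), which worker catches to leave its loop.
asyncio.wait(): takes an iterable of futures or coroutines and wraps each coroutine in a Task. It returns a pair of sets (done, pending), so the final print shows those sets of Task objects rather than the string 'done'.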
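Because asyncio.wait() hands back Task sets rather than return values, and since Python 3.11 no longer accepts bare coroutines, the sketch below uses asyncio.gather instead. It is the cancellation example in modern form, assuming Python 3.7+ and shorter delays (0.3 s and 0.5 s instead of 3 s and 5 s). asyncio.create_task plays the role of ensure_future inside a running coroutine, and each client awaits its cancelled worker so that it finishes cleanly. The counts dictionary is a hypothetical addition that records how many ticks each worker reached before cancellation.

```python
import asyncio


async def worker(text, counts):
    # Print text with a counter every 0.1 s until the task is cancelled
    i = 0
    while True:
        print(text, i)
        try:
            await asyncio.sleep(0.1)
        except asyncio.CancelledError:
            counts[text] = i  # record how far we got, then stop
            break
        i += 1


async def client(text, io_used, counts):
    # create_task schedules worker on the running loop (like ensure_future)
    work_task = asyncio.create_task(worker(text, counts))
    await asyncio.sleep(io_used)  # pretend to wait for I/O
    work_task.cancel()            # CancelledError is raised inside worker
    await work_task               # worker swallows it and returns normally
    return 'done'


async def main():
    counts = {}
    # gather returns the coroutines' results, in argument order
    results = await asyncio.gather(client('xiaozhe', 0.3, counts),
                                   client('zzz', 0.5, counts))
    return results, counts


results, counts = asyncio.run(main())
print('Answer:', results)  # Answer: ['done', 'done']
print(counts)
```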

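async and await are the newer syntax for coroutines (added in Python 3.5). To switch to it, make two simple substitutions:
1. Drop the @asyncio.coroutine decorator and declare the function with async def.
2. Replace yield from with await.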

@asyncio.coroutine
def hello():
    print("Hello world!")
    r = yield from asyncio.sleep(1)
    print("Hello again!")

is equivalent to

async def hello():
    print("Hello world!")
    r = await asyncio.sleep(1)
    print("Hello again!")
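In both versions, r receives whatever value the awaited coroutine produces. asyncio.sleep(delay, result=None) resolves to its optional result argument, so r is None in the snippet above. The sketch below passes a result to make that visible and runs the coroutine with asyncio.run. It assumes Python 3.7+, and the 0.1 s delay is shortened from the original.

```python
import asyncio
import inspect


async def hello():
    print("Hello world!")
    # asyncio.sleep returns its optional `result` argument once the delay ends
    r = await asyncio.sleep(0.1, result="slept")
    print("Hello again!", r)
    return r


value = asyncio.run(hello())
print(value)                               # slept
print(inspect.iscoroutinefunction(hello))  # True
```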

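asyncio makes it possible to run concurrent I/O operations in a single thread. Used only on the client side, it does not show its full strength. On the server side, such as in a web server, every HTTP connection is an I/O operation, so a single thread plus coroutines can serve many users with high concurrency.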

asyncio implements protocols such as TCP, UDP and SSL, and aiohttp is an HTTP framework built on top of asyncio.
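To make the single-thread server claim concrete, here is a stdlib-only sketch using asyncio's TCP streams API (asyncio.start_server and asyncio.open_connection). The handler and client names are hypothetical. The server listens on 127.0.0.1 with an OS-chosen port, and the "protocol" is simply one line echoed back. Five clients are served concurrently, all by the one thread running the event loop.

```python
import asyncio


async def handle_echo(reader, writer):
    # One coroutine per connection: read a line and echo it back
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def client(port, message):
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(message.encode() + b'\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main():
    # Port 0 asks the OS for any free port
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are handled concurrently by the single event-loop thread
        return await asyncio.gather(*(client(port, 'user %d' % i) for i in range(5)))


replies = asyncio.run(main())
print(replies)  # ['user 0', 'user 1', 'user 2', 'user 3', 'user 4']
```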

Client:

import asyncio
import aiohttp


async def fetch(session, url):
    # The session-wide ClientTimeout caps the whole request at 10 seconds
    async with session.get(url) as response:
        return await response.text()


async def main():
    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        html = await fetch(session, 'http://python.org')
        print(html)


# asyncio.run creates the event loop, runs main() and closes the loop
asyncio.run(main())
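The client above bounds the whole request to 10 seconds. The same idea can be sketched with the standard library alone: asyncio.wait_for cancels the awaited coroutine and raises a timeout error when the limit expires. Here slow_request is a hypothetical stand-in for a network call, simulated with asyncio.sleep, and the delays are arbitrary.

```python
import asyncio


async def slow_request(delay):
    # Hypothetical stand-in for a network request that takes `delay` seconds
    await asyncio.sleep(delay)
    return 'response after %.1f s' % delay


async def fetch_with_timeout(delay, timeout):
    try:
        return await asyncio.wait_for(slow_request(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has already cancelled slow_request at this point
        return None


async def main():
    fast = await fetch_with_timeout(0.1, timeout=1)
    slow = await fetch_with_timeout(2, timeout=0.2)
    return fast, slow


fast, slow = asyncio.run(main())
print(fast)  # response after 0.1 s
print(slow)  # None
```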

Server:

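from aiohttp import web

async def handle(request):
    # The {name} part of the route is available through request.match_info
    name = request.match_info.get('name', "Anonymous")
    text = "Hello, " + name
    return web.Response(text=text)

app = web.Application()
# '/' has no {name}, so the handler falls back to "Anonymous"
app.router.add_get('/', handle)
app.router.add_get('/{name}', handle)

# Starts the server (port 8080 by default) and runs until interrupted
web.run_app(app)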

The following code scrapes book information from the Dangdang bestseller list:

'''Scrape Dangdang bestseller information asynchronously'''
import time
import asyncio
import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

# table stores one row of information per book
table = []


# Fetch a page and return its text
async def fetch(session, url):
    # Dangdang serves GB-encoded Chinese text
    async with session.get(url) as response:
        return await response.text(encoding='gb18030')


# Parse a page (a plain function: parsing is CPU work with nothing to await)
def parser(html):
    # Parse the fetched text as HTML with BeautifulSoup (needs lxml installed)
    soup = BeautifulSoup(html, 'lxml')
    # Locate the bestseller list on the page
    bang_list = soup.find('ul', class_='bang_list clearfix bang_list_mode')
    if bang_list is None:
        return  # unexpected page layout: nothing to extract
    for book in bang_list.find_all('li'):
        info = book.find_all('div')
        # Extract each book's rank, title, review count, author and publisher
        rank = info[0].text[0:-1]
        name = info[2].text
        comments = info[3].text.split('条')[0]  # the review count precedes '条'
        author = info[4].text
        date_and_publisher = info[5].text.split()
        publisher = date_and_publisher[1] if len(date_and_publisher) >= 2 else ''
        # Add this book's details to table
        table.append([rank, name, comments, author, publisher])


# Download and parse one page
async def download(session, url):
    html = await fetch(session, url)
    parser(html)


# Share one ClientSession (and its connection pool) across all pages
async def main(urls):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(download(session, url) for url in urls))


# All 25 pages of the list
urls = ['http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-recent7-0-0-1-%d'%i for i in range(1,26)]
# Measure how long the crawl takes
print('#' * 50)
t1 = time.time()  # start time

# Run the downloads concurrently on the asyncio event loop
asyncio.run(main(urls))

# Convert table to a pandas DataFrame and save it as a CSV file
df = pd.DataFrame(table, columns=['rank', 'name', 'comments', 'author', 'publisher'])
df.to_csv('dangdang.csv', index=False)

t2 = time.time()  # end time
print('Total time using aiohttp: %s' % (t2 - t1))
print('#' * 50)
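The crawler above starts all 25 downloads at once. If a site throttles aggressive clients, a common refinement (not part of the original code) is to cap the number of in-flight requests with asyncio.Semaphore. The sketch below stays stdlib-only: fake_fetch is a hypothetical stand-in for fetch() that sleeps instead of downloading, and the limit of 5 is an arbitrary assumption. It tracks peak concurrency to show that the cap holds. Pages finish out of order, but gather still returns them in URL order.

```python
import asyncio
import random


async def fake_fetch(url):
    # Hypothetical stand-in for fetch(session, url): simulate variable latency
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return 'page for ' + url


async def crawl(urls, limit=5):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def download(url):
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` downloads run at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    # gather keeps results in the same order as urls
    pages = await asyncio.gather(*(download(u) for u in urls))
    return pages, peak


urls = ['page-%d' % i for i in range(1, 26)]
pages, peak = asyncio.run(crawl(urls))
print(len(pages), 'pages, peak concurrency', peak)
```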
