A First Web Crawler Program

Date: 2021-08-12



Libraries used

from urllib.request import urlopen
from bs4 import BeautifulSoup as bf

Send a request and fetch the HTML (the response body is bytes and needs to be converted)

html=urlopen("http://www.baidu.com")
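The bytes-to-text step mentioned above can be sketched as follows. This is a minimal illustration, not the author's code: `fetch_text` and its `fallback_charset` parameter are hypothetical names, and the offline sample string stands in for a real response so the snippet runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url: str, fallback_charset: str = "utf-8") -> str:
    """Hypothetical helper: download a page and return its body as str."""
    with urlopen(url) as response:
        raw = response.read()  # read() returns bytes, not str
        # Use the charset declared in the Content-Type header, if any
        charset = response.headers.get_content_charset() or fallback_charset
    return raw.decode(charset, errors="replace")


# Offline illustration of the same bytes -> str conversion
raw = "<title>百度一下</title>".encode("utf-8")  # what read() would hand back
text = raw.decode("utf-8")                       # the converted string
print(type(raw).__name__, type(text).__name__)
```

BeautifulSoup also accepts raw bytes directly and detects the encoding itself, so explicit decoding is mainly useful when the text is needed outside the parser.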


Use BeautifulSoup to convert the fetched content into a structured (parsed) document

obj=bf(html.read(),'html.parser')

The find_all method returns every tag that matches the given name and attributes

logo_pic_info=obj.find_all('img',class_="index-logo-src")


logo_url="http:"+logo_pic_info[1]['src']
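Prepending "http:" presumably works here because the src value is protocol-relative (it starts with //). As an alternative sketch, the standard library's `urljoin` resolves such values against the page URL and also copes with ordinary relative paths. The src values below are made up for illustration.

```python
from urllib.parse import urljoin

page_url = "http://www.baidu.com"

# Hypothetical src values, used only for illustration
protocol_relative_src = "//www.example.com/img/logo.png"
path_only_src = "/img/logo.png"

# A protocol-relative src inherits the scheme of the page URL
logo_url = urljoin(page_url, protocol_relative_src)
print(logo_url)  # http://www.example.com/img/logo.png

# A path-only src inherits both the scheme and the host
other_url = urljoin(page_url, path_only_src)
print(other_url)  # http://www.baidu.com/img/logo.png
```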


The tags returned by find_all come back as a list, so they can be accessed by index like any other list.
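To make the list behaviour concrete, here is a small offline sketch. The markup in `sample_html` is invented for illustration and is not Baidu's actual page. Only the class name is taken from the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page
sample_html = """
<div>
  <img class="index-logo-src" src="//example.com/logo_a.png">
  <img class="index-logo-src" src="//example.com/logo_b.png">
  <img class="banner" src="//example.com/banner.png">
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# `class` is a reserved word in Python, so the keyword argument is `class_`
logos = soup.find_all("img", class_="index-logo-src")

# The result behaves like a list: it has a length, supports indexing,
# and can be iterated over
count = len(logos)
first_src = logos[0]["src"]
all_srcs = [img["src"] for img in logos]
print(count, first_src)
```

Indexing past the end of the result raises IndexError, so checking the length first is a sensible precaution when the page layout may change.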
