Basic web scraping in Python (using requests and bs4)

Date: 2019-11-26

Tags: python, basic web scraping, requests, bs4. Category: Python


1. Request the online resource

import requests

res = requests.get('http://*******')  # fetch the page (URL masked in the original)
res.encoding = 'utf-8'                # decode the response body as UTF-8
print(res.text)                       # the HTML as text
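The `res.encoding = 'utf-8'` line matters because the response body arrives as bytes, and requests has to pick a codec to turn it into `res.text`. When a text response declares no charset in its headers, requests falls back to ISO-8859-1, which garbles Chinese pages. The sketch below reproduces the effect with plain Python and no network access; the sample string is made up for illustration.

```python
# Sample text standing in for a page body; these bytes are what a server would send.
body = "新聞標題".encode("utf-8")

# Decoding with the codec the page was written in recovers the text.
decoded = body.decode("utf-8")
print(decoded)

# Decoding the same bytes as ISO-8859-1 succeeds but produces mojibake:
# every byte becomes one unrelated character.
garbled = body.decode("iso-8859-1")
print(repr(garbled))
```

If the page's encoding is unknown, `res.apparent_encoding` gives the guess requests makes from the body itself, and it can be assigned to `res.encoding` in the same way.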

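This uses the get method of requests to fetch the HTML. Whether a page expects GET, POST, or some other method can be looked up in the request header information for that page.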

Baidu's page, for example, can be fetched with GET.
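To make the GET/POST difference concrete, here is a minimal sketch that builds both kinds of request without sending them. The URL and the `q` parameter are placeholders chosen for illustration, not taken from the original post.

```python
import requests

# Hypothetical endpoint, used only for illustration; nothing is sent over the network.
URL = "http://example.com/search"

# GET: the parameters are encoded into the URL's query string.
get_req = requests.Request("GET", URL, params={"q": "python"}).prepare()

# POST: the form fields travel in the request body instead.
post_req = requests.Request("POST", URL, data={"q": "python"}).prepare()

print(get_req.method, get_req.url)
print(post_req.method, post_req.url, post_req.body)
```

In everyday use the shortcuts `requests.get(url, params=...)` and `requests.post(url, data=...)` build and send the same requests in one call.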

2. Parse the fetched page with BeautifulSoup

from bs4 import BeautifulSoup

soup = BeautifulSoup(res.text, 'html.parser')  # res is the response from step 1
print(soup)  # shows the content of the page
for news in soup.select('.news-item'):  # scrape some news items
    header = news.select('h1')[0].text   # news headline
    time = news.select('.time')[0].text  # time, as text rather than the whole tag
    print(header, time)

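The thing to watch here is the node structure. When reading the page source, work out exactly where each piece of information is stored, parse it one level at a time, and use for loops where they make sense.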
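As a rough illustration of working through the nodes one level at a time, the sketch below parses a small made-up HTML snippet. The markup is an assumption that mirrors the `.news-item`, `h1` and `.time` selectors used above; a real page will be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure assumed by the selectors in this post.
html = """
<div class="news-item">
  <h1>First headline</h1>
  <span class="time">2019-11-26 09:00</span>
</div>
<div class="news-item">
  <span class="time">2019-11-26 10:00</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for item in soup.select(".news-item"):      # outer level: one node per news item
    title = item.select_one("h1")           # inner level: search only inside this item
    time_tag = item.select_one(".time")
    # select_one returns None when nothing matches, so incomplete items are skipped
    if title is None or time_tag is None:
        continue
    rows.append((title.get_text(strip=True), time_tag.get_text(strip=True)))

print(rows)  # [('First headline', '2019-11-26 09:00')]
```

Using `select_one` with a `None` check instead of `select(...)[0]` avoids an `IndexError` when an item lacks one of the expected child nodes, as the second item here does.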
