Brother Cat Teaches You Web Scraping 033--First Scraping Experience-BeautifulSoup-Homework

BeautifulSoup parsers

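| Parser | Usage | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Python standard library | BeautifulSoup(text, "html.parser") | Built into Python; moderate speed; lenient with malformed documents | Less lenient in Python versions before 2.7.3 or 3.2.2 |
| lxml HTML parser | BeautifulSoup(text, "lxml") | Fast; lenient with malformed documents | Requires an external C library |
| lxml XML parser | BeautifulSoup(text, "xml") | Fast; the only parser that supports XML | Requires an external C library |
| html5lib | BeautifulSoup(text, "html5lib") | Produces documents in HTML5 format | Slow; pure Python (no C extension), but still a separate package to install |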
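The parser is chosen by the second argument to BeautifulSoup, and asking for one that is not installed raises an error. The sketch below shows one way to fall back from lxml to the built-in parser. The helper name make_soup and the sample HTML are made up for illustration; they are not part of the course material.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html.parser")):
    """Build a soup with the first parser in `parsers` that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


# Hypothetical sample markup, shaped like a blog title link.
html = '<h2 class="entry-title"><a href="/post-1">First post</a></h2>'
soup = make_soup(html)
link = soup.find("h2", class_="entry-title").find("a")
print(link.text, link["href"])
```

Because html.parser ships with Python, the default parser list always finds something to use; lxml is simply preferred when it is available.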

Homework 1: Scrape the articles and save them locally (one HTML file per article)

wordpress-edu-3autumn.localprod.forc.work

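import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://wordpress-edu-3autumn.localprod.forc.work/'
index = BeautifulSoup(requests.get(BASE_URL).text, 'html.parser')
# Each article title on the index page is an <h2 class="entry-title"> wrapping a link.
for heading in index.find_all('h2', class_='entry-title'):
    link = heading.find('a')
    print(link.text)
    # Fetch the article page (parsed with lxml, which must be installed)
    # and keep only its main content block.
    article = BeautifulSoup(requests.get(link['href']).text, 'lxml')
    with open('{}.html'.format(link.text), 'w', encoding='utf8') as file:
        file.write(str(article.find('div', class_='entry-content')))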
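The solution above uses the article title directly as the file name, which fails if a title contains characters such as / or ? that file systems reject. A minimal sketch of one way to guard against that follows. The helper safe_filename and its defaults are illustrative assumptions, not part of the original homework.

```python
import re


def safe_filename(title, replacement="_", max_length=100):
    """Turn an article title into a file name that common file systems accept."""
    # Replace characters that Windows or POSIX file systems reject.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title)
    # Leading/trailing spaces and dots cause trouble on Windows.
    cleaned = cleaned.strip(" .")
    return (cleaned or "untitled")[:max_length] + ".html"


print(safe_filename('What is "HTML"? A/B guide'))
# prints: What is _HTML__ A_B guide.html
```

In the loop, open(safe_filename(title), 'w', encoding='utf8') would then replace the '{}.html'.format(...) call.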

Homework 2: Scrape the book titles and prices in each category and save them to books.txt

books.toscrape.com

Final result (an excerpt of books.txt is shown after the code):

import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'http://books.toscrape.com/'
home = BeautifulSoup(requests.get(BASE_URL).text, 'html.parser')
with open('books.txt', 'w', encoding='utf8') as file:
    # The nested <ul> inside the sidebar holds one <li> per category.
    for category in home.find('ul', class_='nav nav-list').find('ul').find_all('li'):
        file.write(category.text.strip() + '\n')
        res = requests.get(urljoin(BASE_URL, category.find('a')['href']))
        res.encoding = 'utf8'  # set explicitly so the £ sign in prices decodes correctly
        page = BeautifulSoup(res.text, 'html.parser')
        # Only the first page of each category is read.
        for book in page.find_all('li', class_='col-xs-6 col-sm-4 col-md-3 col-lg-3'):
            title = book.find('h3').find('a')['title']
            print(title)
            file.write('\t"{}" {}\n'.format(title, book.find('p', class_='price_color').text))
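The solution above requests each category page once, so a category that spans several pages is only partly captured. One way to continue is to follow the "next" link until it disappears. The sketch below assumes the pager is marked up as an li element with class next containing a relative link; that markup, the helper name next_page_url, and the sample snippet are assumptions to verify against the live page.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def next_page_url(page_html, current_url):
    """Return the absolute URL of the next page, or None on the last page."""
    soup = BeautifulSoup(page_html, "html.parser")
    item = soup.find("li", class_="next")  # assumed pager markup
    if item is None or item.find("a") is None:
        return None
    # The href is relative to the current page, so resolve it against that URL.
    return urljoin(current_url, item.find("a")["href"])


# Hypothetical pager snippet and category URL, for illustration only.
sample = '<ul class="pager"><li class="next"><a href="page-2.html">next</a></li></ul>'
current = "http://books.toscrape.com/catalogue/category/books/mystery_3/index.html"
print(next_page_url(sample, current))
```

A category loop could then keep fetching while next_page_url returns a URL, writing the books from every page before moving on to the next category.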
Travel
	"It's Only the Himalayas" £45.17
	"Full Moon over Noah’s Ark: An Odyssey to Mount Ararat and Beyond" £49.43
	"See America: A Celebration of Our National Parks & Treasured Sites" £48.87
	"Vagabonding: An Uncommon Guide to the Art of Long-Term World Travel" £36.94
	"Under the Tuscan Sun" £37.33
	"A Summer In Europe" £44.34
	"The Great Railway Bazaar" £30.54
	"A Year in Provence (Provence #1)" £56.88
	"The Road to Little Dribbling: Adventures of an American in Britain (Notes From a Small Island #2)" £23.21
	"Neither Here nor There: Travels in Europe" £38.95
	"1,000 Places to See Before You Die" £26.08
Mystery
	"Sharp Objects" £47.82
	"In a Dark, Dark Wood" £19.63
	"The Past Never Ends" £56.50
	"A Murder in Time" £16.64
	"The Murder of Roger Ackroyd (Hercule Poirot #4)" £44.10
	"The Last Mile (Amos Decker #2)" £54.21
	"That Darkness (Gardiner and Renner #1)" £13.92
	"Tastes Like Fear (DI Marnie Rome #3)" £10.69
	"A Time of Torment (Charlie Parker #14)" £48.35
	"A Study in Scarlet (Sherlock Holmes #1)" £16.73
	"Poisonous (Max Revere Novels #3)" £26.80
	"Murder at the 42nd Street Library (Raymond Ambler #1)" £54.36
	"Most Wanted" £35.28
	"Hide Away (Eve Duncan #20)" £11.84
	"Boar Island (Anna Pigeon #19)" £59.48
	"The Widow" £27.26
	"Playing with Fire" £13.71
	"What Happened on Beale Street (Secrets of the South Mysteries #2)" £25.37
	"The Bachelor Girl's Guide to Murder (Herringford and Watts Mysteries #1)" £52.30
	"Delivering the Truth (Quaker Midwife Mystery #1)" £20.89
Historical Fiction
	"Tipping the Velvet" £53.74
	"Forever and Forever: The Courtship of Henry Longfellow and Fanny Appleton" £29.69
	"A Flight of Arrows (The Pathfinders #2)" £55.53
	"The House by the Lake" £36.95
	"Mrs. Houdini" £30.25
	"The Marriage of Opposites" £28.08
	"Glory over Everything: Beyond The Kitchen House" £45.84
	"Love, Lies and Spies" £20.55
	"A Paris Apartment" £39.01
	"Lilac Girls" £17.28
	"The Constant Princess (The Tudor Court #1)" £16.62
	"The Invention of Wings" £37.34
	"World Without End (The Pillars of the Earth #2)" £32.97
	"The Passion of Dolssa" £28.32
	"Girl With a Pearl Earring" £26.77
	"Voyager (Outlander #3)" £21.07
	"The Red Tent" £35.66
	"The Last Painting of Sara de Vos" £55.55
	"The Guernsey Literary and Potato Peel Pie Society" £49.53
	"Girl in the Blue Coat" £46.83
......
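The books.txt layout shown above (a category name on its own line, then tab-indented lines of a quoted title followed by a price) can be read back into structured data. The following is a rough sketch under that assumption. The function parse_book_line is hypothetical, and Decimal is used only to avoid float rounding on prices.

```python
import re
from decimal import Decimal

# A book line: a tab, the title in double quotes, a space, then a price such as £45.17.
BOOK_LINE = re.compile(r'^\t"(?P<title>.*)" £(?P<price>\d+\.\d{2})$')


def parse_book_line(line):
    """Return (title, price) for a book line, or None for any other line."""
    match = BOOK_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("title"), Decimal(match.group("price"))


print(parse_book_line('\t"It\'s Only the Himalayas" £45.17\n'))
print(parse_book_line("Travel\n"))
```

Lines for which the function returns None can be treated as category headings when rebuilding the category-to-books mapping.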

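Quick links:

Brother Cat Teaches You Web Scraping 000--Introduction.md
Brother Cat Teaches You Web Scraping 001--The print() function and variables.md
Brother Cat Teaches You Web Scraping 002--Homework-Printing Pikachu.md
Brother Cat Teaches You Web Scraping 003--Data type conversion.md
Brother Cat Teaches You Web Scraping 004--Data type conversion-Exercise.md
Brother Cat Teaches You Web Scraping 005--Data type conversion-Homework.md
Brother Cat Teaches You Web Scraping 006--Conditionals and nested conditionals.md
Brother Cat Teaches You Web Scraping 007--Conditionals and nested conditionals-Homework.md
Brother Cat Teaches You Web Scraping 008--The input() function.md
Brother Cat Teaches You Web Scraping 009--The input() function-AI assistant Xiao Ai.md
Brother Cat Teaches You Web Scraping 010--Lists, dictionaries, loops.md
Brother Cat Teaches You Web Scraping 011--Lists, dictionaries, loops-Homework.md
Brother Cat Teaches You Web Scraping 012--Booleans and four kinds of statements.md
Brother Cat Teaches You Web Scraping 013--Booleans and four kinds of statements-Homework.md
Brother Cat Teaches You Web Scraping 014--PK mini-game.md
Brother Cat Teaches You Web Scraping 015--PK mini-game (new edition).md
Brother Cat Teaches You Web Scraping 016--Functions.md
Brother Cat Teaches You Web Scraping 017--Functions-Homework.md
Brother Cat Teaches You Web Scraping 018--debug.md
Brother Cat Teaches You Web Scraping 019--debug-Homework.md
Brother Cat Teaches You Web Scraping 020--Classes and objects (part 1).md
Brother Cat Teaches You Web Scraping 021--Classes and objects (part 1)-Homework.md
Brother Cat Teaches You Web Scraping 022--Classes and objects (part 2).md
Brother Cat Teaches You Web Scraping 023--Classes and objects (part 2)-Homework.md
Brother Cat Teaches You Web Scraping 024--Encoding && decoding.md
Brother Cat Teaches You Web Scraping 025--Encoding && decoding-Homework.md
Brother Cat Teaches You Web Scraping 026--Modules.md
Brother Cat Teaches You Web Scraping 027--Introduction to modules.md
Brother Cat Teaches You Web Scraping 028--Introduction to modules-Homework-Billboard.md
Brother Cat Teaches You Web Scraping 029--A first look at scraping-requests.md
Brother Cat Teaches You Web Scraping 030--A first look at scraping-requests-Homework.md
Brother Cat Teaches You Web Scraping 031--Scraping basics-html.md
Brother Cat Teaches You Web Scraping 032--First scraping experience-BeautifulSoup.md
Brother Cat Teaches You Web Scraping 033--First scraping experience-BeautifulSoup-Homework.md
Brother Cat Teaches You Web Scraping 034--Scraping-BeautifulSoup in practice.md
Brother Cat Teaches You Web Scraping 035--Scraping-BeautifulSoup in practice-Homework-Movie top250.md
Brother Cat Teaches You Web Scraping 036--Scraping-BeautifulSoup in practice-Homework-Movie top250-Homework walkthrough.md
Brother Cat Teaches You Web Scraping 037--Scraping-Baby wants to listen to songs.md
Brother Cat Teaches You Web Scraping 038--Requests with parameters.md
Brother Cat Teaches You Web Scraping 039--Storing data.md
Brother Cat Teaches You Web Scraping 040--Storing data-Homework.md
Brother Cat Teaches You Web Scraping 041--Simulated login-cookie.md
Brother Cat Teaches You Web Scraping 042--Using session.md
Brother Cat Teaches You Web Scraping 043--Simulating a browser.md
Brother Cat Teaches You Web Scraping 044--Simulating a browser-Homework.md
Brother Cat Teaches You Web Scraping 045--Coroutines.md
Brother Cat Teaches You Web Scraping 046--Coroutines-Practice-What to eat without gaining weight.md
Brother Cat Teaches You Web Scraping 047--The scrapy framework.md
Brother Cat Teaches You Web Scraping 048--Scraping and anti-scraping.md
Brother Cat Teaches You Web Scraping 049--Series finale.md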
