1. Troubleshooting
bsObj = BeautifulSoup(html.read())
Error:
UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
Solution: pass the parser name explicitly as the second argument.
bsObj = BeautifulSoup(html.read(),"html.parser")
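The effect of the fix can be checked offline. The following is a minimal sketch, assuming bs4 is installed; `SAMPLE_HTML` and the helper `parser_warnings` are hypothetical names introduced only for illustration. It records the warnings raised while constructing a `BeautifulSoup` object with and without an explicit parser.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical sample markup, used instead of a live page.
SAMPLE_HTML = "<html><body><h1>Hello</h1></body></html>"


def parser_warnings(*parser_args):
    """Build a BeautifulSoup object and return any 'no parser' warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        BeautifulSoup(SAMPLE_HTML, *parser_args)
    return [w for w in caught
            if "No parser was explicitly specified" in str(w.message)]


implicit = parser_warnings()               # no parser given
explicit = parser_warnings("html.parser")  # parser named explicitly
print(len(implicit), len(explicit))
```

The first call should report at least one warning and the second none. Naming the parser also keeps behaviour the same across machines, since the "best available" parser depends on what is installed.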
Introduction: BeautifulSoup formats and organizes messy web content by locating HTML tags, and exposes the XML structure as simple Python objects.
For Python 3, install version 4 of the library: BeautifulSoup4 (BS4).
Example run:
#!/usr/bin/env python3
# encoding: utf-8
"""
@author: 俠之大者kamil
@file: beautifulsoup.py
@time: 2016/4/19 16:36
"""
from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen('http://www.cnblogs.com/kamil/')
print(type(html))
# html.read() fetches the page content and passes it to the BeautifulSoup object.
bsObj = BeautifulSoup(html.read(), "html.parser")
print(type(bsObj))
print(bsObj.h1)

Note the BeautifulSoup(...) call: "html.parser" must be passed as the second argument.
Result:
ssh://kamil@xzdz.hk:22/usr/bin/python3 -u /home/kamil/windows_python3/python3/Day11/day12/beautifulsoup.py
<class 'http.client.HTTPResponse'>
<class 'bs4.BeautifulSoup'>
<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">俠之大者kamil</a></h1>

Process finished with exit code 0
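Once a tag such as `bsObj.h1` has been located, its children, text and attributes can be read directly. The sketch below assumes bs4 is installed and reuses the `<h1>` markup from the result above as a string, so it needs no network access. The variable names `link`, `title`, `href`, `link_id` and `missing` are illustrative only.

```python
from bs4 import BeautifulSoup

# Markup copied from the result above; parsed from a string instead of a live page.
html = ('<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" '
        'id="Header1_HeaderTitle">俠之大者kamil</a></h1>')
bsObj = BeautifulSoup(html, "html.parser")

link = bsObj.h1.a          # first <a> inside the first <h1>
title = link.get_text()    # text content of the tag
href = link["href"]        # attribute access by key
link_id = link.get("id")   # .get() returns None if the attribute is absent
missing = bsObj.h2         # a tag that does not exist yields None

print(title, href, link_id, missing)
```

Because a missing tag comes back as `None`, chaining such as `bsObj.h2.a` would raise an `AttributeError`, so it is worth checking for `None` before going deeper.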
Official documentation