BeautifulSoup_python3

Date: 2019-11-21

Tags: beautifulsoup, python3, python | Category: Python


1. Troubleshooting

bsObj = BeautifulSoup(html.read())

This produces a warning:

 UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

Solution: name the parser explicitly as the second argument:

bsObj = BeautifulSoup(html.read(),"html.parser")
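The fix can be checked offline with a small sketch. The sample markup and the `parse` helper below are made up for illustration. The snippet records any warnings raised during parsing and filters for the message quoted above.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example needs no network access.
HTML = "<html><body><h1>Hello</h1></body></html>"


def parse(markup):
    """Parse markup with the parser named explicitly."""
    return BeautifulSoup(markup, "html.parser")


# Record every warning raised while parsing.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    soup = parse(HTML)

# Keep only the "no parser specified" warning, if it was raised at all.
parser_warnings = [
    w for w in caught
    if "No parser was explicitly specified" in str(w.message)
]

print(soup.h1.get_text())
print(len(parser_warnings))
```

Naming "html.parser" also keeps behaviour the same across machines, since it ships with Python, whereas "lxml" is only used when it happens to be installed.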

BeautifulSoup

Overview: BeautifulSoup formats and organizes complex web content by locating HTML tags, and exposes the XML structure as simple Python objects.
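A minimal sketch of what "simple Python objects" means in practice. The markup string is copied from the output shown later in this post. Everything else uses the standard bs4 navigation calls.

```python
from bs4 import BeautifulSoup

# Markup copied from this post's sample output, so no network access is needed.
markup = (
    '<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" '
    'id="Header1_HeaderTitle">俠之大者kamil</a></h1>'
)
soup = BeautifulSoup(markup, "html.parser")

link = soup.h1.a                    # child tags are reachable as attributes
print(type(link).__name__)          # each element is a bs4 Tag object
print(link["href"])                 # attributes are read like dictionary keys
print(link["class"])                # class is multi-valued, so it is a list
print(link.get_text())              # the text inside the tag
print(soup.find(id="Header1_HeaderTitle").name)  # lookup by attribute value
```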

For Python 3, install version 4 of the library, BeautifulSoup4 (BS4).
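The package is normally installed with `pip install beautifulsoup4`, though the exact command depends on your environment. It is imported as `bs4`. A quick sketch to confirm that version 4 is the one in use:

```python
import bs4

# The distribution is named "beautifulsoup4", but the import name is "bs4".
major = int(bs4.__version__.split(".")[0])
print(bs4.__version__)

if major != 4:
    raise RuntimeError(f"expected BeautifulSoup 4, found {bs4.__version__}")
```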

Example:

#!/usr/bin/env python
# encoding: utf-8
"""
@author: 俠之大者kamil
@file: beautifulsoup.py
@time: 2016/4/19 16:36
"""
from bs4 import BeautifulSoup
from urllib.request import urlopen

html = urlopen('http://www.cnblogs.com/kamil/')
print(type(html))
# html.read() returns the page content, which is passed to the BeautifulSoup constructor.
bsObj = BeautifulSoup(html.read(), "html.parser")
print(type(bsObj))
print(bsObj.h1)

Note that the BeautifulSoup call needs "html.parser" as its second argument.

Output:

ssh://kamil@xzdz.hk:22/usr/bin/python3 -u /home/kamil/windows_python3/python3/Day11/day12/beautifulsoup.py
<class 'http.client.HTTPResponse'>
<class 'bs4.BeautifulSoup'>
<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">俠之大者kamil</a></h1>

Process finished with exit code 0
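The script above assumes that the request succeeds and that the page contains an h1 tag. A slightly more defensive sketch follows. The names `first_h1_text` and `fetch_h1` and the 10-second timeout are illustrative choices, not part of the original script. Only `URLError` is handled here, and other failures such as a timeout while reading the body would still propagate.

```python
from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def first_h1_text(markup):
    """Return the text of the first <h1>, or None if there is none."""
    soup = BeautifulSoup(markup, "html.parser")
    heading = soup.h1                # None when the document has no <h1>
    if heading is None:
        return None
    return heading.get_text(strip=True)


def fetch_h1(url, timeout=10):
    """Download url and return its first <h1> text, or None on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            markup = response.read()
    except URLError as exc:          # HTTPError is a subclass of URLError
        print(f"request failed: {exc}")
        return None
    return first_h1_text(markup)


# Usage (performs a real network request):
# print(fetch_h1("http://www.cnblogs.com/kamil/"))
```

Keeping the parsing step in its own function means it can be exercised with a literal string, without touching the network.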

Official documentation
