1. Installation: pip install beautifulsoup4
2. Official documentation - https://www.crummy.com/software/BeautifulSoup/bs4/doc/
3. Commonly used object types
    from bs4 import BeautifulSoup

    bsObj = BeautifulSoup('<p style="float:left">Chapter 1</p>', 'html.parser')  # BeautifulSoup object
    tagObj = bsObj.p               # Tag object
    navStrObj = tagObj.string      # NavigableString object
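The three object types above can be checked directly. The following is a minimal sketch, not part of the original notes: the snake_case variable names are illustrative, and the `Tag` and `NavigableString` classes are imported from `bs4.element` only so that `isinstance` can be used.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

bs_obj = BeautifulSoup('<p style="float:left">Chapter 1</p>', 'html.parser')
tag_obj = bs_obj.p            # first <p> tag in the document
nav_str_obj = tag_obj.string  # the text node inside that tag

# Each object is an instance of the type named in the notes.
print(isinstance(bs_obj, BeautifulSoup))
print(isinstance(tag_obj, Tag))
print(isinstance(nav_str_obj, NavigableString))

# NavigableString is a subclass of str, so it can be used like a normal string.
print(isinstance(nav_str_obj, str))

# A Tag exposes its name and its attributes (dictionary-style access).
print(tag_obj.name)
print(tag_obj['style'])
```

Because `NavigableString` keeps a reference to the parse tree, converting it with `str(nav_str_obj)` is a common choice when the text needs to outlive the parsed document.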
4. Commonly used API
5. Sample
HTML
    <html>
      <body>
        <span class="red yellow">Story1</span>
        <span class="green">Story2</span>
        <span class="red">Story3</span>
        <span class="green" id="four">Story4</span>
      </body>
    </html>
Python
    # bsObj is assumed to be a BeautifulSoup object parsed from the HTML above.

    # All <span> tags
    data = bsObj.findAll('span')
    # [<span class="red yellow">Story1</span>, <span class="green">Story2</span>, <span class="red">Story3</span>, <span class="green" id="four">Story4</span>]

    # Match two attributes at the same time
    data = bsObj.findAll(attrs={'id': 'four', 'class': 'green'})
    # [<span class="green" id="four">Story4</span>]

    # Match several values of one attribute; the order must also be the same
    data = bsObj.findAll(attrs={'class': 'red yellow'})
    # [<span class="red yellow">Story1</span>]
    data = bsObj.findAll(attrs={'class': 'yellow red'})
    # []

    # Passing only the string argument returns a list of NavigableString objects
    data = bsObj.findAll(string='Story1')
    # ['Story1']
    data = bsObj.findAll(string=['Story1', 'Story2'])
    # ['Story1', 'Story2']

    # When the keyword is class, add a trailing underscore to avoid a clash with the Python keyword class
    data = bsObj.findAll(class_='green')
    # [<span class="green">Story2</span>, <span class="green" id="four">Story4</span>]

    # Whether a tag has a given attribute at all
    data = bsObj.findAll(id=True)
    # [<span class="green" id="four">Story4</span>]

    # The text contained in a tag
    data = bsObj.findAll(id=True)[0].get_text()
    # Story4 (this is a plain str, not a NavigableString)
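The order-sensitive match on 'red yellow' above compares the whole class string. When the order of the class values should not matter, a single class name, a CSS selector, or a filter function can be used instead. The following is a rough sketch under a few assumptions: it uses `find_all`, the current name for `findAll`; `select` relies on the soupsieve package that is normally installed together with beautifulsoup4; and the helper `has_red_and_yellow` is a hypothetical name introduced only for this example.

```python
from bs4 import BeautifulSoup

html = """
<html><body>
<span class="red yellow">Story1</span>
<span class="green">Story2</span>
<span class="red">Story3</span>
<span class="green" id="four">Story4</span>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')

# A single class name matches every tag whose class list contains it.
red_spans = soup.find_all('span', class_='red')
print([s.get_text() for s in red_spans])  # ['Story1', 'Story3']

# CSS selector: both classes are required, in any order.
both_css = soup.select('span.yellow.red')
print([s.get_text() for s in both_css])  # ['Story1']

# Filter function (hypothetical helper): the same check without a CSS selector.
def has_red_and_yellow(tag):
    classes = set(tag.get('class', []))
    return tag.name == 'span' and {'red', 'yellow'} <= classes

both_fn = soup.find_all(has_red_and_yellow)
print([s.get_text() for s in both_fn])  # ['Story1']
```

The filter function is the most flexible of the three, since any condition on the tag can go inside it, while the CSS selector is the shortest to write.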