Getting Started with BeautifulSoup4

Date: 2019-12-05

Tags: beautifulsoup4, beautifulsoup, getting started


BeautifulSoup is one of the best-known HTML parsing libraries for Python. It is simple and easy to use.

Installation:

pip install beautifulsoup4

Mind the exact package name, and do not install the package called BeautifulSoup: that name refers to version 3, which is no longer maintained.

Common syntax

See my earlier article: BeautifulSoup: using and testing some common features

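# Create an instance
from bs4 import BeautifulSoup
# 'html5lib' is a separate package (pip install html5lib); 'html.parser' is built in
soup = BeautifulSoup(html, 'html5lib')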
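
For readers who want something they can run immediately, here is a minimal sketch of creating a soup object. The sample markup is made up for illustration, and it uses the built-in 'html.parser' instead of 'html5lib' so that no extra package is required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)

# 'html.parser' ships with Python; 'html5lib' and 'lxml' need a separate install.
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string   # text of the first <title> tag
intro_text = soup.p.get_text()   # text of the first <p> tag
print(title_text, intro_text)
```

Note that the package is installed as beautifulsoup4 but imported as bs4.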

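Selectors

Which selector to use depends heavily on the page:

In the vast majority of cases the CSS selector method select() is enough.
If you search by attribute name and the name contains special characters such as -, the name cannot be written as a Python keyword argument, so pass it to find() in an attrs dictionary instead.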
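
The hyphenated-attribute case can be sketched as follows. The data-role attribute and the markup are hypothetical; the point is only that the attribute name goes into the attrs dictionary as a string key.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a hyphenated attribute name.
html = '<div data-role="card">A</div><div data-role="note">B</div>'
soup = BeautifulSoup(html, "html.parser")

# soup.find("div", data-role="card") is a SyntaxError, because Python
# keyword names cannot contain "-". The attrs dictionary avoids that.
card = soup.find("div", attrs={"data-role": "card"})
notes = soup.find_all("div", attrs={"data-role": "note"})

card_text = card.get_text()
note_texts = [tag.get_text() for tag in notes]
print(card_text, note_texts)
```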

# Preferred selector: CSS selector (returns a list of tags)
results = soup.select('div[class*=hello_world] ~ div')

for tag in results:
    print(tag.string)       # the tag's single string child, or None if it has several children
    # print(tag.get_text())     # all text inside the tag, including nested tags

# Exact single-tag selector: returns one tag, or None if nothing matches
tag = soup.find('div', attrs={'class': 'detail-block'})
print(tag.get_text())

# Exact multi-tag selector: returns a list of tags (a ResultSet), not text
results = soup.find_all('div', attrs={'class': 'detail-block'})

# Multi-class selector (the tag carries several classes); the key is "class*=", a substring match on the class attribute
results = soup.select('div[class*=hello_world] ~ div')
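
To make the return types of the three selectors concrete, here is a small sketch on made-up markup. The class names mirror the snippets above; everything else is an assumption for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: one anchor div followed by two sibling divs.
html = """
<div class="hello_world main">anchor</div>
<div class="detail-block">first</div>
<div class="detail-block">second</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select(): list of Tag objects; "~" matches later sibling divs.
siblings = soup.select("div[class*=hello_world] ~ div")
sibling_texts = [tag.get_text() for tag in siblings]

# find(): the first matching Tag, or None when nothing matches.
first = soup.find("div", attrs={"class": "detail-block"})
missing = soup.find("span")

# find_all(): a list-like ResultSet of Tag objects.
all_blocks = soup.find_all("div", attrs={"class": "detail-block"})

print(sibling_texts, first.get_text(), len(all_blocks), missing)
```

Because find() can return None, it is worth checking the result before calling get_text() on it when scraping real pages.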

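Getting values

tag = soup.find('a')

# Get only the text content of the tag
text = tag.get_text()

# Get the tag's single string child (None if the tag has several children);
# use str(tag) for the full markup, e.g. <a href='sdfj'> asdfa</a>
s = tag.string

# Get an attribute of the tag
link = tag['href']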
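
The difference between get_text(), .string, and str(tag) is easiest to see on a tag with a nested child. The following sketch uses a made-up link; the URL is a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a link whose content mixes text and a nested <b> tag.
html = "<p><a href='https://example.com/page'> read <b>more</b></a></p>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a")

text = tag.get_text()         # all nested text joined together
only_string = tag.string      # None here, because <a> has two children
bold_string = tag.b.string    # <b> has exactly one string child
markup = str(tag)             # the full markup of the tag
link = tag["href"]            # raises KeyError if the attribute is missing
title = tag.get("title")      # returns None if the attribute is missing

print(repr(text), only_string, bold_string, link, title)
```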

Modifying values

See: Beautiful Soup (4): functions for modifying the document tree

tag = soup.find('a', attrs={'class': 'detail-block'})

# Modify an attribute
tag['href'] = 'https://google.com'

# Modify the content between <tag> and </tag>
tag.string = 'New Content'

# Delete an attribute
del tag['class']
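
Put together on a made-up link, the three modifications above look roughly like this. The starting markup is an assumption chosen so that the find() call matches.

```python
from bs4 import BeautifulSoup

# Hypothetical starting markup.
html = '<a class="detail-block" href="/old">Old</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a", attrs={"class": "detail-block"})

tag["href"] = "https://google.com"   # change an attribute
tag.string = "New Content"           # replace the tag's content
del tag["class"]                     # remove an attribute

result = str(tag)
print(result)
```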

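Object types

When we search for tags with selectors, BeautifulSoup returns different types of objects depending on the function used, and each type has to be handled differently.

Tag type (<class 'bs4.element.Tag'>): supports
- tag.string
- tag.get_text()
NavigableString type (bs4.element.NavigableString): a piece of text inside a tag; this is what tag.string returns.
Comment type (<class 'bs4.element.Comment'>): a NavigableString subclass that holds an HTML comment.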
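
A quick way to see the three types side by side is to inspect the children of a tag that contains both text and a comment. This is a sketch on made-up markup.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup: a paragraph with plain text followed by a comment.
html = "<p>plain text<!-- a note --></p>"
soup = BeautifulSoup(html, "html.parser")

p = soup.p
text_node, comment_node = p.contents   # the two direct children of <p>

print(type(p).__name__, type(text_node).__name__, type(comment_node).__name__)

# Comment subclasses NavigableString, so test for Comment first when filtering.
is_comment = isinstance(comment_node, Comment)
comment_is_string = isinstance(comment_node, NavigableString)
text_is_comment = isinstance(text_node, Comment)
```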

Adding, removing, and modifying tags

See: Changing web page content with BeautifulSoup

# Modify the content of a tag
tag = soup.find('title')
tag.string = 'New Title'
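
The snippet above only covers modification. As a hedged sketch of the adding and removing side, the following uses new_tag(), append(), and decompose() on made-up markup; the id value and URL are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical page with one paragraph to remove and one to keep.
html = (
    "<html><head><title>Old Title</title></head>"
    "<body><p id='ad'>ad</p><p>keep</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Modify: replace the title text.
soup.find("title").string = "New Title"

# Add: build a new <a> tag and append it to the end of <body>.
link = soup.new_tag("a", href="https://example.com")
link.string = "home"
soup.body.append(link)

# Remove: delete the unwanted paragraph and everything inside it.
soup.find("p", attrs={"id": "ad"}).decompose()

body_markup = str(soup.body)
print(body_markup)
```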
