Beautiful Soup Usage (5): Using select

Date: 2019-12-13

Original article: http://www.bugingcode.com/blog/beautiful_soup_select.html

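Like find and find_all, select is used to pick out specific tags. Its matching rules are based on CSS, so we call them CSS selectors. If you have used jQuery before, you will notice that select's rules look a lot like jQuery's.

Finding by tag name

To filter by tag name, write the name with no prefix or other decoration, as follows: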

from bs4 import BeautifulSoup

# Sample document used throughout this article.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
"""

# The "lxml" parser requires the lxml package to be installed.
soup = BeautifulSoup(html, "lxml")
# Select every tag whose name is p.
print(soup.select('p'))

The result is:

[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>, <p class="story">Once upon a time there were three little sisters; and their names were\n<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and\n<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;\nand they lived at the bottom of a well.</p>]

As the result shows, select returns a list. So what are the elements of that list?

print(type(soup.select('p')[0]))

The result is:

<class 'bs4.element.Tag'>

So each element is a bs4.element.Tag, the same as with find_all. select('p') returns every tag whose name is p.
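To make the comparison with find_all concrete, here is a minimal sketch on a trimmed copy of the sample document. It assumes the standard-library "html.parser" in place of "lxml" so that nothing beyond bs4 is needed, and it also uses select_one, a Beautiful Soup method this article does not otherwise cover, which returns only the first match.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article's sample document.
html = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

# Assumption: html.parser is used here instead of lxml to avoid an extra dependency.
soup = BeautifulSoup(html, "html.parser")

by_select = soup.select("p")      # CSS selector: every <p> tag
by_find_all = soup.find_all("p")  # the same query written with find_all

# Compare the two results element by element.
same_tags = list(by_select) == list(by_find_all)
print(same_tags)
print(len(by_select))

# select_one returns the first matching Tag (or None when nothing matches).
first_match = soup.select_one("p")
print(first_match["class"])
```

Both calls give back Tag objects, so anything you already do with a find_all result (indexing attributes, calling get_text) works the same way on a select result.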

Finding by class name and id

When filtering, prefix a class name with a dot and an id with #:

print(soup.select('.title'))
print(soup.select('#link2'))

The results are:

[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
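A class selector can match several tags at once, while an id normally matches one. The sketch below shows both on a trimmed copy of the sample document; the variable names are illustrative, and "html.parser" is an assumed stand-in for "lxml".

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article's sample document.
html = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

# Assumption: html.parser is used here instead of lxml to avoid an extra dependency.
soup = BeautifulSoup(html, "html.parser")

# ".sister" matches every tag whose class attribute contains "sister".
sisters = soup.select(".sister")
sister_ids = [tag["id"] for tag in sisters]
print(sister_ids)

# "#link3" matches the tag whose id is link3; select still returns a list.
link3 = soup.select("#link3")
link3_text = link3[0].get_text()
print(link3_text)
```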

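Finding by attribute

What if the criterion is neither an id nor a class name? Filtering is still possible: write the attribute name and its value inside square brackets.

print(soup.select('[href="http://example.com/lacie"]'))

This selects the tag whose href is http://example.com/lacie.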
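Besides an exact value, CSS attribute selectors can also test for an attribute's presence or for part of its value. The following sketch goes slightly beyond the article's example and assumes a recent Beautiful Soup (whose select supports these standard CSS forms) with "html.parser" standing in for "lxml".

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article's sample document.
html = """
<html><body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

# Assumption: html.parser is used here instead of lxml to avoid an extra dependency.
soup = BeautifulSoup(html, "html.parser")

# [href]: any tag that has an href attribute, whatever its value.
with_href = soup.select("[href]")

# [href^=...]: value starts with the given string.
starts_with = soup.select('a[href^="http://example.com/"]')

# [href$=...]: value ends with the given string.
ends_with = soup.select('a[href$="tillie"]')

# [name=...]: exact match on an attribute other than id, class, or href.
by_name = soup.select('[name="dromouse"]')

print(len(with_href), len(starts_with))
print(ends_with[0]["id"])
print(by_name[0].get_text())
```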

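Combined lookups

Combined lookups come in two kinds: one applies two conditions to a single tag, and the other walks down the tree level by level.

The first kind looks like this:

print(soup.select('a#link2'))

This selects the tag whose name is a and whose id is link2.

The output is: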

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

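The other kind is as follows:

Starting from body, find every p inside body, then within those p tags find the tag named a with id link2. This level-by-level, tree-style lookup is very common when analyzing HTML structure. Levels are separated by spaces.

print(soup.select('body p a#link2'))

The result is: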

[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
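A space between levels matches descendants at any depth. Standard CSS also has the > combinator, which matches direct children only; the article does not cover it, so treat the sketch below as a supplementary illustration. It uses a trimmed copy of the sample document and assumes "html.parser" in place of "lxml".

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article's sample document.
html = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

# Assumption: html.parser is used here instead of lxml to avoid an extra dependency.
soup = BeautifulSoup(html, "html.parser")

# Space: every <a> anywhere below <body>, however deeply nested.
descendants = soup.select("body a")

# ">": only <a> tags that are direct children of <body>; here there are none,
# because both links sit inside a <p>.
direct_children = soup.select("body > a")

# Both forms can be combined with the tag, class, and id conditions shown earlier.
combined = soup.select("body > p.story > a#link2")

print(len(descendants))
print(len(direct_children))
print(combined[0].get_text())
```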

When reposting, please credit the source: http://www.bugingcode.com/
