BeautifulSoup: Basic Usage

Date: 2019-11-11

Tags: beautifulsoup, basic usage

Original article:

http://www.cnblogs.com/yupeng/p/3362031.html

This article also covers the topic thoroughly:

http://www.cnblogs.com/twinsclover/archive/2012/04/26/2471704.html

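I spent a little time with the bs4 library and everything I ran worked well. Its job is to parse the various structures in an HTML document, and it feels much like ElementTree, the XML parsing library: the two are used in almost the same way.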
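To make the ElementTree comparison concrete, here is a minimal sketch that runs the same lookup through both libraries. The `snippet` and `broken` strings and all variable names are made up for illustration and do not come from the original post.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup

# Hypothetical sample markup, small enough to be valid XML as well as HTML.
snippet = '<ul><li id="a">one</li><li id="b">two</li></ul>'

# ElementTree: parse, then search direct children by tag name.
root = ET.fromstring(snippet)
et_texts = [li.text for li in root.findall('li')]
et_ids = [li.get('id') for li in root.findall('li')]

# Beautiful Soup: the same lookup with find_all().
soup = BeautifulSoup(snippet, 'html.parser')
bs_texts = [li.get_text() for li in soup.find_all('li')]
bs_ids = [li['id'] for li in soup.find_all('li')]

print(et_texts, bs_texts)
print(et_ids, bs_ids)

# One practical difference: ElementTree needs well-formed input,
# while Beautiful Soup tolerates unclosed tags.
broken = '<p>unclosed <b>bold'
try:
    ET.fromstring(broken)
    et_parsed = True
except ET.ParseError:
    et_parsed = False
bs_text = BeautifulSoup(broken, 'html.parser').get_text()
print(et_parsed, repr(bs_text))
```

On tidy markup the two read almost identically. The last few lines show why bs4 is usually the more comfortable choice for real-world HTML, which is often not well-formed.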

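The script can be run and debugged directly, which is a good way to get familiar with the library. Uncomment one line at a time to see what it returns.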

# coding=utf-8
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# print(soup.title)
# print(soup.title.name)
# print(soup.title.string)
# print(soup.p)  # first <p> tag only
# print(soup.a)  # first <a> tag only
# print(soup.find_all('a'))
# a = soup.find_all('a')
# print(len(a))
# print(soup.find_all('p'))  # returns a list-like result
# p = soup.find_all('p')
# print(len(p))
# print(soup.find(id='link3'))

# print(soup.get_text())  # all text in the document
# print(soup.p.get_text())  # text of the selected node only
# for i in soup.find_all('p'):
#     print(i.get_text())
#     print(i.contents)
# print(soup.a['href'], soup.a['class'], soup.a['id'], soup.a.text)  # every attribute of a single node is accessible
# print(soup.html, soup.head, soup.body)  # whole document, head, and body, with their full structure
# print(soup.p.contents, soup.head.contents)  # child nodes returned as a list
# for i in list(soup.head.children):  # iterate over children without knowing their names
#     print(i, end=' ')
# print(soup.a.parent)  # look one level up; .parents iterates over all ancestors
# for i in soup.html.parents:
#     print(i, len(i))
# print(soup.find_all(class_="sister"))
print(soup.find_all('a', limit=1))  # limit the number of results
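For readers who want to check their understanding before running the script, the following is a small Python 3 sketch of what the main calls return. It uses a shortened, hypothetical HTML string (`html`) rather than the full document above, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup in the spirit of the "three sisters" example.
html = (
    '<div id="box"><p class="story">Names: '
    '<a class="sister" id="link1" href="http://example.com/elsie">Elsie</a>'
    ' and '
    '<a class="sister" id="link2" href="http://example.com/lacie">Lacie</a>'
    '</p></div>'
)
soup = BeautifulSoup(html, 'html.parser')

# Searching: find_all() returns a list-like result, find() a single node.
links = soup.find_all('a')
first_only = soup.find_all('a', limit=1)
by_class = soup.find_all(class_='sister')
link2 = soup.find(id='link2')

# Attributes: dictionary-style access; class is multi-valued, so it is a list.
href = soup.a['href']
classes = soup.a['class']

# Text: get_text() joins all nested text; .string is None when a tag
# has more than one child.
p_text = soup.p.get_text()
p_string = soup.p.string

# Navigation: .contents is a list of children, .parents walks every ancestor.
child_count = len(soup.p.contents)
ancestors = [tag.name for tag in soup.a.parents]

print(len(links), len(first_only), len(by_class), link2.get_text())
print(href, classes)
print(repr(p_text), p_string)
print(child_count, ancestors)
```

Note that `soup.a` and `soup.p` always refer to the first matching tag, which is why `href` is Elsie's link; use `find_all` when every match is needed.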
