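Official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#html
Chinese documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html
Beautiful Soup is a Python library for extracting data from HTML or XML text. It parses HTML and XML into a tree structure from which the relevant information can be pulled out.
Beautiful Soup is a flexible, convenient and efficient web-page parsing library. It supports several underlying parsers (introduced below) and lets you extract information from a page without writing regular expressions.
Installation
Beautiful Soup 3 is no longer being developed; Beautiful Soup 4 is recommended for current projects. To install it: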
pip install beautifulsoup4
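Parser | Usage | Advantages | Disadvantages |
---|---|---|---|
Python standard library | BeautifulSoup(markup, "html.parser") | Built into Python; moderate speed; tolerant of malformed documents | Poor tolerance of malformed documents in versions before Python 2.7.3 or 3.2.2 |
lxml HTML parser | BeautifulSoup(markup, "lxml") | Fast; tolerant of malformed documents | Requires an external C library |
lxml XML parser | BeautifulSoup(markup, "xml") | Fast; the only XML parser supported | Requires an external C library |
html5lib | BeautifulSoup(markup, "html5lib") | Best fault tolerance; parses documents the way a browser does; produces valid HTML5 | Slow; requires an external Python package |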
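If you only need to parse an HTML document, creating a BeautifulSoup object from the document is enough: Beautiful Soup picks a parser automatically. You can also specify which parser to use through an argument. The first argument to BeautifulSoup is the document to parse, given as a string or a file handle; the second argument says how to parse it. If the second argument is omitted, Beautiful Soup chooses among the parsers installed on the system, in this order of preference: lxml, html5lib, then Python's built-in html.parser.
Install the parser libraries: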
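As a minimal sketch of the two argument forms described above, the snippet below passes the document once as a string and once as a file-like object, naming the parser explicitly both times. The markup and variable names are made up for illustration, and `io.StringIO` merely stands in for an open file handle.

```python
import io
from bs4 import BeautifulSoup

markup = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# First argument as a string; second argument names the parser explicitly.
soup_from_str = BeautifulSoup(markup, "html.parser")

# First argument as a file-like object (StringIO stands in for an open file).
soup_from_file = BeautifulSoup(io.StringIO(markup), "html.parser")

print(soup_from_str.title.string)
print(soup_from_file.p.string)
```

Naming the parser explicitly keeps results the same across machines, since the automatic choice depends on which parser libraries happen to be installed.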
pip install html5lib
pip install lxml
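Fault tolerance: a parser's fault tolerance is its ability to recognize and repair problems when the HTML is incomplete, for example when closing tags are missing.
Parsing the HTML below with BeautifulSoup yields a BeautifulSoup object, which can print the document with standard indentation.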
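A small sketch of what this repair looks like in practice, assuming the built-in html.parser; the fragment is made up and deliberately leaves both tags unclosed. Other parsers may repair the same input differently.

```python
from bs4 import BeautifulSoup

# Neither <p> nor <b> is closed in this fragment.
broken = "<p>Unclosed paragraph <b>bold text"

soup = BeautifulSoup(broken, "html.parser")

# Serializing the tree writes out the closing tags that were missing.
repaired = str(soup)
print(repaired)
print(soup.b.string)
```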
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())    # indent the document and show its structure
print(soup.title.string)
Output:
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title" name="dromouse">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    <!-- Elsie -->
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ;
and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
The Dormouse's story
Selecting tag elements (when several match, the first one is returned)
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The is pppp</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'lxml')
print(soup.title)              # the tag itself: <title>The Dormouse's story</title>
print(soup.title.name)         # tag name
print(soup.title.text)         # text inside the tag
print(soup.p.text)
print(soup.p.string)
dic = soup.p.attrs             # all attributes of the first <p>, as a dict
print(dic)
print(dic["name"])
print(soup.p.attrs["class"])   # value of one attribute; class comes back as a list
print(soup.p["class"])
Output:
<title>The Dormouse's story</title>
title
The Dormouse's story
The is pppp
The is pppp
{'class': ['title'], 'name': 'dromouse'}
dromouse
['title']
['title']
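The example above prints the same value for `.text` and `.string`, but the two can differ. The sketch below, with made-up markup and the built-in html.parser, shows two cases worth knowing: a tag with several children, and a tag whose only child is a comment (as with the first link in the sample document).

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

soup = BeautifulSoup(
    '<p>Hello <b>World</b></p><a id="link1"><!-- Elsie --></a>',
    "html.parser",
)

# .string is None when a tag has more than one child node.
p_string = soup.p.string
# get_text() joins every text node under the tag.
p_text = soup.p.get_text()

# A tag whose only child is a comment: .string returns a Comment object,
# while get_text() leaves comments out.
a_string = soup.a.string
a_text = soup.a.get_text()

print(repr(p_string), repr(p_text))
print(type(a_string).__name__, repr(a_text))
```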
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<div class="title" name="dromouse"><b class='bb bcls xiong'>The Dormouse's story</b></div>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'lxml')
print(soup.div.b['class'])             # nested tag selection
print(soup.p.stripped_strings)         # a generator object
print(list(soup.p.stripped_strings))
print(soup.p.text)
Output:
['bb', 'bcls', 'xiong']
<generator object stripped_strings at 0x000002471D323830>
['Once upon a time there were three little sisters; and their names were', ',', 'Lacie', 'and', 'Tillie', ';\nand they lived at the bottom of a well.']
Once upon a time there were three little sisters; and their names were
,
Lacie and
Tillie;
and they lived at the bottom of a well.
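Since `stripped_strings` yields the text pieces one at a time, a common follow-up is to join them into a single line. A minimal sketch with made-up markup; it also shows `get_text()` with a separator and `strip=True`, which should give the same result here.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p>Once upon a time\n<a>Lacie</a> and\n<a>Tillie</a>;\nthe end.</p>",
    "html.parser",
)

# Each text node with leading/trailing whitespace removed; empty ones are skipped.
pieces = list(soup.p.stripped_strings)

# Join the pieces manually...
joined = " ".join(pieces)
# ...or let get_text() strip and join in one call.
same = soup.p.get_text(" ", strip=True)

print(pieces)
print(joined == same)
```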
A tag's child nodes include not only tag nodes but also string nodes; a line break between tags shows up as the string '\n'.
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
soup = BeautifulSoup(html, 'lxml')
print(soup.p.contents)       # list of child nodes: every direct child of <p>
print("======================================================================>")
print(soup.p.children)       # iterator over the child nodes
for i, child in enumerate(soup.p.children):
    print(i, str(child).strip())   # child is a bs4.element object (Tag or NavigableString)
print("======================================================================>")
print(soup.p.descendants)    # generator over all descendant nodes
for i, child in enumerate(soup.p.descendants):
    print(i, child)
Output (whitespace inside printed nodes is condensed for readability):
['\n            Once upon a time there were three little sisters; and their names were\n            ', <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>, '\n', <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, '\n            and\n            ', <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>, '\n            and they lived at the bottom of a well.\n        ']
======================================================================>
<list_iterator object at 0x000001C2E2AB6EF0>
0 Once upon a time there were three little sisters; and their names were
1 <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
2
3 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
4 and
5 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
6 and they lived at the bottom of a well.
======================================================================>
<generator object descendants at 0x000001C2E2AA3830>
0 Once upon a time there were three little sisters; and their names were
1 <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
2
3 <span>Elsie</span>
4 Elsie
5
6
7 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
8 Lacie
9 and
10 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
11 Tillie
12 and they lived at the bottom of a well.
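Because children mix tags and strings, it often helps to filter by node type. The following is a sketch with made-up markup, using the built-in html.parser; it separates the direct children of a `<p>` into tags and strings, and contrasts that with the tags found among all descendants.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup(
    "<p>\nintro <a><span>one</span></a>\n<a>two</a>\n</p>",
    "html.parser",
)

# Direct children only: keep the Tag nodes, then the string nodes.
child_tags = [c.name for c in soup.p.children if isinstance(c, Tag)]
child_strings = [str(c) for c in soup.p.children if isinstance(c, NavigableString)]

# Descendants go deeper, so the nested <span> shows up as well.
descendant_tags = [d.name for d in soup.p.descendants if isinstance(d, Tag)]

print(child_tags)
print(child_strings)
print(descendant_tags)
```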
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
soup = BeautifulSoup(html, 'lxml')
print(soup.a.parent)      # the direct parent node
print("========================================================================>")
print(soup.a.parents)     # ancestor nodes, returned as a generator
for item in soup.a.parents:
    print(item)
Output (indentation inside the printed nodes is condensed for readability):
<p class="story">
 Once upon a time there were three little sisters; and their names were
 <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
 and
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
 and they lived at the bottom of a well.
</p>
========================================================================>
<generator object parents at 0x000001A078752830>
<p class="story">
 Once upon a time there were three little sisters; and their names were
 <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
 and
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
 and they lived at the bottom of a well.
</p>
<body>
 <p class="story">
  Once upon a time there were three little sisters; and their names were
  <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
  and
  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
  and they lived at the bottom of a well.
 </p>
 <p class="story">...</p>
</body>
<html>
 <head>
  <title>The Dormouse's story</title>
 </head>
 <body>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
   <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
   and they lived at the bottom of a well.
  </p>
  <p class="story">...</p>
 </body></html>
<html>
 <head>
  <title>The Dormouse's story</title>
 </head>
 <body>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1"> <span>Elsie</span> </a>
   <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
   and they lived at the bottom of a well.
  </p>
  <p class="story">...</p>
 </body></html>
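Printing every ancestor in full gets long quickly. A shorter way to see the chain is to print only the ancestor names, as in this sketch (made-up markup, built-in html.parser). The last entry is the BeautifulSoup object itself, whose name is `[document]`, which is why the full document appears twice in the output above.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<html><body><p><a>link</a></p></body></html>",
    "html.parser",
)

# Walk from the direct parent up to the document root, keeping only the names.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)
```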
next_sibling and previous_sibling each return a single node, so they are printed directly; next_siblings and previous_siblings are generators and can be enumerated. (Passing a single string node to enumerate() would iterate over its characters one by one.)
html = """
<html>
    <head>
        <title>The Dormouse's story</title>
    </head>
    <body>
        <p class="story">
            Once upon a time there were three little sisters; and their names were
            <a href="http://example.com/elsie" class="sister" id="link1">
                <span>Elsie</span>
            </a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
            and
            <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
            and they lived at the bottom of a well.
        </p>
        <p class="story">...</p>
"""
soup = BeautifulSoup(html, 'lxml')
print(repr(soup.a.next_sibling))                   # the next sibling node
print(list(enumerate(soup.a.next_siblings)))       # all following sibling nodes
print(repr(soup.a.previous_sibling))               # the previous sibling node
print(list(enumerate(soup.a.previous_siblings)))   # all preceding sibling nodes
Output:
'\n'
[(0, '\n'), (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>), (2, '\n            and\n            '), (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>), (4, '\n            and they lived at the bottom of a well.\n        ')]
'\n            Once upon a time there were three little sisters; and their names were\n            '
[(0, '\n            Once upon a time there were three little sisters; and their names were\n            ')]
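As the output shows, the nearest sibling of a tag is often just a whitespace string. A minimal sketch of skipping those and keeping only tag siblings, using made-up markup and the built-in html.parser:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

soup = BeautifulSoup(
    '<p><a id="link1">Elsie</a>\n<a id="link2">Lacie</a>\n<a id="link3">Tillie</a></p>',
    "html.parser",
)

first = soup.a

# The immediate next sibling is the newline between the two tags.
raw_next = first.next_sibling

# Keep only the siblings that are tags and read their id attribute.
tag_sibling_ids = [s["id"] for s in first.next_siblings if isinstance(s, Tag)]

print(repr(raw_next))
print(tag_sibling_ids)
```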
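HTML content search methods in the bs4 library
<>.find_all(name, attrs, recursive, text, **kwargs)  # returns a list holding the search results
name: a filter on tag names
attrs: a filter on tag attribute values; a specific attribute can be named
recursive: whether to search all descendants; defaults to True
text: a filter on text content (newer Beautiful Soup versions also accept this argument under the name string)
Other find methods:
Documents can be searched by tag name, attribute, or text content.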
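The parameters listed above can be tried on a tiny document. This is a sketch with made-up markup and the built-in html.parser; it uses `string=` (the newer spelling of `text=`) and the `limit` argument, which caps the number of results.

```python
from bs4 import BeautifulSoup

html_doc = '<div id="box"><ul><li>Foo</li><li>Bar</li></ul><p>Foo</p></div>'
soup = BeautifulSoup(html_doc, "html.parser")

# name: every <li> anywhere under the document.
all_li = soup.find_all("li")

# limit: stop after the first match.
first_li_only = soup.find_all("li", limit=1)

# recursive=False: look at direct children of <div> only; the <li> tags
# are grandchildren, so nothing is found.
direct_li = soup.div.find_all("li", recursive=False)

# string: match text nodes instead of tags.
foo_strings = soup.find_all(string="Foo")

print(len(all_li), len(first_li_only), direct_li, foo_strings)
```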
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all('ul'))
print(type(soup.find_all('ul')[0]))
Output:
[<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>, <ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>]
<class 'bs4.element.Tag'>
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.find_all('ul'):
    print(ul.find_all('li'))
Output:
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list2 list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(attrs={'id': 'list-1'}))    # recommended form
print(soup.find_all(id="list-1"))               # passed as a keyword argument (**kwargs); same effect as the line above
print(soup.find_all(attrs={'class': 'list-small'}))
print(soup.find_all(class_="list2"))            # class is a Python keyword, hence class_
Output:
[<ul class="list" id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<ul class="list" id="list-1" name="elements">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>]
[<ul class="list2 list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>]
[<ul class="list2 list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>]
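Two details of attribute searches are easy to miss: a class filter matches any one of a tag's classes, and the `attrs` dictionary is the way to search attributes whose names are not valid Python keyword arguments. A sketch, assuming a made-up `data-kind` attribute and the built-in html.parser:

```python
from bs4 import BeautifulSoup

html_doc = (
    '<ul class="list list-small" id="list-2" data-kind="menu">'
    '<li class="element">Foo</li></ul>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# Matching one class is enough even though the tag carries two.
by_one_class = soup.find_all(class_="list-small")

# "data-kind" cannot be written as a keyword argument, so use attrs.
by_attrs = soup.find_all(attrs={"data-kind": "menu"})

# The class attribute is multi-valued and comes back as a list.
ul_classes = by_one_class[0]["class"]

print(by_attrs[0]["id"], ul_classes)
```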
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find_all(text='Foo'))
Output:
['Foo', 'Foo']
find returns a single element (or None if nothing matches); find_all returns all matching elements.
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
print(soup.find('ul'))
print(type(soup.find('ul')))
print(soup.find('page'))
Output:
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<class 'bs4.element.Tag'>
None
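Because find returns None when nothing matches, chaining an attribute access onto its result can raise AttributeError. One way to guard against that is sketched below; `first_text` is a hypothetical helper written for this illustration, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

def first_text(soup, name, default=""):
    """Hypothetical helper: text of the first <name> tag, or default if absent."""
    tag = soup.find(name)
    return tag.get_text(strip=True) if tag is not None else default

soup = BeautifulSoup("<div><h4> Hello </h4></div>", "html.parser")

print(first_text(soup, "h4"))
print(first_text(soup, "page", default="n/a"))
```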
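find_parents() and find_parent()
find_parents() returns all ancestor nodes; find_parent() returns the direct parent node.
find_next_siblings() and find_next_sibling()
find_next_siblings() returns all following sibling nodes; find_next_sibling() returns the first following sibling node.
find_previous_siblings() and find_previous_sibling()
find_previous_siblings() returns all preceding sibling nodes; find_previous_sibling() returns the first preceding sibling node.
find_all_next() and find_next()
find_all_next() returns all matching nodes after the current node; find_next() returns the first matching node after it.
find_all_previous() and find_previous()
find_all_previous() returns all matching nodes before the current node; find_previous() returns the first matching node before it.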
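The methods above can be compared side by side by starting from one node in the middle of a small document. This is a sketch with made-up markup and the built-in html.parser; each call is given a tag name as its filter.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<div id="box"><h4>Title</h4>'
    '<ul><li>Foo</li><li id="mid">Bar</li><li>Jay</li></ul></div>'
)
soup = BeautifulSoup(html_doc, "html.parser")
mid = soup.find(id="mid")  # start from the middle <li>

# Upwards: nearest matching ancestor, then several ancestors by name.
parent_div_id = mid.find_parent("div")["id"]
ancestors = [t.name for t in mid.find_parents(["ul", "div"])]

# Sideways: siblings share the same parent.
after = [li.string for li in mid.find_next_siblings("li")]
before = mid.find_previous_sibling("li").string

# Document order: not limited to siblings.
next_li = mid.find_next("li").string
prev_h4 = mid.find_previous("h4").string
earlier_li = [li.string for li in mid.find_all_previous("li")]

print(parent_div_id, ancestors, after, before, next_li, prev_h4, earlier_li)
```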
Passing a CSS selector directly to select() is all that is needed to make a selection.
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-heading">
<h4>World</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
soup = BeautifulSoup(html, 'lxml')
print(soup.select('.panel .panel-heading'))
print(soup.select('ul li'))
print(soup.select('#list-2 .element'))
print(type(soup.select('ul')[0]))
Output:
[<div class="panel-heading">
<h4>Hello</h4>
</div>, <div class="panel-heading">
<h4>World</h4>
</div>]
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>, <li class="element">Foo</li>, <li class="element">Bar</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
<class 'bs4.element.Tag'>
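A few more selector forms, sketched on made-up markup with the built-in html.parser: `select_one()` returns the first match or None, `>` restricts the match to direct children, and `[id^="..."]` matches on an attribute prefix. These rely on the CSS selector support that ships with current Beautiful Soup 4 releases.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<div class="panel">'
    '<ul class="list" id="list-1"><li>Foo</li><li>Bar</li></ul>'
    '<ul class="list list-small" id="list-2"><li>Baz</li></ul>'
    '</div>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# First <li> that is a direct child of a <ul> with class list-small.
first_small = soup.select_one("ul.list-small > li")

# All <li> tags under the element with id list-1.
in_list_1 = [li.get_text() for li in soup.select("#list-1 li")]

# Attribute selector: <ul> tags whose id starts with "list-".
ul_ids = [ul["id"] for ul in soup.select('ul[id^="list-"]')]

# No match: select_one gives None, select would give an empty list.
missing = soup.select_one("table")

print(first_small.get_text(), in_list_1, ul_ids, missing)
```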
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul.select('li'))
Output:
[<li class="element">Foo</li>, <li class="element">Bar</li>, <li class="element">Jay</li>]
[<li class="element">Foo</li>, <li class="element">Bar</li>]
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for ul in soup.select('ul'):
    print(ul['id'])
    print(ul.attrs['id'])
Output:
list-1
list-1
list-2
list-2
html='''
<div class="panel">
<div class="panel-heading">
<h4>Hello</h4>
</div>
<div class="panel-body">
<ul class="list" id="list-1">
<li class="element">Foo</li>
<li class="element">Bar</li>
<li class="element">Jay</li>
</ul>
<ul class="list list-small" id="list-2">
<li class="element">Foo</li>
<li class="element">Bar</li>
</ul>
</div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
for li in soup.select('li'):
    print(li.get_text())
Output:
Foo
Bar
Jay
Foo
Bar
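Summary:
Step 1: fetch the university ranking page from the web: getHTMLText()
Step 2: extract the information in the page into a suitable data structure: fillUnivList()
Step 3: use that data structure to display and print the results: printUnivList()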
import requests
from bs4 import BeautifulSoup
import bs4

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "error"

def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:    # the page could not be fetched or has no table body
        return
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):    # filter out non-tag nodes
            tds = tr('td')    # shorthand for tr.find_all('td')
            ulist.append([tds[0].string, tds[1].string, tds[3].string])

# Fixing the alignment of Chinese text:
# pad with the full-width Chinese space, chr(12288)
def printUnivList(ulist, num):
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print(tplt.format("排名", "學校名稱", "總分", chr(12288)))
    for u in ulist[:num]:
        print(tplt.format(u[0], u[1], u[2], chr(12288)))

def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)

if __name__ == '__main__':
    main()
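The extraction and alignment steps can be tried without any network access. The sketch below runs the same idea on an inline table; the two rows are made-up placeholders, not real ranking data, and the built-in html.parser is assumed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up placeholder rows (rank, name, region, score); not real ranking data.
sample = """<table><tbody>
<tr><td>1</td><td>甲大學</td><td>某省</td><td>90.0</td></tr>
<tr><td>2</td><td>乙科技大學</td><td>某市</td><td>85.5</td></tr>
</tbody></table>"""

soup = BeautifulSoup(sample, "html.parser")
rows = []
for tr in soup.find("tbody").children:
    if isinstance(tr, Tag):  # skip the '\n' strings between rows
        tds = tr.find_all("td")
        rows.append([tds[0].string, tds[1].string, tds[3].string])

# Centre the name in a 10-character field, padded with chr(12288),
# the full-width space, so that Chinese names line up in columns.
tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
lines = [tplt.format(rank, name, score, chr(12288)) for rank, name, score in rows]
for line in lines:
    print(line)
```

The `{3}` inside the second field's format spec takes the fill character from the fourth argument, which is how chr(12288) replaces the default ASCII space.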
The collected data can then be visualized with pyecharts.
import requests
import bs4
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3472.3 Safari/537.36'}

def getHtmlText(url):
    try:
        ret = requests.get(url, headers=header, timeout=30)
        ret.encoding = "utf8"
        ret.raise_for_status()
        return ret.text
    except requests.RequestException:
        return None

def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "lxml")
    for tr in soup.tbody.children:
        if isinstance(tr, bs4.element.Tag):    # check that tr is a bs4.element.Tag
            tds = tr("td")
            ulist.append([tds[0].string, tds[1].string, tds[2].string, tds[3].string])

# Fixing the alignment of Chinese text:
# pad with the full-width Chinese space, chr(12288)
def printUnivList(ulist, num):
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print(tplt.format("排名", "學校名稱", "總分", chr(12288)))
    for i in range(num):
        u = ulist[i]
        print(tplt.format(u[0], u[1], u[3], chr(12288)))

# Data visualization with pyecharts (this import style is the pyecharts 0.5.x API)
def showData(ulist, num):
    from pyecharts import Bar
    attrs = []
    vals = []
    for i in range(num):
        attrs.append(ulist[i][1])
        vals.append(ulist[i][3])
    bar = Bar("2019中國大學排行榜")
    bar.add(
        "中國大學排行榜",
        attrs,
        vals,
        is_datazoom_show=True,
        datazoom_type="both",
        datazoom_range=[0, 10],
        xaxis_rotate=30,
        xaxis_label_textsize=8,
        is_label_show=True,
    )
    bar.render("2019中國大學排行榜4.html")

def showData_funnel(ulist, num):
    from pyecharts import Funnel
    attrs = []
    vals = []
    for i in range(num):
        attrs.append(ulist[i][1])
        vals.append(ulist[i][3])
    funnel = Funnel(width=1000, height=800)
    funnel.add(
        "大學排行榜",
        attrs,
        vals,
        is_label_show=True,
        label_pos="inside",
        label_text_color="#fff",
    )
    funnel.render("2019中國大學排行榜4.html")

def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2019.html'
    html = getHtmlText(url)
    if html is None:    # the request failed, nothing to parse
        return
    fillUnivList(uinfo, html)
    print(uinfo)
    # showData(uinfo, 100)
    showData_funnel(uinfo, 20)
    # printUnivList(uinfo, 30)

if __name__ == '__main__':
    main()
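Syntax: isinstance(object, type)
Purpose: checks whether an object is an instance of a known type.
The first argument (object) is the object to test; the second (type) is a type name (int, ...) or a tuple of type names, such as (int, list, float). The return value is a Boolean (True or False).
It returns True if the object is an instance of the given type or of one of its subclasses. If the second argument is a tuple, it returns True when the object is an instance of any of the types in the tuple.
Here are two examples:
Example 1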
>>> a = 4
>>> isinstance (a,int)
True
>>> isinstance (a,str)
False
>>> isinstance (a,(str,int,list))
True
Example 2
>>> a = "b"
>>> isinstance(a,str)
True
>>> isinstance(a,int)
False
>>> isinstance(a,(int,list,float))
False
>>> isinstance(a,(int,list,float,str))
True
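One point the examples above do not show is that isinstance() also accepts instances of subclasses, which is what makes the `isinstance(tr, bs4.element.Tag)` filter in the scraper reliable. A small sketch with two hypothetical classes, `Node` and `TagNode`, invented only for this illustration:

```python
class Node:
    """Hypothetical base class, for illustration only."""

class TagNode(Node):
    """Hypothetical subclass of Node."""

item = TagNode()

checks = {
    "same_type": isinstance(item, TagNode),
    "subclass_counts": isinstance(item, Node),
    "any_in_tuple": isinstance(4, (str, int, list)),
    "bool_is_an_int": isinstance(True, int),
    # type() compares the exact class, so the subclass does not match.
    "exact_type_is_node": type(item) is Node,
}

for label, result in checks.items():
    print(label, result)
```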
Response.raise_for_status()
If a request comes back with an error status (a 4XX client error or a 5XX server error response), we can raise an exception with Response.raise_for_status():
>>> bad_r = requests.get('http://httpbin.org/status/404')
>>> bad_r.status_code
404
>>> bad_r.raise_for_status()
Traceback (most recent call last):
  File "requests/models.py", line 832, in raise_for_status
    raise http_error
requests.exceptions.HTTPError: 404 Client Error
However, since the status_code of r in our example is 200, calling raise_for_status() gives:
>>> r.raise_for_status()
None
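The same behaviour can be observed without sending a request. The sketch below builds a bare `requests.Response` object locally and sets its status code by hand; `check` is a hypothetical helper written for this illustration, and a real response would also carry a reason phrase and URL in the error message.

```python
import requests

def check(status_code):
    """Hypothetical helper: report what raise_for_status() does
    for a locally built Response with the given status code."""
    resp = requests.Response()       # constructed locally, no network request
    resp.status_code = status_code
    try:
        resp.raise_for_status()      # returns None for non-error codes
    except requests.exceptions.HTTPError as exc:
        # Keep only the leading part, e.g. "404 Client Error".
        return "error: " + str(exc).split(":")[0]
    return "ok"

print(check(200))
print(check(404))
print(check(500))
```

Catching `requests.exceptions.HTTPError` (or its parent `requests.RequestException`) is narrower than a bare `except:` and avoids hiding unrelated bugs.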
http://www.cnblogs.com/0bug/p/8260834.html
http://pyecharts.org/#/
https://www.cnblogs.com/kongzhagen/p/6472746.html
https://www.cnblogs.com/haiyan123/p/8289560.html
https://www.cnblogs.com/haiyan123/p/8317398.html