[Repost] requests and BeautifulSoup: A Usage Summary

Date: 2019-11-10

Tags: requests, beautifulsoup, usage summary


Reposted from https://www.cnblogs.com/wupeiqi/articles/6283017.html

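The Python standard library provides modules such as urllib, urllib2, and httplib for making HTTP requests, but their APIs are awkward. They were built for a different era and a different internet, and even the simplest task takes a great deal of work, including overriding various methods.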

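Requests is an HTTP library written in Python and released under the Apache2 License. It wraps Python's built-in modules at a high level, which makes network requests far more pleasant for Python developers. With Requests you can easily do practically anything a browser can do.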

1. GET requests

 
    # 1. Example without parameters
    import requests

    ret = requests.get('https://github.com/timeline.json')

    print(ret.url)
    print(ret.text)

    # 2. Example with parameters
    import requests

    payload = {'key1': 'value1', 'key2': 'value2'}
    ret = requests.get("http://httpbin.org/get", params=payload)

    print(ret.url)
    print(ret.text)
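To see how the params dictionary ends up in the URL without touching the network, one option is to build the request and prepare it without sending it. This is a minimal sketch that reuses the httpbin URL and payload from the example above; requests.Request and its prepare() method are part of the library, the variable name prepared is just illustrative.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Build the request object and prepare it, but do not send it.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# The params dict has been URL-encoded into the query string.
print(prepared.method)  # GET
print(prepared.url)     # http://httpbin.org/get?key1=value1&key2=value2
```

The same encoded URL is what ret.url reports after a real requests.get call.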

2. POST requests

 
    # 1. Basic POST example
    import requests

    payload = {'key1': 'value1', 'key2': 'value2'}
    ret = requests.post("http://httpbin.org/post", data=payload)

    print(ret.text)

    # 2. Example that sends request headers and data
    import requests
    import json

    url = 'https://api.github.com/some/endpoint'
    payload = {'some': 'data'}
    headers = {'content-type': 'application/json'}

    ret = requests.post(url, data=json.dumps(payload), headers=headers)

    print(ret.text)
    print(ret.cookies)
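The two POST examples differ in how the body is encoded: data= with a dict produces a form-encoded body, while a JSON body needs either json.dumps plus a content-type header, or the json= keyword, which does both. The sketch below compares the two by preparing the requests without sending them; the endpoint is the placeholder URL from the example above, and form_req and json_req are illustrative names.

```python
import json
import requests

url = 'https://api.github.com/some/endpoint'  # placeholder URL from the example above
payload = {'some': 'data'}

# data=dict -> form-encoded body
form_req = requests.Request('POST', url, data=payload).prepare()
# json=dict -> JSON body, Content-Type set automatically
json_req = requests.Request('POST', url, json=payload).prepare()

print(form_req.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_req.body)                     # some=data
print(json_req.headers['Content-Type'])  # application/json
print(json.loads(json_req.body))         # {'some': 'data'}
```

With json=payload there is no need to call json.dumps or set the content-type header by hand.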

3. Other request methods

 
    
     
       
       
    requests.get(url, params=None, **kwargs)
    requests.post(url, data=None, json=None, **kwargs)
    requests.put(url, data=None, **kwargs)
    requests.head(url, **kwargs)
    requests.delete(url, **kwargs)
    requests.patch(url, data=None, **kwargs)
    requests.options(url, **kwargs)

    # All of the methods above are built on top of this one
    requests.request(method, url, **kwargs)
        
 
     
 
    
  

4. More parameters

Parameter list

Parameter examples
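As an illustration of a few commonly used keyword arguments, the sketch below builds a request with params, headers, cookies, and auth, then inspects the result without sending it. The client name, cookie value, and credentials are made-up placeholders, not values from the original article.

```python
import requests

# Build (but do not send) a request that uses several common keyword arguments.
req = requests.Request(
    'GET',
    'http://httpbin.org/get',
    params={'q': 'python'},                     # appended to the query string
    headers={'User-Agent': 'my-crawler/0.1'},   # hypothetical client name
    cookies={'session': 'abc123'},              # made-up cookie value
    auth=('user', 'pass'),                      # HTTP Basic auth, placeholder credentials
)
prepared = req.prepare()

print(prepared.url)                       # http://httpbin.org/get?q=python
print(prepared.headers['User-Agent'])     # my-crawler/0.1
print(prepared.headers['Cookie'])         # session=abc123
print(prepared.headers['Authorization'])  # Basic dXNlcjpwYXNz

# Sending is a separate step; options such as timeout are passed there:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=5)
```

The same keyword arguments can be passed directly to requests.get or requests.request, together with send-time options such as timeout.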

Official documentation: http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#id4

BeautifulSoup

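BeautifulSoup is a module that takes an HTML or XML string and parses it into a structured tree. You can then use the methods it provides to quickly locate specific elements, which makes finding elements in HTML or XML simple.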

 
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    asdf
    <div class="title">
    <b>The Dormouse's story總共</b>
    <h1>f</h1>
    </div>
    <div class="story">Once upon a time there were three little sisters; and their names were
    <a  class="sister0" id="link1">Els<span>f</span>ie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</div>
    ad<br/>sf
    <p class="story">...</p>
    </body>
    </html>
    """

    soup = BeautifulSoup(html_doc, features="lxml")

    # Find the first <a> tag
    tag1 = soup.find(name='a')

    # Find all <a> tags
    tag2 = soup.find_all(name='a')

    # Find the tags with id="link2" (select returns a list)
    tag3 = soup.select('#link2')
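The three lookups above return different kinds of results, which is easy to trip over. The following sketch uses a trimmed-down version of the document and Python's built-in "html.parser" instead of "lxml" (an assumption made here so the snippet runs without installing lxml).

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="story">
<a class="sister0" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</div>
"""
soup = BeautifulSoup(html_doc, features="html.parser")

first = soup.find(name='a')          # a single Tag, or None if nothing matches
all_links = soup.find_all(name='a')  # a list-like collection of Tags
by_id = soup.select('#link2')        # also list-like, even for a single match

print(first['id'])                        # link1
print([a.get_text() for a in all_links])  # ['Elsie', 'Lacie', 'Tillie']
print(by_id[0]['href'])                   # http://example.com/lacie
print(soup.find(name='table'))            # None
```

Because find returns None when nothing matches, it is worth checking the result before accessing attributes on it.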

Installation:

 
         pip3 install beautifulsoup4

Usage example:

 
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    ...
    </body>
    </html>
    """

    soup = BeautifulSoup(html_doc, features="lxml")

1. name: the tag name

2. attrs: the tag's attributes
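A small sketch of reading and changing name and attrs follows. It uses one of the links from the sample document and the built-in "html.parser" (an assumption, chosen so lxml is not required); the new id value is made up.

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>'
soup = BeautifulSoup(html, features="html.parser")
tag = soup.find('a')

original_name = tag.name           # 'a'
original_attrs = dict(tag.attrs)   # class is stored as a list: ['sister']
print(original_name, original_attrs)

# Both name and attrs can be modified in place.
tag.name = 'span'
tag.attrs['id'] = 'renamed'   # made-up value for illustration
del tag.attrs['href']

print(soup)  # <span class="sister" id="renamed">Lacie</span>
```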

3. children: all direct child nodes

 
    # Uses the soup built in the usage example above
    body = soup.find('body')
    v = body.children

4. descendants: all nested descendants (children, grandchildren, and so on)
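The difference between children and descendants is easiest to see on a small nested fragment. This is a minimal sketch with a made-up snippet of HTML and the built-in "html.parser"; text nodes are filtered out so only tag names are printed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '<div><p>one <b>two</b></p><span>three</span></div>'
soup = BeautifulSoup(html, features="html.parser")
div = soup.find('div')

# children: direct children only
child_tags = [c.name for c in div.children if isinstance(c, Tag)]
# descendants: every nested node, in document order
descendant_tags = [d.name for d in div.descendants if isinstance(d, Tag)]

print(child_tags)       # ['p', 'span']
print(descendant_tags)  # ['p', 'b', 'span']
```

Both attributes are iterators and also yield text nodes, which is why the isinstance check is used here.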

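5. clear: remove all of the tag's children (the tag itself is kept)

6. decompose: remove the tag and everything inside it from the tree and destroy it

7. extract: remove the tag and everything inside it from the tree, and return the removed tag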
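The three removal methods can be compared side by side. The sketch below applies each one to its own div in a made-up fragment, parsed with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

html = '<div id="a"><b>x</b></div><div id="b"><b>y</b></div><div id="c"><b>z</b></div>'
soup = BeautifulSoup(html, features="html.parser")

soup.find(id='a').clear()               # empties the tag but keeps <div id="a">
soup.find(id='b').decompose()           # removes the tag entirely, returns None
removed = soup.find(id='c').extract()   # removes the tag and hands it back

print(soup)     # <div id="a"></div>
print(removed)  # <div id="c"><b>z</b></div>
```

Use extract when the removed subtree is still needed, for example to reinsert it elsewhere.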

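8. decode: convert to a string (including the current tag); decode_contents: the same, excluding the current tag

9. encode: convert to bytes (including the current tag); encode_contents: the same, excluding the current tag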
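A quick sketch of the four serialization methods on a made-up paragraph, parsed with the built-in "html.parser":

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="story">Hello <b>world</b></p>', features="html.parser")
p = soup.find('p')

as_text = p.decode()             # str, includes the <p> tag itself
inner_text = p.decode_contents() # str, only what is inside <p>
as_bytes = p.encode()            # bytes (UTF-8 by default), includes <p>
inner_bytes = p.encode_contents()

print(as_text)      # <p class="story">Hello <b>world</b></p>
print(inner_text)   # Hello <b>world</b>
print(as_bytes)     # b'<p class="story">Hello <b>world</b></p>'
print(inner_bytes)  # b'Hello <b>world</b>'
```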

10. find: get the first matching tag

11. find_all: get all matching tags

12. has_attr: check whether the tag has the given attribute

13. get_text: get the text inside the tag
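The sketch below combines these four methods, using the three links from the sample document and the built-in "html.parser". It shows filtering by class with class_ and attrs, limiting results with limit, and then checking attributes and text.

```python
from bs4 import BeautifulSoup

html = """
<a class="sister0" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
"""
soup = BeautifulSoup(html, features="html.parser")

first_sister = soup.find('a', class_='sister')           # first <a> whose class is "sister"
sisters = soup.find_all('a', attrs={'class': 'sister'})  # same filter, all matches
limited = soup.find_all('a', limit=1)                    # stop after one result

print(first_sister['id'])                      # link2
print([t['id'] for t in sisters])              # ['link2', 'link3']
print([t['id'] for t in limited])              # ['link1']
print(soup.find(id='link1').has_attr('href'))  # False
print(soup.find(id='link2').get_text())        # Lacie
```

Note that class "sister0" does not match the filter "sister", since class filters compare whole class names.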

14. index: get the position of a child within its parent tag

 
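    # Assumes soup was built from a document whose <body> contains a <div>,
    # such as the first html_doc above
    tag = soup.find('body')
    v = tag.index(tag.find('div'))
    print(v)

    # Iterating over a tag yields its children, so enumerate gives each index
    tag = soup.find('body')
    for i, v in enumerate(tag):
        print(i, v)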

15. is_empty_element: whether the tag is an empty (void) or self-closing element,

that is, whether it is one of the following tags: 'br', 'hr', 'input', 'img', 'meta', 'spacer', 'link', 'frame', 'base'
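A short sketch of is_empty_element on a made-up fragment, parsed with the built-in "html.parser". Note that an ordinary tag with no content, such as an empty p, is not reported as an empty element.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div>ad<br/>sf</div><p></p>', features="html.parser")

br_is_empty = soup.find('br').is_empty_element    # void element
div_is_empty = soup.find('div').is_empty_element  # has content
p_is_empty = soup.find('p').is_empty_element      # no content, but not a void element

print(br_is_empty)   # True
print(div_is_empty)  # False
print(p_is_empty)    # False
```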

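16. Tags related to the current tag (for example parent, next_sibling, previous_sibling)

17. Searching among a tag's related tags (for example find_parent, find_next_sibling, find_previous_sibling)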
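As an illustration of navigating between related tags, the sketch below starts from the middle link and moves to its parent and siblings. The fragment is made up and written without whitespace between tags, so the siblings are tags rather than text nodes; the built-in "html.parser" is assumed.

```python
from bs4 import BeautifulSoup

html = '<div id="box"><a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a></div>'
soup = BeautifulSoup(html, features="html.parser")
middle = soup.find(id='link2')

# Direct relationships as attributes
print(middle.parent['id'])            # box
print(middle.previous_sibling['id'])  # link1
print(middle.next_sibling['id'])      # link3

# The same relationships as searches, which accept the usual find filters
print(middle.find_parent('div')['id'])          # box
print(middle.find_previous_sibling('a')['id'])  # link1
print(middle.find_next_sibling('a')['id'])      # link3
```

In real documents the sibling attributes often return whitespace text nodes, so the find_* variants with a tag name are usually the safer choice.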

18. select, select_one: CSS selectors
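A few typical selector forms, sketched against the links from the sample document with the built-in "html.parser": a child combinator with class selectors, an attribute selector, and an id selector.

```python
from bs4 import BeautifulSoup

html = """
<div class="story">
<a class="sister0" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</div>
"""
soup = BeautifulSoup(html, features="html.parser")

by_class = soup.select('div.story > a.sister')  # direct <a class="sister"> children
with_href = soup.select('a[href]')              # any <a> that has an href attribute
one = soup.select_one('#link1')                 # first match, or None
missing = soup.select_one('#missing')

print([t['id'] for t in by_class])   # ['link2', 'link3']
print([t['id'] for t in with_href])  # ['link2', 'link3']
print(one.get_text())                # Elsie
print(missing)                       # None
```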

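19. The tag's content (for example string, strings, stripped_strings)

20. append: append a tag inside the current tag, after its existing children

21. insert: insert a tag at a given position inside the current tag

22. insert_after, insert_before: insert after or before the current tag

23. replace_with: replace the current tag with the given tag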
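The insertion and replacement methods can be traced on a small list. In this sketch, li is a hypothetical helper (not part of BeautifulSoup) that builds a new list item with soup.new_tag; the fragment is made up and parsed with the built-in "html.parser".

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<ul><li>b</li></ul>', features="html.parser")
ul = soup.find('ul')

def li(text):
    """Hypothetical helper: create a new <li> tag containing the given text."""
    tag = soup.new_tag('li')
    tag.string = text
    return tag

ul.append(li('d'))            # children: b, d
ul.insert(0, li('a'))         # children: a, b, d
last = ul.find_all('li')[-1]  # the <li>d</li> tag
last.insert_before(li('c'))   # children: a, b, c, d
last.insert_after(li('e'))    # children: a, b, c, d, e
last.replace_with(li('D'))    # children: a, b, c, D, e

print(ul)  # <ul><li>a</li><li>b</li><li>c</li><li>D</li><li>e</li></ul>
```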

24. Set up relationships between tags

25. wrap: wrap the current tag in the given tag

26. unwrap: remove the current tag but keep its contents

 
    # Assumes soup was built from a document that contains an <a> tag,
    # such as the first html_doc above
    tag = soup.find('a')
    v = tag.unwrap()
    print(soup)
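wrap and unwrap are opposites, which the following sketch shows on a made-up paragraph parsed with the built-in "html.parser". The wrapper tag is created with soup.new_tag.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>Hello <b>world</b></p>', features="html.parser")

# wrap: put the <b> tag inside a newly created <div>
soup.find('b').wrap(soup.new_tag('div'))
after_wrap = str(soup)
print(after_wrap)    # <p>Hello <div><b>world</b></div></p>

# unwrap: drop the <b> tag itself but keep its text in place
soup.find('b').unwrap()
after_unwrap = str(soup)
print(after_unwrap)  # <p>Hello <div>world</div></p>
```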

For more parameters, see the official documentation: http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/

A batch of "auto-login" examples

Chouti hot list

GitHub

Zhihu

cnblogs

Lagou
