How to Use BeautifulSoup

Date: 2019-12-08

Tags: beautifulsoup, usage


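BeautifulSoup is a module that takes an HTML or XML string and parses it into a structured tree. You can then use the methods it provides to quickly locate specific elements, which makes searching HTML or XML documents simple.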

 
    from bs4 import BeautifulSoup

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    asdf
    <div class="title">
    <b>The Dormouse's story總共</b>
    <h1>f</h1>
    </div>
    <div class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister0" id="link1">Els<span>f</span>ie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</div>
    ad<br/>sf
    <p class="story">...</p>
    </body>
    </html>
    """

    # Parse the document (the "lxml" parser requires the lxml package)
    soup = BeautifulSoup(html_doc, features="lxml")

    # Find the first <a> tag
    tag1 = soup.find(name='a')

    # Find all <a> tags
    tag2 = soup.find_all(name='a')

    # Find the tag with id="link2" (select() returns a list of matches)
    tag3 = soup.select('#link2')
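
The three lookups above return different kinds of objects, which is easy to miss when reading the results. The following is a minimal sketch, not part of the original article, of what each call returns and how to read attributes and text from the matches. It assumes a trimmed copy of the sample document and swaps in the standard-library "html.parser" so that it runs without lxml installed. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the sample document above, kept small for illustration.
html_doc = """
<div class="story">
<a class="sister0" id="link1">Els<span>f</span>ie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</div>
"""

# Assumption: "html.parser" (standard library) is used here so the sketch
# runs without lxml; the article itself uses features="lxml".
soup = BeautifulSoup(html_doc, features="html.parser")

# find() returns a single Tag, or None when nothing matches.
first_link = soup.find(name="a")
first_id = first_link["id"]           # subscript access raises KeyError if absent
first_href = first_link.get("href")   # .get() returns None if absent
first_text = first_link.get_text()    # text of the tag and all its descendants
missing = soup.find(name="table")     # no <table> in the document

# find_all() returns a list-like ResultSet containing every match.
all_ids = [a["id"] for a in soup.find_all(name="a")]

# class_ (with a trailing underscore) filters by CSS class.
sister_hrefs = [a["href"] for a in soup.find_all("a", class_="sister")]

# select() takes a CSS selector and always returns a list;
# select_one() returns the first match, or None.
matches = soup.select("#link2")
link2 = soup.select_one("#link2")

print(first_id, first_href, first_text)
print(all_ids)
print(sister_hrefs)
print(len(matches), link2.get_text(), missing)
```

Because find() and select_one() can return None, it is usually worth checking the result before reading attributes from it. When an attribute may be absent, as with the first link's href here, .get() is the safer choice over subscript access.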