Parsing an HTML table with Beautiful Soup

Date: 2019-11-18


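from bs4 import BeautifulSoup
import urllib.request

# Download the page and decode the response body as UTF-8
doc = urllib.request.urlopen('http://www.bkzy.org/Index/Declaration?intPageNo=1')
doc = doc.read().decode('utf-8')

# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(doc, "html.parser")

# Column positions of the fields within each table row
school = 0
pro_code = 1
pro_name = 2
xuewei = 3  # degree
pdf = 4     # cell that holds the link to the PDF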


# find_all returns a list of every tr element
for tr in soup.find_all('tr'):  # look up the td cells inside each tr
    td = tr.find_all('td')
    try:
        print('%s_%s_%s_%s.pdf' % (
            td[school].text.strip(),
            td[pro_code].text.strip(),
            td[pro_name].text.strip(),
            td[xuewei].text.strip()),
            td[pdf].find('a')['href'])
    except (IndexError, TypeError, KeyError):
        # Skip header rows, rows with too few cells, and cells without a link
        pass
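
The loop above prints its results and needs the live page. As a minimal sketch of the same technique that runs offline, the version below parses an inline sample and returns the rows as a list. The sample markup, the `extract_rows` helper, and the `example.com` base URL are hypothetical and only for illustration; the real page layout may differ. It also assumes the `href` values may be relative, so it resolves them with `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>School</th><th>Code</th><th>Major</th><th>Degree</th><th>File</th></tr>
  <tr>
    <td> Example University </td><td>080901</td><td>Computer Science</td>
    <td>Engineering</td><td><a href="/files/a.pdf">download</a></td>
  </tr>
  <tr><td>Row without enough cells</td></tr>
</table>
"""


def extract_rows(html, base_url):
    """Return (file_name, absolute_url) pairs for every complete table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        td = tr.find_all("td")
        if len(td) < 5:
            # Header rows (th only) and short rows have too few td cells.
            continue
        link = td[4].find("a")
        if link is None or not link.get("href"):
            # No usable link in the last column.
            continue
        # Join the first four cells into a file name, as the loop above does.
        name = "_".join(cell.get_text(strip=True) for cell in td[:4]) + ".pdf"
        rows.append((name, urljoin(base_url, link["href"])))
    return rows


rows = extract_rows(SAMPLE_HTML, "http://www.example.com/Index/Declaration")
for name, url in rows:
    print(name, url)
```

Checking the cell count and the link up front replaces the try/except, which makes it explicit which rows are skipped and why.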
