Python 3: using regular expressions to extract the content you want to save from a web page

Date: 2019-12-05

Tags: python3, python; Category: Python


The task: extract a product's ingredients from the table on its web page.

First, inspect the HTML that represents one ingredient:

<a href="/composition/4c3060178d1184935a48c4e51be4f63f.html">水</a>
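The snippet has two parts that change from entry to entry: the hash in the href and the link text (水, "water"). As a minimal sketch, and not the pattern the article goes on to use, a narrower regex can capture the hash by itself. It assumes the hash is lowercase hexadecimal, as in the sample, and that the link has exactly this attribute layout.

```python
import re

# Snippet copied from the article; the hash and the name are the parts that vary.
snippet = '<a href="/composition/4c3060178d1184935a48c4e51be4f63f.html">水</a>'

# Assumption: the hash is lowercase hex (32 chars in this sample).
link_pattern = re.compile(r'<a href="/composition/([0-9a-f]+)\.html">(.*?)</a>')

match = link_pattern.search(snippet)
if match:
    composition_id, name = match.groups()
    print(composition_id)  # 4c3060178d1184935a48c4e51be4f63f
    print(name)            # 水
```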

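Match it with a regular expression. Because the hash 4c3060178d1184935a48c4e51be4f63f varies, it goes in a capture group, and so does the ingredient name. (Strictly, the first group captures everything between <td class="td1"> and the "> that closes the href, hash included.) The regex is: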

r'<td class="td1">(.*?)">(.*?)</a></td>'
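To see what each group actually holds, the sketch below runs the article's pattern against a single table cell. The cell markup is an assumption reconstructed from the regex (a td with class "td1" wrapping the link shown earlier); the real page may differ.

```python
import re

# Assumed cell markup: the class "td1" comes from the article's regex,
# the exact surrounding HTML is a guess.
cell = ('<td class="td1"><a href="/composition/'
        '4c3060178d1184935a48c4e51be4f63f.html">水</a></td>')

pattern = re.compile(r'<td class="td1">(.*?)">(.*?)</a></td>', re.I)
m = pattern.search(cell)

# Group 1 stops at the first '">', so it holds the opening <a> tag up to the
# end of the href value, not just the hash; group 2 is the link text.
print(m.group(1))  # <a href="/composition/4c3060178d1184935a48c4e51be4f63f.html
print(m.group(2))  # 水
```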

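Use findall to collect every match. Because the pattern has two groups, each match comes back as a tuple, and the ingredient name sits at index 1. In the full code, item[1] is the content to save: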
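A short illustration of how findall behaves when the pattern has two groups. The HTML string is made-up sample data: the first row reuses the snippet from the article, while the second row's hash and name (甘油, "glycerin") are hypothetical.

```python
import re

# Hypothetical two-row sample; the second hash and name are invented for illustration.
html = (
    '<tr><td class="td1"><a href="/composition/4c3060178d1184935a48c4e51be4f63f.html">水</a></td></tr>\n'
    '<tr><td class="td1"><a href="/composition/0123456789abcdef0123456789abcdef.html">甘油</a></td></tr>\n'
)

pattern = re.compile(r'<td class="td1">(.*?)">(.*?)</a></td>', re.I)

# With two groups, findall returns one 2-tuple per match instead of plain strings.
matches = pattern.findall(html)
names = [item[1] for item in matches]  # index 1 = second group = ingredient name
print(names)  # ['水', '甘油']
```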

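import re

import requests

url = 'https://www.bevol.cn/product/68a3432166d24e22504d0b2b5262ea00.html'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
html = response.content.decode('utf-8')

# Group 1: text between <td class="td1"> and the '">' closing the href; group 2: the ingredient name
pattern = re.compile(r'<td class="td1">(.*?)">(.*?)</a></td>', re.I)  # re.I: case-insensitive

matches = pattern.findall(html)  # list of (group1, group2) tuples
for item in matches:
    print(item[1])  # index 1 = ingredient name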
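The script only prints the names, while the title's goal is to save them. Here is a minimal sketch of that last step. It uses a hypothetical helper, save_ingredients, that writes one name per line to a UTF-8 text file. The function name and file layout are illustrative assumptions, not part of the original article.

```python
from __future__ import annotations

import re
from pathlib import Path

# Same pattern as the article's script; group 2 is the ingredient name.
PATTERN = re.compile(r'<td class="td1">(.*?)">(.*?)</a></td>', re.I)


def save_ingredients(html: str, path: str) -> list[str]:
    """Hypothetical helper: extract ingredient names and write one per line (UTF-8)."""
    names = [item[1] for item in PATTERN.findall(html)]
    # Each name gets its own trailing newline; no matches produces an empty file.
    Path(path).write_text(''.join(name + '\n' for name in names), encoding='utf-8')
    return names
```

In the script above, it would be called as save_ingredients(html, 'ingredients.txt') once html has been decoded. Writing with an explicit encoding keeps the Chinese names intact regardless of the platform's default.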