Learning Python: Web Scraping [Downloading Images]

Date: 2020-05-08


Scraper practice: downloading images

1. The script mainly uses the urllib and re libraries.

2. urllib.urlopen() fetches the page source.

3. A regular expression matches the image type. Naturally, the more precise the regex, the more images get downloaded.

4. urllib.urlretrieve() downloads each image and can also rename it, here with %s string formatting.

5. Probably because of some restriction on the ISP side, not every image was downloaded, but the result is still OK.
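Steps 3 and 4 can be tried offline before touching the network. The following is a minimal sketch in Python 3, not part of the original script. `SAMPLE_HTML` is a made-up fragment, and `extract_image_urls`, `numbered_name`, and the `STRICT` pattern are hypothetical names introduced here. `LOOSE` is the pattern used in this post's script. The sketch illustrates why precision matters: in `LOOSE`, the `.` can run past a closing quote, so a non-jpg image sitting before a jpg one can be swallowed into a single bogus match.

```python
import re

# Pattern from this post's script: lazy, but "." can still cross a closing quote.
LOOSE = re.compile(r'src="(.*?\.jpg)" size')
# Assumed stricter variant: [^"] keeps the capture inside one src="..." value.
STRICT = re.compile(r'src="([^"]*?\.jpg)" size')

# Made-up HTML fragment for illustration only (not taken from the real page).
SAMPLE_HTML = (
    '<img class="BDE_Image" src="https://example.com/a.jpg" size="1024">'
    '<img src="https://example.com/icon.png" size="16">'
    '<img class="BDE_Image" src="https://example.com/b.jpg" size="2048">'
)


def extract_image_urls(html, pattern=STRICT):
    """Return every URL captured by the pattern, in page order."""
    return pattern.findall(html)


def numbered_name(index):
    """Build the local file name with %s formatting, as in step 4."""
    return '%s_hhh.jpg' % index


if __name__ == '__main__':
    for i, url in enumerate(extract_image_urls(SAMPLE_HTML)):
        print(numbered_name(i), '<-', url)
```

Passing `LOOSE` as the second argument to `extract_image_urls` on the same fragment shows the difference: its second match starts at the `icon.png` URL and only ends at `b.jpg`.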

URL analysis:

Source code:

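# coding=utf-8
# Python 2 script: urllib.urlopen and urllib.urlretrieve do not exist in Python 3.
import re
import urllib


def getHtml(url):
    # Fetch the page and return its raw source.
    page = urllib.urlopen(url)
    html = page.read()
    return html


def getImage(html):
    # Capture every .jpg URL in src="..." that is followed by a size attribute.
    reg = r'src="(.*?\.jpg)" size'
    imgre = re.compile(reg)
    imageList = re.findall(imgre, html)
    # Save the images as 0_hhh.jpg, 1_hhh.jpg, ...
    x = 0
    for image in imageList:
        urllib.urlretrieve(image, '%s_hhh.jpg' % x)
        x += 1


html = getHtml("https://tieba.baidu.com/p/5256641773")
getImage(html)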
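
The script above targets Python 2. In Python 3, `urlopen` and `urlretrieve` live in `urllib.request`, and `read()` returns bytes that have to be decoded before a text pattern can be applied. A hedged sketch of a port follows. The function names, the UTF-8 decoding with `errors='replace'`, the skip-on-failure behavior, and the injectable `retrieve` parameter are assumptions made here, not the original author's choices.

```python
import re
import urllib.request

# Same pattern as the original script.
IMG_PATTERN = re.compile(r'src="(.*?\.jpg)" size')


def get_html(url):
    """Fetch a page and return its source as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode('utf-8', errors='replace')


def download_images(html, retrieve=urllib.request.urlretrieve):
    """Save each matched image as '<n>_hhh.jpg' and return the saved names.

    `retrieve` is a hypothetical hook so the loop can be exercised offline.
    A failed download is skipped instead of aborting the whole run.
    """
    saved = []
    for index, image_url in enumerate(IMG_PATTERN.findall(html)):
        filename = '%s_hhh.jpg' % index
        try:
            retrieve(image_url, filename)
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            print('skipped %s: %s' % (image_url, exc))
            continue
        saved.append(filename)
    return saved
```

Calling `download_images(get_html("https://tieba.baidu.com/p/5256641773"))` would mirror the last two lines of the original script. Skipping a failed download rather than stopping is one way to cope with the partial results described in point 5.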
