Scraping Images from a Web Page with a Python Crawler

Date: 2019-12-01

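I never expected Python to be this powerful; it is addictive. Whenever I came across images before, I copied and pasted them one at a time. Now that I have learned Python, a program can save them one by one for me.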

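Today I was browsing Tieba and saw a lot of nice pictures, but there were too many to copy and paste one by one. What to do? There is always a way, and if there isn't one, we can create it.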

Here is the program I wrote today:

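#coding=utf-8

# The urllib.request module provides an interface for reading data from web pages
import urllib.request
# The re module provides regular expression support
import re

# Define a getHtml() function that returns the raw bytes of a page
def getHtml(url):
    page = urllib.request.urlopen(url)  # urllib.request.urlopen() opens the given URL
    html = page.read()  # read() returns the response body as bytes
    return html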

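def getImg(html):
    reg = r'src="(.+?\.jpg)" pic_ext'  # regular expression that captures each image URL
    imgre = re.compile(reg)  # re.compile() compiles the pattern into a regular expression object
    html = html.decode('utf-8')  # Python 3: decode the bytes into str before matching
    imglist = imgre.findall(html)  # findall() returns every URL captured by the pattern
    # Loop over the extracted image URLs and save each one locally.
    # urllib.request.urlretrieve() downloads the remote data straight to a local file;
    # the files are named with the incrementing counter x.
    x = 0

    for imgurl in imglist:
        # Raw string so the backslashes in the Windows path are kept literally;
        # the directory D:\E must already exist.
        urllib.request.urlretrieve(imgurl, r'D:\E\%s.jpg' % x)
        x += 1
    return imglist  # returned so that print(getImg(html)) shows the URLs instead of None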


html = getHtml("https://tieba.baidu.com/p/xxxxxxxx")
print(getImg(html))
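
To see what the regular expression in getImg() actually captures without touching the network, here is a minimal sketch that runs the same pattern against a small HTML fragment. The fragment, the example.com URLs, and the helper name extract_image_urls are all made up for illustration; a real Tieba page may be laid out differently.

```python
import re

# Hypothetical HTML fragment, invented for this example (one tag per line).
SAMPLE_HTML = "\n".join([
    '<img class="BDE_Image" src="https://example.com/a/1.jpg" pic_ext="jpeg">',
    '<img src="https://example.com/logo.png" alt="logo">',
    '<img class="BDE_Image" src="https://example.com/b/2.jpg" pic_ext="jpeg">',
])

# Same pattern as in getImg(): capture the src value of tags that end in
# .jpg and are immediately followed by a pic_ext attribute.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')


def extract_image_urls(html_text):
    """Return every image URL captured by IMG_PATTERN (hypothetical helper)."""
    return IMG_PATTERN.findall(html_text)


urls = extract_image_urls(SAMPLE_HTML)
# enumerate() yields the same 0, 1, 2, ... numbering that the counter x provides.
for index, url in enumerate(urls):
    print('%s.jpg <- %s' % (index, url))
```

Only the two .jpg tags that carry a pic_ext attribute are captured; the .png tag is skipped. Because `.` does not match a newline by default, a match cannot run across lines here. On a page where several tags share one line, replacing `.+?` with `[^"]+?` is one way to keep the capture inside a single attribute value.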