A Basic Little Web Crawler Program

Date: 2019-11-13


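#!/usr/bin/env python3
# Download every .jpg image referenced on a Baidu Tieba thread page.
import re
import urllib.request

def getHtml(url):
    # Fetch the page and decode it to text so the regex can search it.
    # Assumes the page is UTF-8; undecodable bytes are dropped.
    page = urllib.request.urlopen(url)
    html = page.read().decode("utf-8", errors="ignore")
    return html

def getImg(html):
    # Non-greedy group, so each src attribute yields exactly one URL.
    reg = r'src="(.+?\.jpg)" pic_ext'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    x = 1
    for imgurl in imglist:
        # Save the images as 1.jpg, 2.jpg, ... in the current directory.
        urllib.request.urlretrieve(imgurl, "%s.jpg" % x)
        print(x)
        x += 1
    # Number of images downloaded.
    return x - 1

html = getHtml("http://tieba.baidu.com/p/2753105329")
print(getImg(html))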
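
The extraction step in getImg can be tried without touching the network. The sketch below is only an illustration: the HTML fragment, the example.com URLs, and the names SAMPLE_HTML, IMG_PATTERN, GREEDY_PATTERN and extract_image_urls are made up for the demo and do not come from the real Tieba page. It compares a greedy `.+` group with a non-greedy `.+?` group when two image tags sit on the same line.

```python
import re

# Hypothetical HTML fragment for illustration; not copied from the real page.
SAMPLE_HTML = (
    '<img class="BDE_Image" src="http://example.com/a.jpg" pic_ext="jpeg">'
    '<img class="BDE_Image" src="http://example.com/b.jpg" pic_ext="jpeg">'
)

# Greedy group: ".+" runs to the last ".jpg" on the line.
GREEDY_PATTERN = re.compile(r'src="(.+\.jpg)" pic_ext')

# Non-greedy group: ".+?" stops at the first ".jpg" followed by '" pic_ext'.
IMG_PATTERN = re.compile(r'src="(.+?\.jpg)" pic_ext')


def extract_image_urls(html):
    """Return each .jpg URL found in a src attribute that is followed by pic_ext."""
    return IMG_PATTERN.findall(html)


urls = extract_image_urls(SAMPLE_HTML)
print(urls)  # ['http://example.com/a.jpg', 'http://example.com/b.jpg']

# The greedy pattern merges both tags into a single, unusable match.
print(len(GREEDY_PATTERN.findall(SAMPLE_HTML)))  # 1
```

When every image tag is on its own line the two patterns behave the same, because `.` does not match a newline by default. The non-greedy form is simply the safer choice when the page layout is not known in advance.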
