python-79: Why we need to process dates

Date: 2019-11-29


Why we need to process dates

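As mentioned earlier, each page lists 20 articles, but only three or four new ones are published each day. If all 20 were printed on every run, most of the output would repeat what we saw before, which is not what we want. So I want to search by date. For example, if I run this program every morning, I want it to show the articles published yesterday. Showing today's articles would not work well, because a dozen or more hours of the day are still left and more articles may yet be published. That is why we need to process dates: so that we can compare them and search by date.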

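To implement this, we need two dates: yesterday's date and each article's publication date. First we get yesterday's date, then we compare it with the publication date of every article. If they match, we treat the article as one we want.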
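A minimal sketch of the comparison step, under a few assumptions. The raw date text is first cleaned the same way as in the full script later in this article (drop all whitespace, keep everything before the first `&`), then parsed into a `datetime.date` so that formatting details such as zero padding do not affect the match. The helper names `clean_release_date` and `is_published_on` are hypothetical, and the sample strings, including the `&middot;` after the date, are made up to resemble scraped text rather than copied from the real page.

```python
import datetime
import re


def clean_release_date(raw):
    """Strip all whitespace from the raw date text and keep the part before the first '&'."""
    return re.sub(r'\s', '', raw).split('&')[0]


def is_published_on(raw_date, search_date):
    """Return True if the article's raw date text falls on search_date (a datetime.date)."""
    # Parse 'YYYY/MM/DD' into a date object; strptime also accepts unpadded months/days
    released = datetime.datetime.strptime(clean_release_date(raw_date), '%Y/%m/%d').date()
    return released == search_date


# Hypothetical raw date strings; '&middot;' is an assumed separator after the date
samples = ['\n  2019/11/28 &middot; ', '\n  2019/11/27 &middot; ']
search_date = datetime.date(2019, 11, 28)  # "yesterday", hard-coded for the example
print([is_published_on(s, search_date) for s in samples])  # [True, False]
```

Comparing plain strings, as the script does, also works as long as both sides use exactly the same `YYYY/MM/DD` format. Parsing first is just a little more tolerant.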

So how do we get yesterday's date?

I am using Linux, so I can easily get yesterday's date with a shell command, like this:

import os

# run GNU date and strip the trailing newline from its output
search_date = os.popen(r'date -d"yesterday" +"%Y/%m/%d"').read().strip('\n')
print(search_date)
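On Python 3, `subprocess.run` is the usual replacement for `os.popen`. A rough equivalent is sketched below; the function name `yesterday_from_date_command` is hypothetical. It is still Linux-only because `-d "yesterday"` is a GNU `date` feature. The difference is that `check=True` makes a failing command raise an exception instead of silently producing an empty string.

```python
import subprocess


def yesterday_from_date_command():
    """Return yesterday's date as 'YYYY/MM/DD' using GNU date (Linux only).

    Raises OSError (e.g. FileNotFoundError) if there is no `date` executable, and
    subprocess.CalledProcessError if `date` rejects the GNU-only `-d` option.
    """
    result = subprocess.run(
        ['date', '-d', 'yesterday', '+%Y/%m/%d'],  # list form: no shell quoting needed
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise on a non-zero exit status
    )
    return result.stdout.strip()


try:
    print(yesterday_from_date_command())
except (OSError, subprocess.CalledProcessError) as exc:
    print('GNU date is not available here:', exc)
```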

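This `date` command only works on Linux, because it relies on the `-d` option of GNU `date`. Windows may have a similar command, which you can look up and try yourself. But there is another way that works on any system: use Python's own functions.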

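import datetime
today = datetime.date.today()                # the current local date
oneday = datetime.timedelta(days=1)          # a time span of one day
yesterday = today - oneday                   # date minus timedelta gives another date
search_date = yesterday.strftime('%Y/%m/%d') # same YYYY/MM/DD format as the date command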
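One advantage of `timedelta` over adjusting the day number by hand is that it rolls back correctly across month and year boundaries, leap years included. The sketch below wraps the same steps in a function that accepts the current date as a parameter, so the boundaries are easy to check. The name `yesterday_str` is hypothetical.

```python
import datetime


def yesterday_str(today=None):
    """Return the day before `today` as 'YYYY/MM/DD'; `today` defaults to the current local date."""
    if today is None:
        today = datetime.date.today()
    # timedelta subtraction rolls back across month and year boundaries automatically
    yesterday = today - datetime.timedelta(days=1)
    return yesterday.strftime('%Y/%m/%d')


print(yesterday_str(datetime.date(2019, 11, 29)))  # 2019/11/28
print(yesterday_str(datetime.date(2020, 3, 1)))    # 2020/02/29 (leap year)
print(yesterday_str(datetime.date(2020, 1, 1)))    # 2019/12/31
```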

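This gives us yesterday's date. Next we compare it with each article's publication date. This has to happen inside a loop, because there are 20 articles and each one's date must be checked. If the two dates are equal, we treat the article as one published yesterday and print its information. So our code should look something like this: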

#!/usr/bin/env python3
# -*- coding:UTF-8 -*-
__author__ = '217小月月坑'

'''
Search for the articles published yesterday (Linux only)
'''

import os
import re
import urllib.request

# get the HTML source of the page (assumes the page is UTF-8 encoded)
url = 'http://blog.jobbole.com/all-posts/'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
contents = response.read().decode('utf-8')

# get yesterday's date as YYYY/MM/DD (uses GNU date, hence Linux only)
search_date = os.popen(r'date -d"yesterday" +"%Y/%m/%d"').read().strip('\n')

# get the url, title and date of each article
pattern = re.compile(r'class="archive-title".*?href="(.*?)".*?title="(.*?)">.*?<br />(.*?)<a.*?', re.S)
items = re.findall(pattern, contents)
i = 0
for item in items:
    # remove all whitespace, then keep the part before the first '&'
    release_date = re.sub(r'\s', "", item[2]).split('&')[0]
    # compare the dates
    if search_date == release_date:
        i += 1
        print(i, item[0], item[1], release_date)

The result then looks like this: