Python Tutorial (Python Learning Path): Pandas Library Basics - Time Series Handling in Detail
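When analyzing data with Python, you often need to process and convert date and time formats. This is especially true when analyzing and mining time-related data: quantitative trading, for example, is about finding patterns in how stock prices changed over historical data. Python's standard library includes the datetime module for working with time, and NumPy provides corresponding functionality as well. Pandas, the data analysis library of the Python ecosystem, goes further and offers powerful date-handling features, which makes it an excellent tool for working with time series.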
1. Generating date sequences
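Pandas mainly provides two functions for this: pd.date_range() and pd.period_range(). Their parameters include the start time, the end time, the number of periods to generate, and the frequency (freq='M' for month, 'D' for day, 'W' for week, 'Y' for year, and so on). In pandas 2.2, date_range() prefers 'ME' (month end), 'YE' (year end) and 'h' (hour); the older spellings 'M', 'Y' and 'H' still work there but emit a FutureWarning.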
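As a quick sketch of how these parameters combine (the dates are arbitrary): of start, end, periods and freq, you give three and pd.date_range() derives the fourth, with freq defaulting to 'D'. When start, end and periods are all given without a freq, the points are spaced evenly between the two ends.

```python
import pandas as pd

# start + end: freq defaults to 'D', one entry per calendar day, both ends included
by_end = pd.date_range(start='2019-01-01', end='2019-01-05')

# end + periods: count backwards from the end date
by_periods = pd.date_range(end='2019-01-05', periods=3, freq='D')

# start + end + periods, no freq: 5 evenly spaced points across the 48 hours
evenly = pd.date_range(start='2019-01-01', end='2019-01-03', periods=5)

print(by_end)
print(by_periods)
print(evenly)
```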
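The main difference between the two is that pd.date_range() produces a DatetimeIndex, a sequence of individual timestamps, while pd.period_range() produces a PeriodIndex, a sequence of time spans (periods).
The following code compares them by generating monthly, weekly and hourly sequences: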
import pandas as pd
# Monthly: date_range() returns month-end timestamps ('M' means month end; pandas 2.2 prefers 'ME')
date_rng = pd.date_range('2019-01-01', freq='M', periods=12)
print(f'month date_range(): {date_rng}')
""" month date_range(): DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30', '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31', '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'], dtype='datetime64[ns]', freq='M') """
# Monthly: period_range() returns whole months
period_rng = pd.period_range('2019/01/01', freq='M', periods=12)
print(f'month period_range(): {period_rng}')
""" month period_range(): PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12'], dtype='period[M]', freq='M') """
# Weekly, weeks ending on Sunday: date_range() returns each Sunday
date_rng = pd.date_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week date_range(): {date_rng}')
""" week date_range(): DatetimeIndex(['2019-01-06', '2019-01-13', '2019-01-20', '2019-01-27', '2019-02-03', '2019-02-10', '2019-02-17', '2019-02-24', '2019-03-03', '2019-03-10', '2019-03-17', '2019-03-24'], dtype='datetime64[ns]', freq='W-SUN') """
# Weekly periods: each spans Monday to Sunday, starting with the week that contains 2019-01-01
period_rng = pd.period_range('2019-01-01', freq='W-SUN', periods=12)
print(f'week period_range(): {period_rng}')
""" week period_range(): PeriodIndex(['2018-12-31/2019-01-06', '2019-01-07/2019-01-13', '2019-01-14/2019-01-20', '2019-01-21/2019-01-27', '2019-01-28/2019-02-03', '2019-02-04/2019-02-10', '2019-02-11/2019-02-17', '2019-02-18/2019-02-24', '2019-02-25/2019-03-03', '2019-03-04/2019-03-10', '2019-03-11/2019-03-17', '2019-03-18/2019-03-24'], dtype='period[W-SUN]', freq='W-SUN') """
# Hourly ('H'; pandas 2.2 prefers 'h')
date_rng = pd.date_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour date_range(): {date_rng}')
""" hour date_range(): DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 01:00:00', '2019-01-01 02:00:00', '2019-01-01 03:00:00', '2019-01-01 04:00:00', '2019-01-01 05:00:00', '2019-01-01 06:00:00', '2019-01-01 07:00:00', '2019-01-01 08:00:00', '2019-01-01 09:00:00', '2019-01-01 10:00:00', '2019-01-01 11:00:00'], dtype='datetime64[ns]', freq='H') """
period_rng = pd.period_range('2019-01-01 00:00:00', freq='H', periods=12)
print(f'hour period_range(): {period_rng}')
""" hour period_range(): PeriodIndex(['2019-01-01 00:00', '2019-01-01 01:00', '2019-01-01 02:00', '2019-01-01 03:00', '2019-01-01 04:00', '2019-01-01 05:00', '2019-01-01 06:00', '2019-01-01 07:00', '2019-01-01 08:00', '2019-01-01 09:00', '2019-01-01 10:00', '2019-01-01 11:00'], dtype='period[H]', freq='H') """
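To make the difference between the two index types concrete: each element of a DatetimeIndex is a single instant, while each element of a PeriodIndex covers a whole span whose boundaries you can read with start_time and end_time. This minimal sketch (the variable names and dates are illustrative) also converts between the two with to_period() and to_timestamp().

```python
import pandas as pd

# Three month-end instants
stamps = pd.DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31'])

# DatetimeIndex -> PeriodIndex: each instant maps to the month that contains it
months = stamps.to_period('M')

# A Period is a span: start_time is its first instant, end_time its last nanosecond
first = months[0]
print(first, first.start_time, first.end_time)

# PeriodIndex -> DatetimeIndex: the default how='start' gives each period's first instant
starts = months.to_timestamp()
print(starts)
```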
2. Creating Timestamp objects and converting them
A Timestamp object can be created with either pd.Timestamp() or pd.to_datetime(), as shown below:
import pandas as pd
from datetime import datetime as dt
# pd.Timestamp() accepts year, month, day, ... as positional arguments
ts = pd.Timestamp(2019, 1, 1)
print(f'pd.Timestamp()-1:{ts}')
#pd.Timestamp()-1:2019-01-01 00:00:00
# ...a standard-library datetime object
ts = pd.Timestamp(dt(2019, 1, 1, hour=0, minute=1, second=1))
print(f'pd.Timestamp()-2:{ts}')
#pd.Timestamp()-2:2019-01-01 00:01:01
# ...or a date/time string
ts = pd.Timestamp("2019-1-1 0:1:1")
print(f'pd.Timestamp()-3:{ts}')
#pd.Timestamp()-3:2019-01-01 00:01:01
print(f'pd.Timestamp()-type:{type(ts)}')
#pd.Timestamp()-type:<class 'pandas._libs.tslibs.timestamps.Timestamp'>
# pd.to_datetime(2019, 1, 1) is not supported: unlike pd.Timestamp(), it does not take year, month, day as separate arguments
# (the result is named dtm so that the datetime class imported as dt is not shadowed)
dtm = pd.to_datetime(dt(2019, 1, 1, hour=0, minute=1, second=1))
print(f'pd.to_datetime()-1:{dtm}')
#pd.to_datetime()-1:2019-01-01 00:01:01
dtm = pd.to_datetime("2019-1-1 0:1:1")
print(f'pd.to_datetime()-2:{dtm}')
#pd.to_datetime()-2:2019-01-01 00:01:01
print(f'pd.to_datetime()-type:{type(dtm)}')
#pd.to_datetime()-type:<class 'pandas._libs.tslibs.timestamps.Timestamp'>
# Given a list, pd.to_datetime() builds a custom DatetimeIndex
dtlist = pd.to_datetime(["2019-1-1 0:1:1", "2019-3-1 0:1:1"])
print(f'pd.to_datetime()-list:{dtlist}')
#pd.to_datetime()-list:DatetimeIndex(['2019-01-01 00:01:01', '2019-03-01 00:01:01'], dtype='datetime64[ns]', freq=None)
# Convert the timestamp to a monthly Period
pr = ts.to_period('M')
print(f'ts.to_period():{pr}')
#ts.to_period():2019-01
print(f'ts.to_period()-type:{type(pr)}')
#ts.to_period()-type:<class 'pandas._libs.tslibs.period.Period'>
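pd.to_datetime() is also the usual entry point for parsing raw date columns. This short sketch, with made-up sample strings, shows three of its documented options: an explicit format string, errors='coerce' to turn unparseable values into NaT instead of raising, and assembling dates from separate year/month/day columns, which is the supported way to get what the commented-out pd.to_datetime(2019, 1, 1) call was after.

```python
import pandas as pd

# Explicit format: '20190101' is read as year, month, day
compact = pd.to_datetime(['20190101', '20190301'], format='%Y%m%d')

# errors='coerce': values that cannot be parsed become NaT instead of raising
mixed = pd.to_datetime(['2019-01-01', 'not a date'], errors='coerce')

# Build timestamps from separate year/month/day columns (returns a Series)
parts = pd.DataFrame({'year': [2019, 2019], 'month': [1, 3], 'day': [1, 1]})
assembled = pd.to_datetime(parts)

print(compact)
print(mixed)
print(assembled)
```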
3. Creating Period objects and converting them
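import pandas as pd
# Define a Period; a bare year string gives an annual period
per = pd.Period('2019')
print(f'pd.Period():{per}')
#pd.Period():2019
# Integers can be added or subtracted directly (for an annual period they count years, so per + 1 is 2020)
# Period - Period returns a DateOffset such as <YearEnd: month=12>; its .n attribute is the number of periods
per_del = (pd.Period('2019') - pd.Period('2018')).n
print(f'2019 and 2018 are {per_del} year(s) apart')
#2019 and 2018 are 1 year(s) apart
# Convert the period to a timestamp at its end or its start
print(per.to_timestamp(how='end'))  # 2019-12-31 23:59:59.999999999 in recent pandas (the last nanosecond of 2019)
print(per.to_timestamp(how='start'))  # 2019-01-01 00:00:00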
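Periods of other frequencies behave the same way. In this brief sketch with arbitrary example months, integer arithmetic steps by the period's own frequency, the difference between two periods is an offset whose .n attribute holds the count, and Period.asfreq() maps a month to its first or last day.

```python
import pandas as pd

march = pd.Period('2019-03', freq='M')

# Adding an integer moves by whole periods of the Period's own frequency (months here)
april = march + 1

# Period - Period returns a DateOffset; .n is the number of periods between them
gap = (pd.Period('2019-12', freq='M') - pd.Period('2019-01', freq='M')).n

# Convert the monthly period to a daily one, anchored at its start or its end
first_day = march.asfreq('D', how='start')
last_day = march.asfreq('D', how='end')

print(april, gap, first_day, last_day)
```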
4. Creating Timedelta time intervals
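import pandas as pd
# Create a Timedelta from keyword arguments
print(pd.Timedelta(days=5, minutes=50, seconds=20, milliseconds=10, microseconds=10, nanoseconds=10))
#5 days 00:50:20.010010010
# Get the current time (pd.datetime was deprecated and removed in pandas 2.0; use pd.Timestamp.now() instead)
now = pd.Timestamp.now()
# Compute the date 50 days from now
later = now + pd.Timedelta(days=50)
print(f'Current time: {now}, 50 days later: {later}')
#e.g. Current time: 2019-06-08 17:59:31.726065, 50 days later: 2019-07-28 17:59:31.726065
# Show only the year, month and day
print(later.strftime('%Y-%m-%d'))  # e.g. 2019-07-28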
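A Timedelta also appears whenever you subtract two timestamps, and pd.timedelta_range() builds a whole TimedeltaIndex of regularly spaced intervals. A minimal sketch with illustrative dates:

```python
import pandas as pd

# Subtracting two Timestamps yields a Timedelta
gap = pd.Timestamp('2019-07-28') - pd.Timestamp('2019-06-08')

# Timedeltas can also be parsed from strings and converted to plain seconds
span = pd.Timedelta('1 days 02:00:00')
seconds = span.total_seconds()

# A TimedeltaIndex of regularly spaced intervals: 1, 2 and 3 days
tdi = pd.timedelta_range(start='1 day', periods=3, freq='D')

print(gap, seconds)
print(tdi)
```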
5. Resampling and frequency conversion
import pandas as pd
# asfreq: show the index values by quarter. A DatetimeIndex has no asfreq() method
# ('DatetimeIndex' object has no attribute 'asfreq'), so convert to a PeriodIndex first
date = pd.date_range('1/1/2018', periods=20, freq='D')
tsdat_series = pd.Series(range(20), index=date)
tsp_series = tsdat_series.to_period('D')
print(tsp_series.index.asfreq('Q'))
# Alternatively, build the PeriodIndex directly with period_range()
date = pd.period_range('1/1/2018', periods=20, freq='D')
tsper_series = pd.Series(range(20), index=date)
print(tsper_series.index.asfreq('Q'))
""" Both print: PeriodIndex(['2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1', '2018Q1'], dtype='period[Q-DEC]', freq='Q-DEC') """
# resample: sum by quarter ('Q' = quarter end; pandas 2.2 prefers 'QE'), then label the result by quarter
print(tsdat_series.resample('Q').sum().to_period('Q'))
"""
2018Q1    190
Freq: Q-DEC, dtype: int64
"""
# groupby: average the values for each day of the week (0 = Monday through 6 = Sunday)
print(tsdat_series.groupby(lambda x: x.weekday()).mean())
"""
0     7.0
1     8.0
2     9.0
3    10.0
4    11.0
5    12.0
6     9.5
dtype: float64
"""
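resample() works in both directions. This sketch rebuilds the same 20 days of data used in this section so it runs on its own. It downsamples to weekly sums, where 'W' means weeks ending on Sunday. It then upsamples two made-up observations to a daily frequency: asfreq() leaves the gap as NaN, while resample().ffill() carries the last value forward. The last line groups by weekday name via DatetimeIndex.day_name() as a more readable variant of the numeric weekday grouping.

```python
import pandas as pd

# Same data as above: values 0..19 on 20 consecutive days starting Monday 2018-01-01
daily = pd.Series(range(20), index=pd.date_range('2018-01-01', periods=20, freq='D'))

# Downsample: sum the values within each week ending on Sunday ('W' is 'W-SUN')
weekly = daily.resample('W').sum()

# Upsample: two observations two days apart, stretched to a daily index
sparse = pd.Series([1.0, 2.0], index=pd.to_datetime(['2019-01-01', '2019-01-03']))
gaps = sparse.asfreq('D')              # the missing day becomes NaN
filled = sparse.resample('D').ffill()  # the missing day takes the previous value

# Group by weekday name instead of weekday number
by_day = daily.groupby(daily.index.day_name()).mean()

print(weekly)
print(gaps)
print(filled)
print(by_day)
```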
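I have covered Pandas in earlier tutorials as well; if anything here is unclear, go back and review them. More Python tutorials and Python learning paths will be shared in future posts!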