Pandas Data Processing | Some Uses of Datetime in Pandas!

Date: 2020-08-02


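datetime is a Python type for representing time, and it makes converting between different time formats convenient. Pandas supports datetime data as well, and many useful operations build on it, for example: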

1, The to_datetime() function converts a Series column of the data to the datetime type:

#Convert the type to datetime
apple.Date = pd.to_datetime(apple.Date)
apple['Date'].head()

#
0   2014-07-08
1   2014-07-07
2   2014-07-03
3   2014-07-02
4   2014-07-01
Name: Date, dtype: datetime64[ns]
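When the input format is known, or some values may be malformed, to_datetime() can be told how to handle both cases. The following is a minimal sketch on made-up values (the `raw` Series is hypothetical, not the Apple data):

```python
import pandas as pd

# Hypothetical sample values; the last one is deliberately invalid.
raw = pd.Series(["2014-07-08", "2014-07-07", "not a date"])

# An explicit format avoids ambiguous parsing; errors="coerce" turns
# unparseable values into NaT instead of raising an exception.
dates = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")

print(dates.dtype)         # a datetime64 dtype
print(dates.isna().sum())  # number of values that failed to parse
```

Coercing to NaT keeps the column usable as datetime while making the bad rows easy to find with isna().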

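2, DataFrame.resample(freq) resamples the whole dataset over its time index at frequency freq, so that per-period sums, means, variances and other statistics can be computed. In the example below the original data has a datetime index, and the mean of each column is computed per month: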

# Resample the data by the given offset and take the mean of each period
# BM: business month end frequency (pandas 2.2+ spells this alias "BME")

apple_month = apple.resample("BM").mean()
apple_month.head()

The exercises below give a brief look at how Pandas handles DataFrame data.

1, to_datetime() and resample() operations

1.1, Read the data

import pandas as pd

url = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/09_Time_Series/Apple_Stock/appl_1980_2014.csv"
apple = pd.read_csv(url)
apple.head()

As we can see, the dates are in the Date column, but not yet as a proper datetime type, so the column needs to be converted.

1.2, Convert to datetime

#Convert the type to datetime
apple.Date = pd.to_datetime(apple.Date)
apple['Date'].head()

**1.3, Set the Date column as the index**

# Set the Date column as the index
apple = apple.set_index("Date")
apple.head()
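Steps 1.1 to 1.3 can also be done in a single read_csv() call. This is a small sketch under assumptions: it reads a made-up three-row CSV from memory (the values in `csv_text` are hypothetical) instead of downloading the real file.

```python
import io
import pandas as pd

# Hypothetical three-row extract with a Date column, for illustration only.
csv_text = """Date,Open,Close
2014-07-08,96.0,95.0
2014-07-07,94.0,96.0
2014-07-03,93.0,94.0
"""

# parse_dates converts the column while reading; index_col makes it the index.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(sample.index).__name__)
```

The explicit to_datetime() and set_index() calls remain useful when the conversion needs extra options or has to happen after loading.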

Date is now the index, but the rows are not in chronological order. Datetime data can be sorted directly; here sort_index(ascending=True) does the sorting.

1.4, Sort the index

# Sort the DataFrame by its Date index; sort_index returns a new DataFrame,
# so assign the result back to keep the sorted order
apple = apple.sort_index(ascending=True)
apple.head()
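One detail worth checking is that sort_index() does not modify the DataFrame in place by default. A minimal sketch on a made-up frame (the `frame` name and its values are hypothetical):

```python
import pandas as pd

# Hypothetical frame whose dates are in descending order.
idx = pd.to_datetime(["2014-07-08", "2014-07-07", "2014-07-03"])
frame = pd.DataFrame({"Close": [3.0, 2.0, 1.0]}, index=idx)

# sort_index returns a sorted copy; the original keeps its order.
ordered = frame.sort_index(ascending=True)

print(frame.index.is_monotonic_increasing)    # state of the original
print(ordered.index.is_monotonic_increasing)  # state of the sorted copy
```

The is_monotonic_increasing attribute is a quick way to confirm whether an index is already in chronological order.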

1.5, Resample the data by month and take the mean()

# Resample the data by the given offset and take the mean of each period
# BM: business month end frequency (pandas 2.2+ spells this alias "BME")

apple_month = apple.resample("BM").mean()
apple_month.head()

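BM stands for business month end, that is, the last business day of each month. Pandas calls these DateOffsets. Besides months, it also provides years, days, hours, minutes, seconds and more as sampling units, and custom offsets can be defined as well.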

For details on DateOffsets, see: https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases
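To make the effect of different offsets concrete, here is a rough sketch on synthetic data (the `prices` series is made up, not the Apple data). It passes an offset object, pd.offsets.BMonthEnd(), instead of a string alias, on the assumption that this sidesteps alias spelling differences between pandas versions.

```python
import pandas as pd

# Synthetic series: one value per weekday from 2020-01-01 to 2020-02-28,
# numbered 0, 1, 2, ... so the means are easy to check by hand.
days = pd.date_range("2020-01-01", "2020-02-28", freq="B")
prices = pd.Series(range(len(days)), index=days, dtype="float64")

# Business-month-end bins: one row per month, labelled with the
# last business day of that month.
monthly = prices.resample(pd.offsets.BMonthEnd()).mean()

# Weekly bins ("W" means weeks ending on Sunday).
weekly = prices.resample("W").mean()

print(monthly)
print(len(weekly))
```

Each resample() call only defines the bins; the aggregation (mean() here, but sum(), var() and others work the same way) decides what is computed per bin.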

1.6, Compute the number of days between the earliest and latest dates in the index

(apple.index.max()-apple.index.min()).days

#
12261
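Subtracting two timestamps yields a Timedelta, which is where the .days attribute comes from. A minimal sketch with two arbitrary dates (chosen for illustration, not taken from the dataset):

```python
import pandas as pd

# Two arbitrary dates, chosen only for illustration.
idx = pd.to_datetime(["2020-03-01", "2020-01-01"])

# max() - min() on a DatetimeIndex gives a pandas Timedelta.
span = idx.max() - idx.min()

print(type(span).__name__)
print(span.days)                     # whole days in the span
print(span / pd.Timedelta(weeks=1))  # the same span expressed in weeks
```

Dividing by another Timedelta is a convenient way to express the span in any unit.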

2, Stock prices of Apple, Tesla, IBM and LinkedIn over the last two years

2.1, Fetch the data with pandas_datareader

import pandas as pd
from pandas_datareader import data as web
import datetime as dt

start = dt.datetime(2019,1,1)
end = dt.datetime.today()
stocks = ['APPLE','TSLA','IBM','LNKD']
df = web.DataReader(stocks,'yahoo',start,end)
df

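Before using it, make sure the pandas_datareader package is installed. It fetches each company's stock data directly from a remote source (Yahoo in this case); the start and end datetime values bound the date range, here from the beginning of 2019 to today.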

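The result suggests that prices for Apple and LinkedIn cannot be retrieved this way. Two likely causes: 'APPLE' is not Apple's ticker symbol (it trades as AAPL), and LNKD was delisted after Microsoft acquired LinkedIn in 2016. This does not matter much here, since the goal is to learn how datetime works in Pandas.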

2.2, Extract the trading volume

# Copy the slice so that adding columns later does not touch df
vol = df['Volume'].copy()
vol

**2.3, Create new columns for week and year**

The grouping step that follows uses week and year as keys, so the two columns (week, year) need to be created first.

# DatetimeIndex.week was removed in pandas 2.0; isocalendar() replaces it
vol['week'] = vol.index.isocalendar().week
vol['year'] = vol.index.year
vol.head()
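Week numbers follow the ISO calendar, so near a year boundary the ISO year can differ from the calendar year. The sketch below uses three hand-picked dates around New Year 2020 to show the difference; it assumes pandas 1.1 or later, where isocalendar() is available.

```python
import pandas as pd

# Three hand-picked dates around the 2019/2020 boundary.
idx = pd.to_datetime(["2019-12-27", "2019-12-30", "2020-01-02"])

# isocalendar() returns a DataFrame with year, week and day columns.
iso = idx.isocalendar()

print(iso["week"].tolist())  # ISO week numbers
print(iso["year"].tolist())  # ISO years
print(idx.year.tolist())     # calendar years
```

Monday 2019-12-30 already belongs to ISO week 1 of 2020, although its calendar year is 2019. When grouping by week and year, pairing the ISO week with the ISO year avoids splitting that week across two groups.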

2.4, Group with groupby (week first, then year)

week = vol.groupby(['week','year']).sum()

week.head()

This makes it easy to compare each company's total weekly trading volume across 2019-2020.
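Since the download depends on an external service, the grouping step can be tried offline on synthetic data. This is only a sketch: the `volume` frame and its numbers are made up, and it groups by ISO year and ISO week (an assumption that differs slightly from the week and calendar-year columns used above).

```python
import pandas as pd

# Synthetic volumes for two full Monday-to-Friday weeks in January 2020.
days = pd.date_range("2020-01-06", periods=10, freq="B")
volume = pd.DataFrame({"TSLA": [10] * 10, "IBM": [1] * 10}, index=days)

# Build the grouping keys from the index instead of adding columns.
iso = volume.index.isocalendar()
weekly = volume.groupby([iso["year"], iso["week"]]).sum()

print(weekly)
```

Passing the key Series straight to groupby() leaves the original frame unchanged, which is handy when the extra week and year columns are not needed afterwards.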

That's all for this article. Thanks for reading!

Reference:

1, https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases

2, https://github.com/guipsamora/pandas_exercises/blob/master/09_Time_Series/Getting_Financial_Data
