Python Scientific Computing: pandas

Date: 2019-11-21


pandas is an open-source Python library for data analysis, built on top of NumPy.

import numpy as np
import pandas as pd

# Data preparation
l = list(range(5))
np01 = np.array(l)
np02 = np.arange(10)
d01 = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
d02 = {
    'a': [1, 2, 3, 4],
    'b': [5, 6, 7, 8],
    'c': [9, 10, 11, 12],
    'd': [13, 14, 15, 16]
}

# 1. One-dimensional data structure: Series
## 1.1 Create from a list
s_l = pd.Series(l)
s_l_index = pd.Series(l,index=['bj','sh','gz','sz','qd'])

## 1.2 Create from a NumPy ndarray
s_np01 = pd.Series(np01)
s_np02 = pd.Series(np02)

## 1.3 Create from a dict (the keys become the index)
s_d01 = pd.Series(d01)
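
The three constructors above differ mainly in where the index comes from. The following is a minimal sketch of that difference; the variable names are illustrative and not taken from the original script.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1.
s_default = pd.Series([0, 1, 2, 3, 4])
print(list(s_default.index))   # [0, 1, 2, 3, 4]

# From a list with explicit labels: values become addressable by label.
s_labeled = pd.Series([0, 1, 2, 3, 4], index=['bj', 'sh', 'gz', 'sz', 'qd'])
print(s_labeled['gz'])         # 2

# From an ndarray: same default index as the list case.
s_arr = pd.Series(np.arange(10))
print(len(s_arr))              # 10

# From a dict: the keys become the index, the values become the data.
s_scores = pd.Series({'Michael': 95, 'Bob': 75, 'Tracy': 85})
print(s_scores['Bob'])         # 75
```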

# 2. Two-dimensional data structure: DataFrame
## 2.1 Create from a dict (the keys become the column names)
df_d02 = pd.DataFrame(d02, index=['one','two','three','four'])

## 2.2 Read from a file
df_csv = pd.read_csv('./price.csv')  # assumes price.csv exists in the current working directory
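
Since price.csv itself is not shown, here is a minimal sketch of the same read_csv call using an in-memory stand-in. The column names and numbers are hypothetical placeholders, not the contents of the real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for price.csv; the real file's
# columns are not given in the original script.
csv_text = """date,open,close
2016-07-12,9.10,9.25
2016-07-13,9.26,9.20
"""

# read_csv accepts a path or any file-like object.
# parse_dates converts the 'date' column to timestamps,
# and index_col uses it as the row index.
df_prices = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')

print(df_prices.shape)           # (2, 2)
print(list(df_prices.columns))   # ['open', 'close']
```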

## 2.3 Create from a 2-D NumPy array
dates = list('abcdef')  # or: dates = pd.date_range('20130101', periods=6); the label examples in 4.2 assume the letter index
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
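
The date_range alternative mentioned in the comment can be sketched as follows. The seeded generator and the name df_dates are assumptions added here so the result is reproducible; they are not part of the original script.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                 # seeded so the sketch is reproducible
dates = pd.date_range('20130101', periods=6)   # six consecutive days starting 2013-01-01
df_dates = pd.DataFrame(
    rng.standard_normal((6, 4)),
    index=dates,
    columns=list('ABCD'),
)

print(df_dates.shape)      # (6, 4)
print(df_dates.index[0])   # 2013-01-01 00:00:00

# With a DatetimeIndex, rows are selected by date label instead of by letter.
row = df_dates.loc[pd.Timestamp('2013-01-03')]
print(list(row.index))     # ['A', 'B', 'C', 'D']
```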

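# 3. DataFrame attributes
shape = df.shape      # (number of rows, number of columns), here (6, 4)
ndim = df.ndim        # number of dimensions, 2 for a DataFrame
index = df.index      # row labels
columns = df.columns  # column labels
values = df.values    # underlying data as a NumPy ndarray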
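
Because df holds random numbers, the attribute values are easier to see on a small fixed frame. A minimal sketch, where df_small is an illustrative name:

```python
import pandas as pd

df_small = pd.DataFrame(
    {'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]},
    index=['one', 'two', 'three', 'four'],
)

print(df_small.shape)                 # (4, 2)
print(df_small.ndim)                  # 2
print(list(df_small.index))           # ['one', 'two', 'three', 'four']
print(list(df_small.columns))         # ['a', 'b']
# to_numpy() returns the same data as .values.
print(df_small.to_numpy().tolist())   # [[1, 5], [2, 6], [3, 7], [4, 8]]
```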

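# 4. Indexing a DataFrame
## 4.1 Direct selection with df[]
a1 = df[1:3] # half-open slice (start included, stop excluded), 0-based: selects the 2nd and 3rd rows
a2 = df[:3]  # selects the first 3 rows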

a3 = df['A'] # selects column A (returns a Series)
a4 = df[['A','C']]   # selects columns A and C (returns a DataFrame)

a5 = df[df['A']>0]   # selects the rows where column A is greater than 0
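
The three uses of df[] above (row slice, column label, boolean mask) can be checked on fixed data. A minimal sketch, where df_demo and its values are illustrative:

```python
import numpy as np
import pandas as pd

# Fixed values from -10 to 13 so that column A contains both signs.
df_demo = pd.DataFrame(
    np.arange(24).reshape(6, 4) - 10,
    index=list('abcdef'),
    columns=list('ABCD'),
)

rows = df_demo[1:3]               # a slice selects rows by position
print(list(rows.index))           # ['b', 'c']

col = df_demo['A']                # a single label selects one column
print(col.tolist())               # [-10, -6, -2, 2, 6, 10]

cols = df_demo[['A', 'C']]        # a list of labels selects several columns
print(cols.shape)                 # (6, 2)

positive = df_demo[df_demo['A'] > 0]   # a boolean mask filters rows
print(list(positive.index))       # ['d', 'e', 'f']
```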

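## 4.2 Label-based selection with df.loc[]
a6 = df.loc['a', :]     # row 'a', all columns
a7 = df.loc['a','A']    # the single value at row 'a', column 'A'
a8 = df.loc['a':'d', :] # rows 'a' through 'd'; label slices include both endpoints
a9 = df.loc[['a','d'], ['B','C']]   # rows 'a' and 'd', columns 'B' and 'C'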
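
The loc calls above can be verified on a frame with known values. A minimal sketch; df_labels is an illustrative name, with cells numbered 0 to 23 row by row:

```python
import numpy as np
import pandas as pd

df_labels = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list('abcdef'),
    columns=list('ABCD'),
)

print(df_labels.loc['a', :].tolist())    # [0, 1, 2, 3]
print(df_labels.loc['a', 'A'])           # 0

# Label slices are inclusive on both ends: 'a', 'b', 'c' and 'd' are returned.
print(list(df_labels.loc['a':'d', :].index))   # ['a', 'b', 'c', 'd']

picked = df_labels.loc[['a', 'd'], ['B', 'C']]
print(picked.to_numpy().tolist())        # [[1, 2], [13, 14]]
```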

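## 4.3 Position-based selection with df.iloc[]
a10 = df.iloc[1:4,1:4]  # rows and columns at positions 1 to 3 (0-based), i.e. the 2nd to 4th rows and columns
a11 = df.iloc[[1,4],1:3]    # rows at positions 1 and 4, columns at positions 1 and 2 (0-based)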
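
A minimal sketch of the two iloc calls on fixed data, plus the main contrast with loc: position slices exclude the stop, label slices include it. The name df_pos is illustrative.

```python
import numpy as np
import pandas as pd

df_pos = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list('abcdef'),
    columns=list('ABCD'),
)

block = df_pos.iloc[1:4, 1:4]       # positions 1, 2, 3 on both axes
print(list(block.index))            # ['b', 'c', 'd']
print(list(block.columns))          # ['B', 'C', 'D']

picked = df_pos.iloc[[1, 4], 1:3]   # rows at positions 1 and 4, columns at 1 and 2
print(picked.to_numpy().tolist())   # [[5, 6], [17, 18]]

# Same start and stop, different rules for the end point.
print(len(df_pos.iloc[1:3]))        # 2 (stop position excluded)
print(len(df_pos.loc['b':'d']))     # 3 (stop label included)
```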

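# 5. Panel
# Panel data obtained from the JoinQuant platform: the row labels are dates, the column labels are the price fields, and a third axis holds the stock codes.
# Note: get_price is provided by the JoinQuant environment, not by pandas, so this call only runs there.
# pandas.Panel was deprecated in pandas 0.20 and removed in 0.25.
panel = get_price(
    ['000001.XSHE', '000002.XSHE'],
    start_date='2016-07-12',
    end_date='2016-07-15',
    frequency='daily',
    fields=['open', 'high', 'low', 'close'],
)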
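
Outside JoinQuant, and on pandas versions without Panel, the same three-axis data (stock code, date, price field) is commonly held in a DataFrame with a MultiIndex. The following is a sketch of that substitute technique, not of what get_price returns; the numbers are placeholders rather than real market prices, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

codes = ['000001.XSHE', '000002.XSHE']
dates = pd.date_range('2016-07-12', '2016-07-15', freq='D')
fields = ['open', 'high', 'low', 'close']

# One row per (code, date) pair, one column per price field.
index = pd.MultiIndex.from_product([codes, dates], names=['code', 'date'])

# Placeholder values 0.0, 1.0, 2.0, ... only; these are NOT real prices.
prices = pd.DataFrame(
    np.arange(len(index) * len(fields), dtype=float).reshape(len(index), len(fields)),
    index=index,
    columns=fields,
)

# All fields for one stock: a (dates x fields) frame.
one_stock = prices.xs('000001.XSHE', level='code')
print(one_stock.shape)        # (4, 4)

# One field for all stocks: a (dates x codes) frame.
close_by_code = prices['close'].unstack('code')
print(close_by_code.shape)    # (4, 2)
```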