Common pandas DataFrame operations in Python

1. Conditional queries:

# Single quotes outside so the string literal "x" can be used inside the expression
result = df.query('(a == 1 and b == "x") or c / d < 3')
print(result)
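As a minimal sketch of how such a query behaves, the example below builds a small hypothetical DataFrame (only the column names a, b, c, d come from the expression above; the values are made up) and compares query() with the equivalent boolean mask.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the query above.
df = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": ["x", "y", "x", "z"],
    "c": [10, 20, 2, 30],
    "d": [2, 4, 1, 5],
})

# The condition written as a query string ...
by_query = df.query('(a == 1 and b == "x") or c / d < 3')

# ... and as an explicit boolean mask (note & and | plus parentheses).
mask = ((df["a"] == 1) & (df["b"] == "x")) | (df["c"] / df["d"] < 3)
by_mask = df[mask]

print(by_query)
```

Both forms select the same rows here; the query string is often easier to read, while the mask form works with column names that are not valid Python identifiers.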

 

2. Iteration

a) Iterate by index label

for idx in df.index:
    dd = df.loc[idx]  # row selected by its index label
    print(dd)

 

b) Iterate by row position

for i in range(len(df)):
    dd = df.iloc[i]  # row selected by its integer position
    print(dd)
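Both loops above yield each row as a Series. As a hedged alternative, itertuples() yields lightweight namedtuples instead and is usually faster for plain row-by-row reads. The sketch below uses made-up data with a non-sequential index.

```python
import pandas as pd

# Hypothetical data with a non-sequential index.
df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151"], "last_price": [53.70, 6.17]},
    index=[42, 15],
)

rows = []
# Each row is a namedtuple; row.Index holds the index label.
for row in df.itertuples():
    rows.append((row.Index, row.code, row.last_price))
    print(row.Index, row.code, row.last_price)
```

Attribute access such as row.code only works when the column name is a valid Python identifier.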

 

 

3. Mean of a column

# Compute the mean of the "volume" column
result = df["volume"].mean()
print(result)
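One detail worth knowing is how mean() treats missing values. The sketch below, on a made-up volume column, shows that NaN entries are skipped by default and only propagate when skipna=False is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical volume column with one missing value.
df = pd.DataFrame({"volume": [100.0, 200.0, np.nan, 300.0]})

# Default: NaN is ignored, so this is (100 + 200 + 300) / 3.
mean_skip = df["volume"].mean()

# skipna=False: any NaN makes the result NaN.
mean_strict = df["volume"].mean(skipna=False)

print(mean_skip, mean_strict)
```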

 

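4. Sort by a given column

result_df = df.sort_values(by="sales", ascending=False)
print(result_df)

Note that this sort is not in place: sort_values returns a new DataFrame and df itself is left unchanged.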
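To illustrate the note above, the following sketch (with made-up sales figures) contrasts the default behaviour with inplace=True.

```python
import pandas as pd

# Hypothetical sales figures.
df = pd.DataFrame({"sales": [3, 1, 2]})

# Default: a sorted copy is returned, df keeps its original row order.
sorted_df = df.sort_values(by="sales", ascending=False)
order_after_copy = df["sales"].tolist()

# inplace=True: df itself is reordered and the call returns None.
ret = df.sort_values(by="sales", ascending=False, inplace=True)
order_after_inplace = df["sales"].tolist()

print(order_after_copy, order_after_inplace, ret)
```

Because the in-place call returns None, writing df = df.sort_values(..., inplace=True) would replace df with None.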

 

5. Selecting specific rows/columns

Suppose we have the following data:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN
14  HK.00101  2019-04-26 16:08:05       18.22       18.26     ...               NaN          NaN           NaN         NaN

 

a) Select by index label

Select the row with index 42 and all columns:

# A list of labels keeps the result a one-row DataFrame; df.loc[42, :] would return a Series
result = df.loc[[42], :]
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN

 

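Select the rows with index 15 and 42, keeping only the code and update_time columns:

# loc selects by label, so the columns are given by name, not by position
result = df.loc[[15, 42], ["code", "update_time"]]
print(result)

result:

        code          update_time  
15  HK.00151  2019-04-26 16:08:33 
42  HK.02018  2019-04-26 16:08:05 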

 

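b) Select by row position

Select the 2nd row, all columns:

# A list of positions keeps the result a one-row DataFrame; df.iloc[1, :] would return a Series
result = df.iloc[[1], :]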
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN

 

Select the first 2 rows, all columns:

result = df.iloc[0:2, :]
print(result)

result:

        code          update_time  last_price  open_price     ...      option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70     ...               NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21     ...               NaN          NaN           NaN         NaN

 

Select the 1st and 3rd rows, keeping only the code and update_time columns:

result = df.iloc[[0, 2], 0:2]
print(result)

result:

        code          update_time 
42  HK.02018  2019-04-26 16:08:05
14  HK.00101  2019-04-26 16:08:05
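The difference between label-based loc and position-based iloc is easiest to see on a frame whose index is not 0, 1, 2. The sketch below rebuilds three columns of the sample data shown earlier (the remaining columns are left out for brevity) and runs both kinds of selection.

```python
import pandas as pd

# A reduced copy of the sample data above, with the same index labels.
df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": [
            "2019-04-26 16:08:05",
            "2019-04-26 16:08:33",
            "2019-04-26 16:08:05",
        ],
        "last_price": [53.70, 6.17, 18.22],
    },
    index=[42, 15, 14],
)

# loc: rows and columns are addressed by label; rows come back in the order requested.
by_label = df.loc[[15, 42], ["code", "update_time"]]

# iloc: rows and columns are addressed by integer position.
by_position = df.iloc[[0, 2], 0:2]

# A single scalar label returns the row as a Series, not a DataFrame.
one_row = df.loc[42]

print(by_label)
print(by_position)
```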

 

6. Assigning columns

Store the sum of col1 and col2 in a new col column:

df['col'] = df['col1'] + df['col2']

 

Divide col1 by col2, add 1, and store the result in a new newcol column:

df['newcol'] = df['col1'] / df['col2'] + 1
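The assignments above modify df directly. When the original frame should stay untouched, one option is assign(), which returns a new DataFrame with the extra columns. The sketch below uses made-up values for col1 and col2.

```python
import pandas as pd

# Hypothetical input columns.
df = pd.DataFrame({"col1": [2.0, 9.0], "col2": [1.0, 3.0]})

# assign() returns a copy with the new columns; df is not modified.
new_df = df.assign(
    col=df["col1"] + df["col2"],
    newcol=df["col1"] / df["col2"] + 1,
)

print(new_df)
```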

 

7. Renaming columns

new_df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
print(new_df)
# In-place mode: modifies df directly and returns None
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
print(df)
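A common pitfall with rename() is that keys which match no column are ignored silently. The sketch below, using a deliberately misspelled key, shows the default behaviour and the errors="raise" option that turns the typo into a KeyError.

```python
import pandas as pd

df = pd.DataFrame({"oldName1": [1], "oldName2": [2]})

# The key "oldname1" (lowercase n) matches no column, so nothing is renamed.
unchanged = df.rename(columns={"oldname1": "newName1"})

# With errors="raise" the same typo raises a KeyError instead.
try:
    df.rename(columns={"oldname1": "newName1"}, errors="raise")
    raised = False
except KeyError:
    raised = True

print(list(unchanged.columns), raised)
```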