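1. This part is mainly about an index that contains duplicate values: obj.index.is_unique tells you whether the index is unique, and selecting a duplicated label, e.g. obj['a'], returns all the values that share that label (a Series rather than a single scalar).
2. Commonly used descriptive-statistics functions for DataFrame: https://chrisalbon.com/python/pandas_dataframe_descriptive_stats.html
For the list of functions, see Python for Data Analysis, p. 139, Table 5-10.
3. import pandas_datareader as web can fetch stock data to use as a statistical sample. For the supported web data sources and how to use them, see the documentation below.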
https://pandas-datareader.readthedocs.io/en/latest/
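(1) Series with Series
returns.MSFT.corr(returns.IBM)  correlation coefficient
returns.MSFT.cov(returns.IBM)  covariance
(2) DataFrame with itself (pairwise matrix over all columns)
returns.corr()
returns.cov()
(3) DataFrame with a Series
returns.corrwith(returns.IBM)
(4) DataFrame with a DataFrame (matched by column name)
returns.corrwith(volume)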
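The four cases above can be tried without a network connection. The following is a minimal sketch, assuming synthetic random data in place of real stock returns and volumes; the numbers, dates, and seed are made up for illustration and are not market data.

```python
import numpy as np
import pandas as pd

# Assumption: random numbers stand in for real returns and volumes.
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-04", periods=250, freq="B")
tickers = ["AAPL", "IBM", "MSFT", "GOOG"]
returns = pd.DataFrame(rng.normal(0, 0.01, size=(250, 4)),
                       index=dates, columns=tickers)
volume = pd.DataFrame(rng.integers(1_000_000, 5_000_000, size=(250, 4)),
                      index=dates, columns=tickers)

# (1) Series with Series: a single number each
pair_corr = returns["MSFT"].corr(returns["IBM"])
pair_cov = returns["MSFT"].cov(returns["IBM"])

# (2) DataFrame with itself: a 4x4 matrix over all column pairs
corr_matrix = returns.corr()
cov_matrix = returns.cov()

# (3) DataFrame with a Series: one correlation per column
vs_ibm = returns.corrwith(returns["IBM"])

# (4) DataFrame with a DataFrame: columns are matched by name
vs_volume = returns.corrwith(volume)

print("MSFT/IBM correlation:", pair_corr)
print("correlation matrix:\n", corr_matrix)
print("each column vs IBM:\n", vs_ibm)
print("each column vs its own volume:\n", vs_volume)
```

The pairwise value from case (1) is the same number that appears in the MSFT row and IBM column of the matrix from case (2), and the IBM entry of case (3) is 1 because a column is perfectly correlated with itself.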
import numpy as np
from pandas import DataFrame, Series
print("Axis indexes with duplicate values")
obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
print("obj is \n", obj)
print("obj.index.is_unique is ",obj.index.is_unique)
print("obj['a'] is \n", obj['a'])
print("obj['b'] is \n",obj['b'])
df=DataFrame(np.random.randn(4,3),index=['a','a','b','b'])
print("df is \n",df)
print("df.loc['b'] is \n ",df.loc['b'])
df = DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],index=['a', 'b', 'c', 'd'],columns=['one','two'])
print("df is \n",df)
print("Calling DataFrame's sum method returns a Series containing column sums")
print("df.sum() is \n",df.sum())
print("passing axis=1 sums over the rows instead")
print("df.sum(axis=1) \n", df.sum(axis=1))
print("NA values are excluded unless the entire slice is NA. This can be disabled using the skipna option")
print("df.mean(axis=1,skipna=False) \n ",df.mean(axis=1,skipna=False))
print("df.idxmax() returns indirect statistics like the index value where the maximum values are attained \n",df.idxmax())
print("df.cumsum() returns the cumulative sum of values \n",df.cumsum())
print("df.describe() returns multiple summary statistics in one shot \n",df.describe())
obj=Series(['a','a','b','c']*4)
print("obj is \n",obj)
print("obj.describe() returns alternate summary statistics for non-numeric data \n",obj.describe())
import pandas_datareader as web
# Documentation: https://pandas-datareader.readthedocs.io/en/latest/
# Note: get_data_google depends on the Google Finance source, which newer
# pandas-datareader releases may no longer provide.
all_data = {}
for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']:
    all_data[ticker] = web.get_data_google(ticker, '1/1/2016', '1/1/2017')
print("all data is \n ", all_data)
price = DataFrame({tic: data['Close'] for tic, data in all_data.items()})
volume = DataFrame({tic: data['Volume'] for tic, data in all_data.items()})
# Percent change of the closing price from one row to the next
returns = price.pct_change()
print("returns.tail()\n", returns.tail())
print("returns.MSFT.corr(returns.IBM) \n", returns.MSFT.corr(returns.IBM))
print("returns.MSFT.cov(returns.IBM) \n", returns.MSFT.cov(returns.IBM))
print("returns.corr() \n", returns.corr())
print("returns.cov() \n", returns.cov())
print("returns.corrwith(returns.IBM) \n", returns.corrwith(returns.IBM))
print("volume is \n", volume)
print("returns.corrwith(volume) \n", returns.corrwith(volume))
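To make clear what price.pct_change() computes, here is a small sketch on hand-written prices. The ticker names AAA and BBB and all price values are hypothetical, chosen only so the arithmetic is easy to check: each return is the current price divided by the previous price, minus 1, and the first row has no previous price, so it is NaN.

```python
import pandas as pd

# Hypothetical closing prices, made up for illustration.
price = pd.DataFrame(
    {"AAA": [100.0, 102.0, 99.96], "BBB": [50.0, 50.0, 55.0]},
    index=pd.date_range("2016-01-04", periods=3, freq="B"),
)

# Built-in percent change between consecutive rows
returns = price.pct_change()

# The same calculation written out by hand: p_t / p_(t-1) - 1
manual = price / price.shift(1) - 1

print(returns)
print(manual)
```

Here AAA goes up by about 2% and then down by about 2%, while BBB is flat and then rises by about 10%. Both frames print the same values.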