Following up on the previous post, this one introduces the DataFrame structure in the pandas module.
DataFrame unifies two or more Series into a single data structure. Each Series then represents a named column of the DataFrame, and instead of each column having its own index, the DataFrame provides a single index and the data in all columns is aligned to the master index of the DataFrame.
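In other words, a DataFrame provides a table-like structure made up of multiple Series, and each Series is called a column within the DataFrame (if I have misunderstood this, please point it out).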
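To make the quoted description concrete, here is a minimal sketch showing that each column of a DataFrame is a Series and that the columns share the DataFrame's index. The column names `a` and `b` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-column frame, used only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col = df["a"]                       # selecting one column returns a Series
print(type(col).__name__)           # Series
print(col.name)                     # a
print(col.index.equals(df.index))   # True: the column uses the DataFrame's index
```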
pd.DataFrame()
Parameters:
1. A two-dimensional array;
2. A list of Series;
3. A dict whose values are Series.
import pandas as pd
import numpy as np
s1 = np.array([1, 2, 3, 4])
s2 = np.array([5, 6, 7, 8])
# each array becomes one row of the DataFrame
df = pd.DataFrame([s1, s2])
print(df)
import pandas as pd
import numpy as np
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
# in a list, each Series becomes one row
df = pd.DataFrame([s1, s2])
print(df)
import pandas as pd
import numpy as np
s1 = pd.Series(np.array([1, 2, 3, 4]))
s2 = pd.Series(np.array([5, 6, 7, 8]))
# in a dict, each Series becomes one column named after its key
df = pd.DataFrame({"a": s1, "b": s2})
print(df)
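Note: if the arrays or Series passed in have different lengths, positions that have no value for a given index are filled with NaN. (The dict form relies on Series being aligned by index; a dict of plain lists of unequal length raises a ValueError.)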
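The note above can be checked with a small sketch. The two Series here are made up for illustration; the shorter one has no values for index 2 and 3, so those cells should come out as NaN.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 6])           # shorter: nothing at index 2 and 3

df = pd.DataFrame({"a": s1, "b": s2})
print(df)

# True marks the cells that were filled with NaN
missing = df["b"].isna().tolist()
print(missing)
```

Column `b` is stored as floating point here, because NaN is a float value.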
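import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# set B to -1 on every row where A > 1
df.loc[df.A > 1, "B"] = -1
print(df)
df.loc[condition, columns to modify] (the older df.ix indexer did the same job but was removed in pandas 1.0)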
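import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# new column "then": 1 where A < 3, otherwise 0
df["then"] = np.where(df.A < 3, 1, 0)
print(df)
np.where(condition, value if true, value if false)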
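The two branches of np.where do not have to be numbers. As a small sketch, the same pattern can label rows with strings; the column name `label` and the values `"small"` and `"large"` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# "small" where A < 3, otherwise "large" (hypothetical labels)
df["label"] = np.where(df.A < 3, "small", "large")
labels = df["label"].tolist()
print(df)
```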
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# keep only the rows where A >= 2
df = df[df.A >= 2]
print(df)
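import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# the same kind of filter through .loc: keep rows where A > 2
df = df.loc[df.A > 2]
print(df)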
(There are many other ways to do this; I won't list them all.)
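For reference, here is a sketch of a few of those other ways, using the same example frame. It is not an exhaustive list; the variable names `both`, `queried`, and `picked` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# combine conditions with & (each condition needs its own parentheses)
both = df[(df.A >= 2) & (df.B < 8)]

# the same filter written as a query string
queried = df.query("A >= 2 and B < 8")

# keep rows whose A value appears in a given list
picked = df[df.A.isin([1, 4])]

print(both)
print(picked)
```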
import pandas as pd
df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})
# for each animal, find the size of the individual with the largest weight
group = df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
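As an alternative sketch of the same lookup without apply, one can ask each group for the row label of its maximum weight and then select those rows with .loc. The names `idx` and `heaviest` are made up for illustration; when weights tie, idxmax returns the first matching row.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})

# row label of the heaviest individual in each animal group
idx = df.groupby("animal")["weight"].idxmax()

# pull those rows out of the original frame
heaviest = df.loc[idx, ["animal", "size", "weight"]]
print(heaviest)
```

This keeps the result as ordinary rows of the original frame, so other columns can be included in the selection as needed.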
e.2 Use get_group to extract a single group
import pandas as pd
df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})
group = df.groupby("animal")
# pull out only the rows that belong to the "cat" group
cat = group.get_group("cat")
print(cat)
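To see what get_group is selecting from, the sketch below iterates over all groups and then compares get_group with a plain boolean filter on the same key. The names `sizes` and `same` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'animal': 'cat dog cat fish dog cat cat'.split(),
    'size': list('SSMMMLL'),
    'weight': [8, 10, 11, 1, 20, 12, 12],
    'adult': [False] * 5 + [True] * 2,
})
grouped = df.groupby("animal")

# iterating a groupby yields (group name, sub-DataFrame) pairs
sizes = {name: len(sub) for name, sub in grouped}
print(sizes)

cat = grouped.get_group("cat")
same = df[df["animal"] == "cat"]    # boolean filter on the same key

# both approaches pick the same rows of the original frame
print(cat.index.tolist())
print(same.index.tolist())
```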
http://pandas.pydata.org/pandas-docs/stable/cookbook.html