Common methods for working with data in Series and DataFrame:
Import the Python libraries:
import numpy as np
import pandas as pd
Data structures used for testing:
Series:
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
DataFrame:
>>> data = {
...     'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
...     'year': [2000, 2001, 2002, 2001, 2002],
...     'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
... }
>>> frame = pd.DataFrame(data)
>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
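Reindexing with reindex():
reindex() creates a new object that conforms to the new index:
A Series has a single index (its data labels):
Calling reindex on the Series rearranges the data according to the new index. If an index value is not currently present, a missing value is introduced.
Example: replace ['d', 'b', 'a', 'c'] with ['a', 'b', 'c', 'd', 'e']. The label 'e' does not exist, so NaN is introduced automatically; pass fill_value to choose a different value for missing entries.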
>>> obj.reindex(['a', 'b', 'c', 'd', 'e'])
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
>>> obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=666)
a     -5.3
b      7.2
c      3.6
d      4.5
e    666.0
dtype: float64
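As a minimal sketch of the behavior described above, the following checks that reindex returns a new object and leaves the original Series untouched. The variable names `reindexed` and `filled`, and the fill value 0.0, are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; labels missing from obj become NaN
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# fill_value replaces the NaN that would be introduced for new labels
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)

print(np.isnan(reindexed['e']))  # the new label 'e' has no data
print(filled['e'])               # the chosen fill value
print(list(obj.index))           # the original index is unchanged
```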
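A DataFrame has both a row index and a column index. reindex works on the row index by default, but rows and columns can be reindexed in the same call (see the examples and their output).
Example: note the difference between int and str labels. The default index holds integers, so string labels such as '4' match nothing and produce rows filled with NaN.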
>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.reindex([4,3,2,1,0])
   pop   state  year
4  2.9  Nevada  2002
3  2.4  Nevada  2001
2  3.6    Ohio  2002
1  1.7    Ohio  2001
0  1.5    Ohio  2000
>>> frame.reindex(['4','3','2','1','0'])
   pop state  year
4  NaN   NaN   NaN
3  NaN   NaN   NaN
2  NaN   NaN   NaN
1  NaN   NaN   NaN
0  NaN   NaN   NaN
>>> frame.reindex(['a', 'b', 'c', 'd', 'e'])
   pop state  year
a  NaN   NaN   NaN
b  NaN   NaN   NaN
c  NaN   NaN   NaN
d  NaN   NaN   NaN
e  NaN   NaN   NaN
>>> frame.reindex([4,3,2,1,0], columns=['year', 'state', 'pop'])
   year   state  pop
4  2002  Nevada  2.9
3  2001  Nevada  2.4
2  2002    Ohio  3.6
1  2001    Ohio  1.7
0  2000    Ohio  1.5
>>> frame.reindex(index=[4,3,2,1,0], columns=['year', 'state', 'pop'])
   year   state  pop
4  2002  Nevada  2.9
3  2001  Nevada  2.4
2  2002    Ohio  3.6
1  2001    Ohio  1.7
0  2000    Ohio  1.5
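The int versus str distinction can also be checked programmatically. This is a small sketch; the names `reversed_rows`, `no_match`, and `both` are introduced here for illustration.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Integer labels match the default integer index, so rows are only reordered
reversed_rows = frame.reindex([4, 3, 2, 1, 0])

# String labels match nothing in an integer index, so every cell is NaN
no_match = frame.reindex(['4', '3', '2', '1', '0'])

# Rows and columns can be reindexed together with keyword arguments
both = frame.reindex(index=[4, 3, 2, 1, 0], columns=['year', 'state', 'pop'])

print(reversed_rows)
print(no_match.isna().all().all())
print(both)
```

Passing `index=` and `columns=` explicitly tends to read more clearly than relying on argument position.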
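Dropping entries from a given row or column with drop():
A Series has only one axis, so drop always removes entries by index label: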
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj.drop(['d','a'])
b    7.2
c    3.6
dtype: float64
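A DataFrame has both rows and columns. drop removes rows by default; to drop a column, set axis to 1, otherwise the column label is looked up in the row index and an error is raised (see the examples and their output).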
>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.drop([0,1])
   pop   state  year
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.drop(['pop'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
    labels[mask])
ValueError: labels ['pop'] not contained in axis
>>> frame.drop(['pop'], axis=1)
    state  year
0    Ohio  2000
1    Ohio  2001
2    Ohio  2002
3  Nevada  2001
4  Nevada  2002
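A short sketch of the same row and column drops follows. It uses the `columns=` keyword, which is an addition to the original example and is available in pandas 0.21 and later. The exception type for the failing call differs between pandas versions (the traceback above shows ValueError, newer releases raise KeyError), so the sketch catches both. Variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Default axis is 0: the labels are looked up in the row index
without_rows = frame.drop([0, 1])

# 'pop' is a column label, not a row label, so this call fails
try:
    frame.drop(['pop'])
    row_drop_failed = False
except (KeyError, ValueError):
    row_drop_failed = True

# Two equivalent ways to drop a column
without_pop = frame.drop(['pop'], axis=1)
without_pop_kw = frame.drop(columns=['pop'])

print(without_rows)
print(row_drop_failed)
print(without_pop)
```

drop returns a new object in each case, so `frame` itself still contains all rows and columns afterwards.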
Indexing, selection, and filtering:
Series:
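Selection:
Selecting from a Series is similar to selecting from a list. The difference is that a Series accepts either an integer position or a custom index label. In recent pandas releases, positional access such as obj[0] on a non-integer index is deprecated in favor of obj.iloc[0].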
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj['d']
4.5
>>> obj[0]
4.5
Assignment:
Assignment works the same way as selection.
>>> obj['d'] = 0
>>> obj['d']
0.0
>>> obj
d    0.0
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj[0] = 88
>>> obj
d    88.0
b     7.2
a    -5.3
c     3.6
dtype: float64
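The label-based and position-based access shown above can be written with the explicit `.loc` and `.iloc` accessors, which avoid any ambiguity between labels and positions. This is a sketch using those accessors rather than the plain `[]` form from the original session; variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# Selection: .loc uses the index label, .iloc uses the integer position
by_label = obj.loc['d']
by_position = obj.iloc[0]

# Assignment follows the same pattern
obj.loc['d'] = 0      # set by label
after_label = obj.loc['d']
obj.iloc[0] = 88      # set by position (the same element here)

print(by_label, by_position)
print(obj)
```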
DataFrame:
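Selection:
Indexing a DataFrame with [] selects a column by default and returns a Series. Only column labels work here; an integer position raises a KeyError.
Slicing a DataFrame selects rows and returns a DataFrame.
DataFrame.ix[val] allows more specific selection and returns a Series. Usage: frame.ix[0] returns the first row as a Series; frame.ix[1, ['year']] returns a Series holding the 'year' column of the second row. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0; .loc (labels) and .iloc (positions) replace it.
Example: column indexing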
>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
>>> frame['year']
0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64
>>> frame[0]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
Example: row indexing
>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
>>> frame[0:2]
   year state  pop
0  2000  Ohio  1.5
1  2001  Ohio  1.7
>>> frame[0:1]
   year state  pop
0  2000  Ohio  1.5
>>> frame.ix[0]
year     2000
state    Ohio
pop       1.5
Name: 0, dtype: object
Example: ix indexing
>>> frame.ix[0]
year     2000
state    Ohio
pop       1.5
Name: 0, dtype: object
>>> frame.ix[1,['year']]
year    2001
Name: 1, dtype: object
Example: return types
>>> type(frame['year'])
<class 'pandas.core.series.Series'>
>>> type(frame[0:2])
<class 'pandas.core.frame.DataFrame'>
>>> type(frame.ix[0])
<class 'pandas.core.series.Series'>
>>> type(frame.ix[0,['year']])
<class 'pandas.core.series.Series'>
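The `.ix` accessor is not available in pandas 1.0 and later. A rough equivalent of the selections above on such versions, assuming the default integer row index, uses `.iloc` for positions and `.loc` for labels. This is a sketch, not a transcript of the original session, and the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'year': [2000, 2001, 2002, 2001, 2002],
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

year_col = frame['year']              # column by label -> Series
first_two = frame[0:2]                # row slice -> DataFrame
first_row = frame.iloc[0]             # stands in for frame.ix[0] -> Series
second_year = frame.loc[1, ['year']]  # stands in for frame.ix[1, ['year']] -> Series

for selected in (year_col, first_two, first_row, second_year):
    print(type(selected).__name__)
```

With a default integer index, `frame.loc[0]` and `frame.iloc[0]` pick the same row; they differ once the row labels are no longer 0, 1, 2, and so on.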
Assignment:
Example: DataFrame assignment
# frame
>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
# Assigning a non-list (scalar) value to a column sets the whole column to that value
>>> frame['year'] = 5
>>> frame
   year   state  pop
0     5    Ohio  1.5
1     5    Ohio  1.7
2     5    Ohio  3.6
3     5  Nevada  2.4
4     5  Nevada  2.9
>>> frame['year'] = 'test'
>>> frame
   year   state  pop
0  test    Ohio  1.5
1  test    Ohio  1.7
2  test    Ohio  3.6
3  test  Nevada  2.4
4  test  Nevada  2.9
# When assigning a list-like to a column, its length must equal the number of rows
>>> frame['year'] = range(5)
>>> frame
   year   state  pop
0     0    Ohio  1.5
1     1    Ohio  1.7
2     2    Ohio  3.6
3     3  Nevada  2.4
4     4  Nevada  2.9
>>> frame['year'] = range(4)
Traceback (most recent call last):
ValueError: Length of values does not match length of index
# Row assignment
>>> frame.ix[0] = 5
>>> frame
   year   state  pop
0     5       5  5.0
1     1    Ohio  1.7
2     2    Ohio  3.6
3     3  Nevada  2.4
4     4  Nevada  2.9
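The assignment rules above can be sketched for current pandas versions as follows. Two assumptions differ from the original session: `.loc` replaces `.ix`, and the row assignment supplies one value per column (the values 1999, 'Texas', and 9.9 are made up for illustration) instead of a single scalar, so that each column keeps a compatible type.

```python
import pandas as pd

frame = pd.DataFrame({
    'year': [2000, 2001, 2002, 2001, 2002],
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# A scalar is broadcast to every row of the column
frame['year'] = 5
broadcast = frame['year'].tolist()

# A list-like must supply exactly one value per row
frame['year'] = range(5)

# A length mismatch raises ValueError and leaves the column as it was
try:
    frame['year'] = range(4)
    length_error = None
except ValueError as exc:
    length_error = exc

# Row assignment by label, one value per column (hypothetical values)
frame.loc[0] = [1999, 'Texas', 9.9]

print(broadcast)
print(length_error)
print(frame)
```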
Arithmetic operations: