Basic functionality of pandas data structures

Commonly used methods for working with the data in a Series or DataFrame:

Import the Python libraries:

import numpy as np
import pandas as pd

Data structures used for testing:

Series:

>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

DataFrame:

>>> data = {
...     'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
...     'year': [2000, 2001, 2002, 2001, 2002],
...     'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
... }
>>> frame = pd.DataFrame(data)
>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002

 

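Reindexing with reindex():

  Creates a new object that conforms to the new index:

  A Series has a single index (its data labels):

  Calling reindex on the Series rearranges the data according to the new index. If an index value is not currently present, a missing value is introduced.

  Example: reindex ['d', 'b', 'a', 'c'] to ['a', 'b', 'c', 'd', 'e']. The label 'e' does not exist, so NaN is introduced automatically; fill_value lets you choose the value used for missing entries instead.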

>>> obj.reindex(['a', 'b', 'c', 'd', 'e'])
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
>>> obj.reindex(['a', 'b', 'c', 'd', 'e'],fill_value=666)
a     -5.3
b      7.2
c      3.6
d      4.5
e    666.0
dtype: float64
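
The point that reindex builds a new object rather than modifying the Series in place can be checked directly. The following is a minimal sketch that reuses the obj defined above; the variable names (reindexed, filled, and so on) are illustrative only and not part of the original post.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; obj itself is left untouched.
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])

original_labels = list(obj.index)    # the original order is preserved
new_labels = list(reindexed.index)   # the requested order, including 'e'
missing = reindexed.isna()           # True only for the new label 'e'

# fill_value replaces the NaN that would otherwise appear for 'e'.
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)
```

Because the original is never changed, the result has to be assigned to a name (or back to obj) to be kept.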

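  A DataFrame has both a row index and a column index. reindex works on the row index by default, but rows and columns can also be reindexed in the same call (see the examples and their output below).

  Example: pay attention to the difference between int and str labels. The default index is of integer type, so string labels such as '4' match no existing row and produce rows filled with NaN.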

>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.reindex([4,3,2,1,0])
   pop   state  year
4  2.9  Nevada  2002
3  2.4  Nevada  2001
2  3.6    Ohio  2002
1  1.7    Ohio  2001
0  1.5    Ohio  2000
>>> frame.reindex(['4','3','2','1','0'])
   pop state  year
4  NaN   NaN   NaN
3  NaN   NaN   NaN
2  NaN   NaN   NaN
1  NaN   NaN   NaN
0  NaN   NaN   NaN
>>> frame.reindex(['a', 'b', 'c', 'd', 'e'])
   pop state  year
a  NaN   NaN   NaN
b  NaN   NaN   NaN
c  NaN   NaN   NaN
d  NaN   NaN   NaN
e  NaN   NaN   NaN
>>> frame.reindex([4,3,2,1,0],columns=['year', 'state', 'pop'])
   year   state  pop
4  2002  Nevada  2.9
3  2001  Nevada  2.4
2  2002    Ohio  3.6
1  2001    Ohio  1.7
0  2000    Ohio  1.5
>>> frame.reindex(index=[4,3,2,1,0],columns=['year', 'state', 'pop'])
   year   state  pop
4  2002  Nevada  2.9
3  2001  Nevada  2.4
2  2002    Ohio  3.6
1  2001    Ohio  1.7
0  2000    Ohio  1.5
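
The int versus str behaviour shown above can be reproduced in a script. This is a sketch under the assumption that the frame is built from the same data dictionary; converting the index with astype(str) is one possible way to make string labels match, not something the original post does.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# The default index holds integers, so the string label '4' matches nothing.
by_str = frame.reindex(['4', '3'])
all_missing = bool(by_str.isna().all().all())

# Converting the index labels to str first makes the same request match.
str_frame = frame.copy()
str_frame.index = str_frame.index.astype(str)
matched = str_frame.reindex(['4', '3'])

# Keyword form: rows and columns are reindexed in one call.
both = frame.reindex(index=[4, 3], columns=['year', 'state', 'pop'])
```

Spelling out index= and columns= avoids any ambiguity about which axis the first positional argument refers to.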

Dropping entries from a given row or column:

  A Series has only one axis, so drop removes entries by index label:

>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj.drop(['d','a'])
b    7.2
c    3.6
dtype: float64

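  A DataFrame has both rows and columns, and drop removes rows by default. To drop a column, set axis to 1; otherwise the label is looked up in the row index and an error is raised (see the examples and their output below).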

   

>>> frame
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.drop([0,1])
   pop   state  year
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
>>> frame.drop(['pop'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
    labels[mask])
ValueError: labels ['pop'] not contained in axis
>>> frame.drop(['pop'],axis=1)
    state  year
0    Ohio  2000
1    Ohio  2001
2    Ohio  2002
3  Nevada  2001
4  Nevada  2002
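
As a sketch of the same idea, drop also accepts a columns= keyword, which names the axis explicitly, and an errors='ignore' option that skips labels it cannot find. The frame below is assumed to be the test DataFrame from the start of the post; the result names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# axis=1 and columns= are two spellings of the same request.
dropped_axis = frame.drop(['pop'], axis=1)
dropped_kw = frame.drop(columns=['pop'])

# Without axis=1 the label is searched among the row labels;
# errors='ignore' skips the missing label instead of raising.
unchanged = frame.drop(['pop'], errors='ignore')

# drop returns a new object, so frame still has its 'pop' column.
still_has_pop = 'pop' in frame.columns
```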

 

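Indexing, selection, and filtering:

  Series:

    Selection:

      Selecting from a Series is similar to selecting from a list. The difference is that a Series can be indexed either by integer position or by its custom labels. In recent pandas versions, positional access through obj[0] on a Series with non-integer labels is deprecated; obj.iloc[0] is the explicit positional form.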

>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj['d']
4.5
>>> obj[0]
4.5

    Assignment:

      Works the same way as selection.

>>> obj['d'] = 0
>>> obj['d']
0.0
>>> obj
d    0.0
b    7.2
a   -5.3
c    3.6
dtype: float64
>>> obj[0] = 88
>>> obj
d    88.0
b     7.2
a    -5.3
c     3.6
dtype: float64
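
A minimal sketch of the same selection and assignment written with the explicit accessors .loc (by label) and .iloc (by position), which avoids any ambiguity between the two. The boolean filter is added only to illustrate the "filtering" named in the heading; the variable names are illustrative.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

by_label = obj.loc['d']      # select by label
by_position = obj.iloc[0]    # select by integer position

# Filtering: keep only the entries whose value is positive.
positive = obj[obj > 0]

# Assignment uses the same accessors.
obj.loc['d'] = 0
obj.iloc[1] = 88
```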

  DataFrame:

    選取:

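      Plain indexing on a DataFrame (frame[key]) selects a column and returns a Series. It accepts only column labels; an integer that is not a column label raises a KeyError.

      A DataFrame also supports slicing, which selects rows and returns a DataFrame.

      DataFrame.ix[val] can be used for more specific selection and returns a Series. Usage: frame.ix[0] returns the first row as a Series; frame.ix[1, ['year']] returns a Series holding the year column of the second row. Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0; on current versions use .loc (label-based) or .iloc (position-based) instead.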

Example: column indexing

>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
>>> frame['year']
0    2000
1    2001
2    2002
3    2001
4    2002
Name: year, dtype: int64
>>> frame[0]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0

Example: row indexing

>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
>>> frame[0:2]
   year state  pop
0  2000  Ohio  1.5
1  2001  Ohio  1.7
>>> frame[0:1]
   year state  pop
0  2000  Ohio  1.5
>>> frame.ix[0]
year     2000
state    Ohio
pop       1.5
Name: 0, dtype: object

Example: indexing with ix

>>> frame.ix[0]
year     2000
state    Ohio
pop       1.5
Name: 0, dtype: object
>>> frame.ix[1,['year']]
year    2001
Name: 1, dtype: object

Example: return types

>>> type(frame['year'])
<class 'pandas.core.series.Series'>


>>> type(frame[0:2])
<class 'pandas.core.frame.DataFrame'>


>>> type(frame.ix[0])
<class 'pandas.core.series.Series'>

>>> type(frame.ix[0,['year']])
<class 'pandas.core.series.Series'>
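
Since .ix is not available in pandas 1.0 and later, the selections above can be sketched with .loc and .iloc. The frame is assumed to use the column order year, state, pop shown in these examples; the result names are illustrative.

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'])

first_row = frame.iloc[0]              # Series, in place of frame.ix[0]
cell_series = frame.loc[1, ['year']]   # Series with the single label 'year'
scalar = frame.loc[1, 'year']          # a plain scalar, not a Series
rows = frame.iloc[0:2]                 # DataFrame with the first two rows
```

Passing a list such as ['year'] keeps the result a Series, while passing the bare label 'year' returns the value itself.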

    Assignment:

Example: DataFrame assignment

#frame
>>> frame
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9
# Assigning a non-list (scalar) value to a column sets every row of that column
>>> frame['year'] = 5
>>> frame
   year   state  pop
0     5    Ohio  1.5
1     5    Ohio  1.7
2     5    Ohio  3.6
3     5  Nevada  2.4
4     5  Nevada  2.9
>>> frame['year'] = 'test'
>>> frame
   year   state  pop
0  test    Ohio  1.5
1  test    Ohio  1.7
2  test    Ohio  3.6
3  test  Nevada  2.4
4  test  Nevada  2.9

# When assigning a list to a column, its length must equal the number of rows.
>>> frame['year'] = range(5)
>>> frame
   year   state  pop
0     0    Ohio  1.5
1     1    Ohio  1.7
2     2    Ohio  3.6
3     3  Nevada  2.4
4     4  Nevada  2.9
>>> frame['year'] = range(4)
Traceback (most recent call last):
ValueError: Length of values does not match length of index



# Row assignment
>>> frame.ix[0] = 5
>>> frame
   year   state  pop
0     5       5  5.0
1     1    Ohio  1.7
2     2    Ohio  3.6
3     3  Nevada  2.4
4     4  Nevada  2.9
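
The assignment rules above can be sketched as a script for current pandas versions, where .ix is unavailable and .loc is used instead. The single-cell assignment at the end is an illustrative addition, and length_error is a hypothetical name used only to capture the error message.

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# A scalar is broadcast to every row of the column.
frame['year'] = 5
after_scalar = frame['year'].tolist()

# A list must have exactly one value per row.
frame['year'] = list(range(5))

length_error = None
try:
    frame['year'] = list(range(4))   # one value short
except ValueError as exc:
    length_error = str(exc)

# Label-based assignment to one cell of row 0.
frame.loc[0, 'pop'] = 9.9
```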

 

 

 

Arithmetic operations:
