Pandas | 08 Reindexing

Reindexing changes the row labels and column labels of a DataFrame so that the data conforms to a given set of labels.

Reindexing can accomplish several operations:

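  • Reorder the existing data to match a new set of labels.
  • Insert missing-value (NA) markers at label positions where no data exists for the label.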
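
Both operations can be seen in isolation on a small Series. The following is a minimal sketch; the Series `s` and its labels are made up for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical data: three values labelled 'a', 'b', 'c'.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reorder the existing labels and ask for a label ('d') that has no data.
# reindex() returns a new object; s itself is left unchanged.
reindexed = s.reindex(['c', 'a', 'd'])

print(reindexed)
```

The label 'd' receives NaN because `s` holds no value for it, and the result is upcast to a float dtype to accommodate the missing value.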

 

import pandas as pd
import numpy as np

N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})
print(df)
print('\n')

# Reindex the DataFrame: keep rows 0, 2 and 5 and columns A, C and B.
# Column 'B' does not exist in df, so it is filled with NaN.
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
print(df_reindexed)

Output:

 

            A     x         y       C           D
0 2016-01-01 0.0 0.910736 Low 105.308796
1 2016-01-02 1.0 0.570500 Low 91.024238
2 2016-01-03 2.0 0.930298 High 112.359308
3 2016-01-04 3.0 0.251355 Medium 106.155192
4 2016-01-05 4.0 0.579235 Low 90.079651
5 2016-01-06 5.0 0.623852 High 110.592218
6 2016-01-07 6.0 0.621130 Medium 96.222673
7 2016-01-08 7.0 0.989647 Medium 92.253444
8 2016-01-09 8.0 0.506653 Medium 102.601417
9 2016-01-10 9.0 0.099482 Low 97.721659
10 2016-01-11 10.0 0.254750 Medium 75.502131
11 2016-01-12 11.0 0.543014 Medium 88.895951
12 2016-01-13 12.0 0.911283 Medium 79.526056
13 2016-01-14 13.0 0.255296 Low 92.248119
14 2016-01-15 14.0 0.205302 Low 103.301747
15 2016-01-16 15.0 0.246407 Low 107.158250
16 2016-01-17 16.0 0.202039 High 96.411279
17 2016-01-18 17.0 0.734529 High 88.177103
18 2016-01-19 18.0 0.275703 Medium 82.885365
19 2016-01-20 19.0 0.084449 High 98.803349


A C B
0 2016-01-01 Low NaN
2 2016-01-03 High NaN
5 2016-01-06 High NaN
 

Reindexing to align with other objects

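Sometimes you may want to take an object and reindex it so that its axes are labeled the same as those of another object. Consider the following example.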

 

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
print(df1)
print(df2)

df1 = df1.reindex_like(df2)                 # conform df1 to df2's row and column labels (rows 0-6 are kept)
print(df1)

Output:

 

       col1      col2      col3
0 0.989992 0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3 1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670 0.468700
5 -0.164076 0.075098 0.654549
6 -0.491034 1.096496 -0.166250
7 0.230918 -1.561643 1.501326
8 0.703623 -0.407445 -0.792633
9 0.340817 -1.132127 -0.695821

col1 col2 col3
0 0.144380 0.295776 -0.743097
1 -1.597853 0.029949 -1.605222
2 0.626728 -0.077997 -0.167353
3 0.466008 0.695279 -0.047752
4 -1.088821 -0.456605 1.192847
5 -0.020330 1.616297 -0.368196
6 -1.038790 -1.264894 0.059060

col1 col2 col3
0 0.989992 0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3 1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670 0.468700
5 -0.164076 0.075098 0.654549
6 -0.491034 1.096496 -0.166250

Note: here the df1 DataFrame is altered and reindexed like df2. The column names must match; otherwise NaN is added for the entire column label.
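
The effect of a non-matching column name can be sketched as follows. The frames `left` and `right` and the column name 'other' are hypothetical, chosen only to make the mismatch visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 'left' has col3, 'right' has a column named 'other' instead.
left = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                    columns=['col1', 'col2', 'col3'])
right = pd.DataFrame(np.zeros((2, 3)),
                     columns=['col1', 'col2', 'other'])

# Conform 'left' to the row and column labels of 'right'.
aligned = left.reindex_like(right)

print(aligned)
```

Only rows 0 and 1 survive, col3 is dropped because `right` has no such label, and the 'other' column consists entirely of NaN because `left` has no data under that name.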

Filling while reindexing

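reindex() takes an optional parameter, method, which selects a filling method with the following values:

  • pad/ffill - fill values forward
  • bfill/backfill - fill values backward
  • nearest - fill from the nearest index value

import pandas as pd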
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Padding NAN's
print(df2.reindex_like(df1))
print('\n')

# Now Fill the NAN's with preceding Values
print ("Data Frame with Forward Fill:")
print (df2.reindex_like(df1,method='ffill'))

Output:

       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN


Data Frame with Forward Fill:
       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2 -0.423455 -0.700265  1.133371
3 -0.423455 -0.700265  1.133371
4 -0.423455 -0.700265  1.133371
5 -0.423455 -0.700265  1.133371

Note: the last four rows are padded with the values of row 1.
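
To compare the three fill methods side by side, here is a small sketch on a Series with gaps in its index. The data is invented for illustration. It calls reindex() directly, which accepts the same method values and avoids relying on the method argument of reindex_like(), an argument that newer pandas releases appear to be phasing out.

```python
import pandas as pd

# Hypothetical data with gaps between the labels 0, 5 and 10.
# The method argument requires a monotonically increasing or decreasing index.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])
new_index = [0, 2, 4, 6, 8, 10]

forward = s.reindex(new_index, method='ffill')     # take the last label at or before
backward = s.reindex(new_index, method='bfill')    # take the next label at or after
nearest = s.reindex(new_index, method='nearest')   # take the closest label

print(pd.DataFrame({'ffill': forward, 'bfill': backward, 'nearest': nearest}))
```

For the new label 4, for example, ffill takes the value at label 0, while bfill and nearest both take the value at label 5.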

Limits on filling while reindexing

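The limit argument provides additional control over filling while reindexing. It specifies the maximum number of consecutive labels that may be filled.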

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Padding NAN's
print(df2.reindex_like(df1))
print('\n')

# Now Fill the NAN's with preceding Values
print ("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))

Output:

       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN


Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2 -0.055713 -0.021732 -0.174577
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
 

Note: only the row at index 2 is filled, from the preceding row at index 1. The remaining rows are left as they are (NaN).
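
A different limit shows how the count works. This is a minimal sketch on a made-up two-element Series, using reindex() with limit=2 instead of the limit=1 used in the example above.

```python
import pandas as pd

# Hypothetical data: only the labels 0 and 1 carry values.
s = pd.Series([1.0, 2.0], index=[0, 1])

# Forward fill, but fill at most 2 consecutive missing labels.
limited = s.reindex(list(range(6)), method='ffill', limit=2)

print(limited)
```

Labels 2 and 3 are filled from label 1, while labels 4 and 5 stay NaN because the limit of two consecutive fills has been reached.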

Renaming

The rename() method lets you relabel an axis based on a mapping (a dict or a Series) or an arbitrary function.

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)
print('\n')

print ("After renaming the rows and columns:")
print(df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},index = {0 : 'apple', 1 : 'banana', 2 : 'durian'}))

Output:

       col1      col2      col3
0  0.486791  0.105759  1.540122
1 -0.990237  1.007885 -0.217896
2 -0.483855 -1.645027 -1.194113
3 -0.122316  0.566277 -0.366028
4 -0.231524 -0.721172 -0.112007
5  0.438810  0.000225  0.435479


After renaming the rows and columns:
              c1        c2      col3
apple   0.486791  0.105759  1.540122
banana -0.990237  1.007885 -0.217896
durian -0.483855 -1.645027 -1.194113
3      -0.122316  0.566277 -0.366028
4      -0.231524 -0.721172 -0.112007
5       0.438810  0.000225  0.435479
 

The rename() method provides an inplace named parameter, which defaults to False and copies the underlying data. Pass inplace=True to rename the data in place.
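
The function form of rename() and the inplace parameter can be sketched together. The DataFrame `df` here is a made-up two-column frame, not one from the examples above.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Function form: the callable is applied to every column label.
# With the default inplace=False a new DataFrame is returned.
upper = df.rename(columns=str.upper)
columns_after_copy = list(df.columns)    # df itself still has its old labels

# inplace=True modifies df directly instead of returning a renamed copy.
df.rename(columns={'col1': 'c1'}, inplace=True)

print(list(upper.columns))
print(columns_after_copy)
print(list(df.columns))
```

Labels that are absent from a dict mapping, such as 'col2' here, are left unchanged.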
