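Reindexing changes the row labels and column labels of a DataFrame.
Several operations can be carried out through reindexing, for example selecting a subset of rows and columns by label, as the following code does: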
import pandas as pd
import numpy as np

N = 20
df = pd.DataFrame({
    'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
    'x': np.linspace(0, stop=N-1, num=N),
    'y': np.random.rand(N),
    'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'D': np.random.normal(100, 10, size=(N)).tolist()
})
print(df)
print('\n')

# Reindex the DataFrame: keep rows 0, 2, 5 and columns A, C, B.
# Column 'B' does not exist in df, so it comes back filled with NaN.
df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
print(df_reindexed)
Output:
A x y C D
0 2016-01-01 0.0 0.910736 Low 105.308796
1 2016-01-02 1.0 0.570500 Low 91.024238
2 2016-01-03 2.0 0.930298 High 112.359308
3 2016-01-04 3.0 0.251355 Medium 106.155192
4 2016-01-05 4.0 0.579235 Low 90.079651
5 2016-01-06 5.0 0.623852 High 110.592218
6 2016-01-07 6.0 0.621130 Medium 96.222673
7 2016-01-08 7.0 0.989647 Medium 92.253444
8 2016-01-09 8.0 0.506653 Medium 102.601417
9 2016-01-10 9.0 0.099482 Low 97.721659
10 2016-01-11 10.0 0.254750 Medium 75.502131
11 2016-01-12 11.0 0.543014 Medium 88.895951
12 2016-01-13 12.0 0.911283 Medium 79.526056
13 2016-01-14 13.0 0.255296 Low 92.248119
14 2016-01-15 14.0 0.205302 Low 103.301747
15 2016-01-16 15.0 0.246407 Low 107.158250
16 2016-01-17 16.0 0.202039 High 96.411279
17 2016-01-18 17.0 0.734529 High 88.177103
18 2016-01-19 18.0 0.275703 Medium 82.885365
19 2016-01-20 19.0 0.084449 High 98.803349
A C B
0 2016-01-01 Low NaN
2 2016-01-03 High NaN
5 2016-01-06 High NaN
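Because the example above uses random data, its numbers differ on every run. The following is a minimal sketch with fixed, made-up values (the frame and the label 7 are illustrative assumptions, not part of the original example) showing the two effects of reindex(): existing labels are reordered, and labels with no data are filled with NaN.

```python
import pandas as pd

# A small frame with fixed values so the result is reproducible.
df = pd.DataFrame({"A": [10, 20, 30], "C": ["Low", "High", "Medium"]})

# Rows 2 and 0 exist and are reordered; row label 7 and column "B"
# do not exist in df, so their cells are filled with NaN.
result = df.reindex(index=[2, 0, 7], columns=["C", "B", "A"])
print(result)
```

The original df is not modified; reindex() returns a new object.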
Sometimes you may want to take an object and reindex it so that its axes are labeled the same as another object's. Consider the following example.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['col1', 'col2', 'col3'])
print(df1)
print(df2)

# Keep only the rows of df1 whose labels also appear in df2
df1 = df1.reindex_like(df2)
print(df1)
Output:
col1 col2 col3
0 0.989992 0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3 1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670 0.468700
5 -0.164076 0.075098 0.654549
6 -0.491034 1.096496 -0.166250
7 0.230918 -1.561643 1.501326
8 0.703623 -0.407445 -0.792633
9 0.340817 -1.132127 -0.695821
col1 col2 col3
0 0.144380 0.295776 -0.743097
1 -1.597853 0.029949 -1.605222
2 0.626728 -0.077997 -0.167353
3 0.466008 0.695279 -0.047752
4 -1.088821 -0.456605 1.192847
5 -0.020330 1.616297 -0.368196
6 -1.038790 -1.264894 0.059060
col1 col2 col3
0 0.989992 0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3 1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670 0.468700
5 -0.164076 0.075098 0.654549
6 -0.491034 1.096496 -0.166250
Note: here the df1 DataFrame is altered and reindexed like df2. The column names should match; otherwise NaN is added for the entire column label.
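To illustrate the remark about mismatched column names, here is a small sketch with fixed values. The frames and the column name col9 are hypothetical and chosen only to force a mismatch.

```python
import pandas as pd

left = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
# 'col9' is a hypothetical column name that does not exist in `left`.
other = pd.DataFrame({"col1": [0, 0], "col9": [0, 0]})

# The result takes its row and column labels from `other`,
# but its values from `left` wherever the labels match.
aligned = left.reindex_like(other)
print(aligned)
```

Here col1 keeps the first two values of left, col2 is dropped because other has no such column, and col9 is entirely NaN.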
reindex() takes an optional parameter, method, which selects a filling method. Its values are:
pad/ffill - fill values forward
bfill/backfill - fill values backward
nearest - fill from the nearest index values
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

# Without a fill method, the missing rows are padded with NaN
print(df2.reindex_like(df1))
print('\n')

# Now fill the NaNs with the preceding values
print("Data Frame with Forward Fill:")
print(df2.reindex_like(df1, method='ffill'))
Output:
col1 col2 col3
0 1.311620 -0.707176 0.599863
1 -0.423455 -0.700265 1.133371
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN

Data Frame with Forward Fill:
col1 col2 col3
0 1.311620 -0.707176 0.599863
1 -0.423455 -0.700265 1.133371
2 -0.423455 -0.700265 1.133371
3 -0.423455 -0.700265 1.133371
4 -0.423455 -0.700265 1.133371
5 -0.423455 -0.700265 1.133371
Note: the last four rows are filled forward from row 1.
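The three fill methods can be compared side by side. The sketch below calls reindex() directly on a tiny frame with fixed, made-up values and a gap in its index; the variable names are illustrative only.

```python
import pandas as pd

# Source data exists only at labels 0 and 3.
df = pd.DataFrame({"value": [1.0, 2.0]}, index=[0, 3])
new_index = [0, 1, 2, 3, 4]

forward = df.reindex(new_index, method="ffill")     # take the last earlier label
backward = df.reindex(new_index, method="bfill")    # take the next later label
nearest = df.reindex(new_index, method="nearest")   # take the closest label

print(forward["value"].tolist())   # [1.0, 1.0, 1.0, 2.0, 2.0]
print(backward["value"].tolist())  # [1.0, 2.0, 2.0, 2.0, nan]
print(nearest["value"].tolist())   # [1.0, 1.0, 2.0, 2.0, 2.0]
```

With bfill, label 4 stays NaN because no later label exists to fill from. The fill methods require the original index to be monotonically increasing or decreasing.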
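The limit parameter provides additional control over filling while reindexing. It specifies the maximum number of consecutive matches, that is, how many consecutive missing labels may be filled.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])

# Without a fill method, the missing rows are padded with NaN
print(df2.reindex_like(df1))
print('\n')

# Now fill the NaNs with the preceding values, at most 1 row in a row
print("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1, method='ffill', limit=1))
Output: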
col1 col2 col3
0 0.247784 2.128727 0.702576
1 -0.055713 -0.021732 -0.174577
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN

Data Frame with Forward Fill limiting to 1:
col1 col2 col3
0 0.247784 2.128727 0.702576
1 -0.055713 -0.021732 -0.174577
2 -0.055713 -0.021732 -0.174577
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
Note: only the row labeled 2 is filled, using the values of the preceding row 1. The remaining rows are left as NaN.
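As a reproducible variation, the sketch below uses fixed, made-up values and limit=2 with reindex(), so exactly two consecutive missing labels are filled.

```python
import pandas as pd

# Data exists only for labels 0 and 1.
df = pd.DataFrame({"value": [1.0, 2.0]}, index=[0, 1])

# Extend to labels 0..5, forward-filling at most 2 consecutive missing labels.
limited = df.reindex(range(6), method="ffill", limit=2)
print(limited)
```

Labels 2 and 3 receive the value 2.0 from label 1, while labels 4 and 5 remain NaN because the limit has been reached.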
The rename() method lets you relabel an axis based on a mapping (a dict or a Series) or an arbitrary function.
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
print(df1)
print('\n')

# Labels missing from the mappings (col3, rows 3-5) are left unchanged
print("After renaming the rows and columns:")
print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                 index={0: 'apple', 1: 'banana', 2: 'durian'}))
Output:
col1 col2 col3
0 0.486791 0.105759 1.540122
1 -0.990237 1.007885 -0.217896
2 -0.483855 -1.645027 -1.194113
3 -0.122316 0.566277 -0.366028
4 -0.231524 -0.721172 -0.112007
5 0.438810 0.000225 0.435479

After renaming the rows and columns:
c1 c2 col3
apple 0.486791 0.105759 1.540122
banana -0.990237 1.007885 -0.217896
durian -0.483855 -1.645027 -1.194113
3 -0.122316 0.566277 -0.366028
4 -0.231524 -0.721172 -0.112007
5 0.438810 0.000225 0.435479
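The example above passes dictionaries. Since rename() also accepts a function, here is a short sketch of that form; the frame and the renaming rules are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["a", "b"])

# Each label is passed through the function to produce its new name.
renamed = df.rename(
    columns=lambda name: name.replace("col", "c"),  # col1 -> c1, col2 -> c2
    index=str.upper,                                # a -> A, b -> B
)
print(renamed)
```

A function is convenient when every label follows the same rule, whereas a dictionary is better for renaming a few specific labels.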
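The rename() method provides an inplace named parameter, which defaults to False and copies the underlying data. Pass inplace=True to rename the labels on the existing object instead.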
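The difference between the two modes can be seen in a minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2]})

# Default (inplace=False): a new DataFrame is returned, df is untouched.
renamed_copy = df.rename(columns={"col1": "c1"})
print(df.columns.tolist())            # ['col1']
print(renamed_copy.columns.tolist())  # ['c1']

# inplace=True: df itself is modified and the call returns None.
returned = df.rename(columns={"col1": "c1"}, inplace=True)
print(returned)                       # None
print(df.columns.tolist())            # ['c1']
```

Because the in-place call returns None, assigning its result back to df would discard the DataFrame.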