Pandas | 08 Reindexing

Date: 2019-11-10

Tags: pandas, reindexing

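Reindexing changes the row labels and column labels of a DataFrame.

Reindexing can be used to do several things:

Reorder existing data to match a new set of labels.
Insert missing-value (NA) markers at label positions that have no data.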
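A minimal sketch of both operations on a small Series. The labels and values are hypothetical and chosen only for illustration: the existing labels are reordered, and a label with no data receives NaN.

```python
import pandas as pd

# Hypothetical data: a Series labeled 'a', 'b', 'c'
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# Reorder existing labels ('c', 'a') and request a label with no data ('d')
r = s.reindex(['c', 'a', 'd'])
print(r)  # 'c' -> 3.0, 'a' -> 1.0, 'd' -> NaN ('b' is dropped)
```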

import pandas as pd
import numpy as np

N=20

df = pd.DataFrame({
   'A': pd.date_range(start='2016-01-01',periods=N,freq='D'),
   'x': np.linspace(0,stop=N-1,num=N),
   'y': np.random.rand(N),
   'C': np.random.choice(['Low','Medium','High'],N).tolist(),
   'D': np.random.normal(100, 10, size=(N)).tolist()
})
print(df)
print('\n')

# reindex the DataFrame: keep rows 0, 2, 5 and columns A, C, B ('B' does not exist, so it is filled with NaN)
df_reindexed = df.reindex(index=[0,2,5], columns=['A', 'C', 'B'])
print(df_reindexed)

Output:

            A     x         y       C           D
0  2016-01-01   0.0  0.910736     Low  105.308796
1  2016-01-02   1.0  0.570500     Low   91.024238
2  2016-01-03   2.0  0.930298    High  112.359308
3  2016-01-04   3.0  0.251355  Medium  106.155192
4  2016-01-05   4.0  0.579235     Low   90.079651
5  2016-01-06   5.0  0.623852    High  110.592218
6  2016-01-07   6.0  0.621130  Medium   96.222673
7  2016-01-08   7.0  0.989647  Medium   92.253444
8  2016-01-09   8.0  0.506653  Medium  102.601417
9  2016-01-10   9.0  0.099482     Low   97.721659
10 2016-01-11  10.0  0.254750  Medium   75.502131
11 2016-01-12  11.0  0.543014  Medium   88.895951
12 2016-01-13  12.0  0.911283  Medium   79.526056
13 2016-01-14  13.0  0.255296     Low   92.248119
14 2016-01-15  14.0  0.205302     Low  103.301747
15 2016-01-16  15.0  0.246407     Low  107.158250
16 2016-01-17  16.0  0.202039    High   96.411279
17 2016-01-18  17.0  0.734529    High   88.177103
18 2016-01-19  18.0  0.275703  Medium   82.885365
19 2016-01-20  19.0  0.084449    High   98.803349


           A     C   B
0 2016-01-01   Low NaN
2 2016-01-03  High NaN
5 2016-01-06  High NaN
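Missing labels do not have to become NaN. The following is a sketch that uses a small, hypothetical numeric DataFrame rather than the random one above. The `fill_value` argument of `reindex()` supplies the value to use for labels with no data. The original object is left unchanged because `reindex()` returns a new one.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random frame used above)
df = pd.DataFrame({'A': [10, 20, 30], 'C': [1.5, 2.5, 3.5]})

# Row 5 and column 'B' do not exist in df; fill_value=0 is used instead of NaN
out = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'], fill_value=0)
print(out)

# reindex() returns a new object; df still has its original 3 rows and 2 columns
print(df.shape)
```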

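Reindexing to align with other objects

Sometimes you may want to take an object and reindex its axes so that they are labeled the same as another object. Consider the following example.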

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
print(df1)
print(df2)

df1 = df1.reindex_like(df2)                 # keep only the rows and columns of df1 whose labels appear in df2
print(df1)

Output:

       col1      col2      col3
0  0.989992  0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3  1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670  0.468700
5 -0.164076  0.075098  0.654549
6 -0.491034  1.096496 -0.166250
7  0.230918 -1.561643  1.501326
8  0.703623 -0.407445 -0.792633
9  0.340817 -1.132127 -0.695821

       col1      col2      col3
0  0.144380  0.295776 -0.743097
1 -1.597853  0.029949 -1.605222
2  0.626728 -0.077997 -0.167353
3  0.466008  0.695279 -0.047752
4 -1.088821 -0.456605  1.192847
5 -0.020330  1.616297 -0.368196
6 -1.038790 -1.264894  0.059060

       col1      col2      col3
0  0.989992  0.543438 -2.311684
1 -0.704759 -0.555589 -0.570049
2 -0.658263 -0.605368 -0.025520
3  1.533949 -0.936191 -0.071094
4 -0.729812 -0.339670  0.468700
5 -0.164076  0.075098  0.654549
6 -0.491034  1.096496 -0.166250

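Note: here df1 is reindexed so that its row and column labels match those of df2 (only rows 0 to 6 remain). The column names should match; otherwise the entire column for each unmatched label is filled with NaN.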
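This sketch illustrates the note using a hypothetical `df2` whose third column is named `colX`, a name that does not appear in the original example. `reindex_like(other)` should give the same result as `reindex(index=other.index, columns=other.columns)`, and the unmatched `colX` column comes back as all NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.standard_normal((10, 3)), columns=['col1', 'col2', 'col3'])
# Hypothetical df2: its third column is named 'colX', which df1 does not have
df2 = pd.DataFrame(rng.standard_normal((7, 3)), columns=['col1', 'col2', 'colX'])

aligned = df1.reindex_like(df2)
# The same labels requested explicitly through reindex()
same = df1.reindex(index=df2.index, columns=df2.columns)

print(aligned.shape)                 # 7 rows, like df2
print(aligned['colX'].isna().all())  # the unmatched column is all NaN
print(aligned.equals(same))          # equals() treats NaNs in the same places as equal
```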

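Filling while reindexing

reindex() takes an optional method parameter that selects a fill method. Its possible values are listed below. They only work when the index is monotonically increasing or decreasing.

pad/ffill - fill values forward
bfill/backfill - fill values backward
nearest - fill from the nearest index value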
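Here is a minimal sketch of the three methods. It uses a hypothetical Series with a monotonically increasing integer index (labels 0, 4 and 8), which is then reindexed to include labels that fall between those values.

```python
import pandas as pd

# Hypothetical Series on a monotonically increasing index (required by `method`)
s = pd.Series([10.0, 20.0, 30.0], index=[0, 4, 8])
new_index = [0, 1, 3, 4, 7, 8]

ff = s.reindex(new_index, method='ffill')    # 1 -> 10, 3 -> 10, 7 -> 20
bf = s.reindex(new_index, method='bfill')    # 1 -> 20, 3 -> 20, 7 -> 30
nn = s.reindex(new_index, method='nearest')  # 1 -> 10, 3 -> 20, 7 -> 30
print(pd.DataFrame({'ffill': ff, 'bfill': bf, 'nearest': nn}))
```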

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Padding NAN's
print(df2.reindex_like(df1))
print('\n')

# Now Fill the NAN's with preceding Values
print ("Data Frame with Forward Fill:")
print (df2.reindex_like(df1,method='ffill'))

Output:

       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Data Frame with Forward Fill:
       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2 -0.423455 -0.700265  1.133371
3 -0.423455 -0.700265  1.133371
4 -0.423455 -0.700265  1.133371
5 -0.423455 -0.700265  1.133371

Note: the last four rows are filled by carrying the values of row 1 forward.

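Limiting fills while reindexing

The limit argument gives additional control over filling during reindexing. It specifies the maximum number of consecutive elements that may be forward-filled or backward-filled.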

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])

# Padding NAN's
print(df2.reindex_like(df1))
print('\n')

# Now Fill the NAN's with preceding Values
print ("Data Frame with Forward Fill limiting to 1:")
print(df2.reindex_like(df1,method='ffill',limit=1))

Output:

       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Data Frame with Forward Fill limiting to 1:
       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2 -0.055713 -0.021732 -0.174577
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Note: only row 2 is filled, using the values of row 1. The remaining rows are left as they were (NaN).
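The same behavior can be seen on a small, deterministic Series with hypothetical values. With `limit=1`, only the first gap after the last real value is filled.

```python
import pandas as pd

# Hypothetical Series with data only at labels 0 and 1
s = pd.Series([1.0, 2.0], index=[0, 1])
target = [0, 1, 2, 3, 4, 5]

unlimited = s.reindex(target, method='ffill')         # labels 2 to 5 all take 2.0
limited = s.reindex(target, method='ffill', limit=1)  # only label 2 takes 2.0; 3 to 5 stay NaN
print(pd.DataFrame({'no limit': unlimited, 'limit=1': limited}))
```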

Renaming

The rename() method lets you relabel an axis based on a mapping (a dict or a Series) or an arbitrary function.
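The following is a short sketch of the three kinds of mapper mentioned above, applied to a hypothetical two-column DataFrame. It shows a dict, a function applied to every label, and a Series used as a mapping from old labels to new ones.

```python
import pandas as pd

# Hypothetical two-column DataFrame
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

by_dict = df.rename(columns={'col1': 'c1'})  # labels missing from the dict keep their names
by_func = df.rename(columns=str.upper)       # the function is applied to every column label
by_series = df.rename(index=pd.Series(['first', 'second'], index=[0, 1]))  # Series as an old -> new mapping

print(list(by_dict.columns), list(by_func.columns), list(by_series.index))
```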

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
print(df1)
print('\n')

print ("After renaming the rows and columns:")
print(df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},index = {0 : 'apple', 1 : 'banana', 2 : 'durian'}))

Output:

       col1      col2      col3
0  0.486791  0.105759  1.540122
1 -0.990237  1.007885 -0.217896
2 -0.483855 -1.645027 -1.194113
3 -0.122316  0.566277 -0.366028
4 -0.231524 -0.721172 -0.112007
5  0.438810  0.000225  0.435479

After renaming the rows and columns:
              c1        c2      col3
apple   0.486791  0.105759  1.540122
banana -0.990237  1.007885 -0.217896
durian -0.483855 -1.645027 -1.194113
3      -0.122316  0.566277 -0.366028
4      -0.231524 -0.721172 -0.112007
5       0.438810  0.000225  0.435479

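The rename() method provides an inplace parameter. It defaults to False, in which case a new object is returned with a copy of the underlying data. Pass inplace=True to rename the labels of the existing object in place.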
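Here is a minimal sketch contrasting the two modes on a hypothetical DataFrame. Without `inplace`, the call returns a renamed copy and `df` keeps its labels. With `inplace=True`, `df` itself changes and the call returns `None`.

```python
import pandas as pd

# Hypothetical DataFrame
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Default (inplace=False): a renamed DataFrame is returned and df is untouched
renamed = df.rename(columns={'col1': 'c1'})
print(list(renamed.columns), list(df.columns))

# inplace=True: df itself is relabeled and the call returns None
result = df.rename(columns={'col1': 'c1'}, inplace=True)
print(result, list(df.columns))
```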
