Pandas study notes series
Original: https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/3-6-pd-concat/ (this post has been edited and abridged)
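When working with several datasets in pandas, you often need to combine them. concat is a basic way to do this, and it has a number of parameters you can adjust to get the combined data into the shape you want.

axis=0 is the default, so when the axis argument is not given, the function uses axis=0 and stacks the frames vertically (row by row).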
    import pandas as pd
    import numpy as np

    # Define the datasets
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
    df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])

    # Concatenate vertically
    res = pd.concat([df1, df2, df3], axis=0)

    # Print the result
    print(res)
    """
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    0  1.0  1.0  1.0  1.0
    1  1.0  1.0  1.0  1.0
    2  1.0  1.0  1.0  1.0
    0  2.0  2.0  2.0  2.0
    1  2.0  2.0  2.0  2.0
    2  2.0  2.0  2.0  2.0
    """
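For contrast, here is a minimal sketch of what changing axis does. It reuses the same kind of frames as above (the variable names `stacked` and `side_by_side` are only illustrative) and compares the shape of the result for axis=0 and axis=1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])

# axis=0 (the default): rows of df2 are placed below the rows of df1
stacked = pd.concat([df1, df2], axis=0)

# axis=1: columns of df2 are placed to the right of the columns of df1
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       # (6, 4)
print(side_by_side.shape)  # (3, 8)
```

Note that with axis=1 the column names are kept as they are, so the result here has two columns each named a, b, c and d.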
Look closely and you will see that the index of the result is 0, 1, 2, 0, 1, 2, 0, 1, 2. To reset the index, see below.
    # Continuing from the previous example, set ignore_index to True
    res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

    # Print the result
    print(res)
    """
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    6  2.0  2.0  2.0  2.0
    7  2.0  2.0  2.0  2.0
    8  2.0  2.0  2.0  2.0
    """
The index of the result becomes 0, 1, 2, 3, 4, 5, 6, 7, 8.
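One reason the repeated index matters is label-based lookup. The following is a small sketch, using two frames like the ones above, of how `.loc[0]` behaves with and without ignore_index; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])

stacked = pd.concat([df1, df2], axis=0)
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# Without ignore_index, label 0 appears once per input frame,
# so .loc[0] returns a DataFrame with two rows.
print(stacked.index.is_unique)     # False
print(stacked.loc[0].shape)        # (2, 4)

# With ignore_index=True every label is unique,
# so .loc[0] returns a single row as a Series.
print(renumbered.index.is_unique)  # True
print(renumbered.loc[0].shape)     # (4,)
```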
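join='outer' is the default, so when the join argument is not given, the function uses join='outer'. In this mode the frames are stacked vertically by column: columns with the same name are stacked on top of each other, columns that appear in only one frame get a column of their own, and positions that have no value are filled with NaN.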
    import pandas as pd
    import numpy as np

    # Define the datasets
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4])

    # Vertical "outer" concatenation of df1 and df2
    res = pd.concat([df1, df2], axis=0, join='outer')

    print(res)
    """
         a    b    c    d    e
    1  0.0  0.0  0.0  0.0  NaN
    2  0.0  0.0  0.0  0.0  NaN
    3  0.0  0.0  0.0  0.0  NaN
    2  NaN  1.0  1.0  1.0  1.0
    3  NaN  1.0  1.0  1.0  1.0
    4  NaN  1.0  1.0  1.0  1.0
    """
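join='inner' works on the same principle as the previous example, but only the columns shared by all frames are combined; the other columns are dropped.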
    # Continuing from the previous example
    # Vertical "inner" concatenation of df1 and df2
    res = pd.concat([df1, df2], axis=0, join='inner')

    # Print the result
    print(res)
    """
         b    c    d
    1  0.0  0.0  0.0
    2  0.0  0.0  0.0
    3  0.0  0.0  0.0
    2  1.0  1.0  1.0
    3  1.0  1.0  1.0
    4  1.0  1.0  1.0
    """

    # Reset the index and print the result
    res = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)
    print(res)
    """
         b    c    d
    0  0.0  0.0  0.0
    1  0.0  0.0  0.0
    2  0.0  0.0  0.0
    3  1.0  1.0  1.0
    4  1.0  1.0  1.0
    5  1.0  1.0  1.0
    """
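One way to read the two join modes is as set operations on the column labels: 'outer' keeps the union of the columns and 'inner' keeps the intersection. The sketch below checks this for the frames used above; `Index.union` and `Index.intersection` are used only for the comparison, and this is a way of verifying the behaviour rather than a description of how concat is implemented.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])

outer = pd.concat([df1, df2], axis=0, join='outer')
inner = pd.concat([df1, df2], axis=0, join='inner')

# Column labels computed directly from the two inputs
all_columns = df1.columns.union(df2.columns)
shared_columns = df1.columns.intersection(df2.columns)

print(sorted(outer.columns) == sorted(all_columns))     # True
print(sorted(inner.columns) == sorted(shared_columns))  # True

# Column 'a' only exists in df1, so the rows coming from df2 are NaN there
print(int(outer['a'].isna().sum()))  # 3
```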
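The join_axes parameter restricted a horizontal concatenation to a given index. It was deprecated in pandas 0.25 and removed in 1.0, so the example below reindexes df2 to df1.index instead, which gives the same result.

    import pandas as pd
    import numpy as np

    # Define the datasets
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4])

    # Concatenate horizontally, keeping only the rows in df1.index
    # (older pandas: pd.concat([df1, df2], axis=1, join_axes=[df1.index]))
    res = pd.concat([df1, df2.reindex(df1.index)], axis=1)

    # Print the result
    print(res)
    """
         a    b    c    d    b    c    d    e
    1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
    2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    """

    # Without the restriction to df1.index, print the result
    res = pd.concat([df1, df2], axis=1)
    print(res)
    """
         a    b    c    d    b    c    d    e
    1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
    2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0
    """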
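The join parameter also applies when concatenating along axis=1, where it acts on the row index rather than on the columns. As a rough sketch with the same two frames, join='inner' keeps only the index labels present in both inputs (here 2 and 3):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])

# Horizontal concatenation, keeping only rows whose label is in both frames
res_inner = pd.concat([df1, df2], axis=1, join='inner')

print(res_inner.index.tolist())  # [2, 3]
print(res_inner.shape)           # (2, 8)
```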
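append only concatenates vertically; it has no horizontal mode. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the examples below need an older pandas version.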
    # Note: DataFrame.append requires pandas < 2.0
    import pandas as pd
    import numpy as np

    # Define the datasets
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
    df3 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
    s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])

    # Append df2 below df1, reset the index, and print the result
    res = df1.append(df2, ignore_index=True)
    print(res)
    """
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    """

    # Append several frames: df2 and df3 below df1, reset the index, and print the result
    res = df1.append([df2, df3], ignore_index=True)
    print(res)
    """
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    6  1.0  1.0  1.0  1.0
    7  1.0  1.0  1.0  1.0
    8  1.0  1.0  1.0  1.0
    """

    # Append a Series: add s1 to df1 as a new row, reset the index, and print the result
    res = df1.append(s1, ignore_index=True)
    print(res)
    """
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  2.0  3.0  4.0
    """
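On pandas 2.0 and later, where DataFrame.append is no longer available, the same three results can be obtained with pd.concat. The sketch below is one possible translation, not the original author's code; the variable names `res_two`, `res_three` and `res_series` are illustrative, and the Series is first turned into a one-row frame with `to_frame().T`.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
df3 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['a', 'b', 'c', 'd'])
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Equivalent of df1.append(df2, ignore_index=True)
res_two = pd.concat([df1, df2], ignore_index=True)

# Equivalent of df1.append([df2, df3], ignore_index=True)
res_three = pd.concat([df1, df2, df3], ignore_index=True)

# Equivalent of df1.append(s1, ignore_index=True):
# to_frame().T turns the Series into a single row with columns a..d
res_series = pd.concat([df1, s1.to_frame().T], ignore_index=True)

print(res_two.shape)                 # (6, 4)
print(res_three.shape)               # (9, 4)
print(res_series.iloc[3].tolist())   # [1.0, 2.0, 3.0, 4.0]
```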