    # Import the required libraries
    import numpy as np
    import pandas as pd

    """
    Concatenation
    Two DataFrames each store some information about users. We now want to
    concatenate them into a single DataFrame. How can this be done?
    """

    data1 = {
        "name": ["Tom", "Bob"],
        "age": [18, 30],
        "city": ["Bei Jing ", "Shang Hai "]
    }
    df1 = pd.DataFrame(data=data1)
    df1
    Out[85]:
      name  age       city
    0  Tom   18   Bei Jing
    1  Bob   30  Shang Hai

    data2 = {
        "name": ["Mary", "James"],
        "age": [35, 18],
        "city": ["Guang Zhou", "Shen Zhen"]
    }
    df2 = pd.DataFrame(data=data2)
    df2
    Out[86]:
        name  age        city
    0   Mary   35  Guang Zhou
    1  James   18   Shen Zhen
def append(self, other, ignore_index=False, verify_integrity=False, sort=None):
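append is the simplest way to concatenate two DataFrames. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat, described further down.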
    df1.append(df2)
    Out[87]:
        name  age        city
    0    Tom   18    Bei Jing
    1    Bob   30   Shang Hai
    0   Mary   35  Guang Zhou
    1  James   18   Shen Zhen
As the output shows, the concatenated result keeps each frame's original index by default. To generate a fresh index, set ignore_index=True.
    df1.append(df2, ignore_index=True)
    Out[88]:
        name  age        city
    0    Tom   18    Bei Jing
    1    Bob   30   Shang Hai
    2   Mary   35  Guang Zhou
    3  James   18   Shen Zhen
Besides append, concat can achieve the same result.
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True):
Example
    objs = [df1, df2]
    pd.concat(objs, ignore_index=True)
    Out[89]:
        name  age        city
    0    Tom   18    Bei Jing
    1    Bob   30   Shang Hai
    2   Mary   35  Guang Zhou
    3  James   18   Shen Zhen
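Both signatures above also list a verify_integrity parameter that the examples do not exercise. The following is a minimal sketch of what it does, assuming two small frames similar to the ones above (the city column is left out for brevity). With verify_integrity=True, pd.concat raises a ValueError when the resulting index would contain duplicate labels.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Tom", "Bob"], "age": [18, 30]})
df2 = pd.DataFrame({"name": ["Mary", "James"], "age": [35, 18]})

# Both frames carry the default index 0..1, so their labels overlap.
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

# With ignore_index=True the result is relabelled 0..3, so no overlap remains.
combined = pd.concat([df1, df2], ignore_index=True, verify_integrity=True)
```

This can be a cheap guard when the index is meant to be a unique identifier and silent duplicates would cause trouble later.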
To tell apart the rows that came from different DataFrames, set the keys parameter. This requires ignore_index=False, which is the default.
    pd.concat(objs, ignore_index=False, keys=["df1", "df2"])
    Out[90]:
            name  age        city
    df1 0    Tom   18    Bei Jing
        1    Bob   30   Shang Hai
    df2 0   Mary   35  Guang Zhou
        1  James   18   Shen Zhen
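Once keys is set, the result has a two-level index, and the outer level can be used to pull one source frame back out. The sketch below assumes simplified versions of the two frames. The level names "source" and "row" are hypothetical labels chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Tom", "Bob"], "age": [18, 30]})
df2 = pd.DataFrame({"name": ["Mary", "James"], "age": [35, 18]})

labelled = pd.concat([df1, df2], keys=["df1", "df2"])

# Select only the rows that came from the second frame.
from_df2 = labelled.loc["df2"]

# Name the index levels, then turn the key level into an ordinary column
# and discard the per-frame row numbers.
flat = (
    pd.concat([df1, df2], keys=["df1", "df2"], names=["source", "row"])
    .reset_index(level="source")
    .reset_index(drop=True)
)
```

Keeping the origin as a regular column, as in flat, is often easier to filter and group on than a MultiIndex.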
Now suppose two DataFrames each store part of the users' information, and we need to join that information together. How can this be done?
def merge(self, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None):
pd.merge joins two DataFrames. Here we set on="name" so that the name column is used as the join key. The default is how='inner'; it can be changed to 'outer'.
    data1 = {
        "name": ["Tom", "Bob", "Mary", "James"],
        "age": [18, 30, 35, 18],
        "city": ["Bei Jing ", "Shang Hai ", "Guang Zhou", "Shen Zhen"]
    }
    df1 = pd.DataFrame(data=data1)
    df1

    data2 = {
        "name": ["Bob", "Mary", "James", "Andy"],
        "sex": ["male", "female", "male", np.nan],
        "income": [8000, 8000, 4000, 6000]
    }
    df2 = pd.DataFrame(data=data2)
    df2

    pd.merge(df1, df2, on="name")
    Out[91]:
        name  age        city     sex  income
    0    Bob   30   Shang Hai    male    8000
    1   Mary   35  Guang Zhou  female    8000
    2  James   18   Shen Zhen    male    4000

    # The merged result has only 3 rows because the default join type is inner.
    # To avoid losing any rows, set how="outer".
    pd.merge(df1, df2, on="name", how="outer")
    Out[92]:
        name   age        city     sex  income
    0    Tom  18.0    Bei Jing     NaN     NaN
    1    Bob  30.0   Shang Hai    male  8000.0
    2   Mary  35.0  Guang Zhou  female  8000.0
    3  James  18.0   Shen Zhen    male  4000.0
    4   Andy   NaN         NaN     NaN  6000.0
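After an outer merge it is not always obvious which side each row came from. The merge signature above includes an indicator parameter for this. The sketch below assumes trimmed-down versions of the two frames, keeping only the columns needed here. Setting indicator=True adds a _merge column holding left_only, right_only, or both.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Tom", "Bob", "Mary", "James"],
                    "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name": ["Bob", "Mary", "James", "Andy"],
                    "income": [8000, 8000, 4000, 6000]})

merged = pd.merge(df1, df2, on="name", how="outer", indicator=True)

# Map each name to the side(s) it was found on.
origin = merged.set_index("name")["_merge"].astype(str).to_dict()

# Rows that exist on only one side, i.e. the ones an inner join would drop.
unmatched = merged[merged["_merge"] != "both"]
```

Here Tom is reported as left_only and Andy as right_only, which accounts for the NaN values in the outer-join output.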
To keep every row from the left DataFrame, set how="left"; conversely, to keep every row from the right DataFrame, set how="right".
    pd.merge(df1, df2, on="name", how="left")
    Out[93]:
        name  age        city     sex  income
    0    Tom   18    Bei Jing     NaN     NaN
    1    Bob   30   Shang Hai    male  8000.0
    2   Mary   35  Guang Zhou  female  8000.0
    3  James   18   Shen Zhen    male  4000.0
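The how="right" case is mentioned but not shown. A minimal sketch of it follows, assuming trimmed-down versions of the two frames, with only the columns needed here.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Tom", "Bob", "Mary", "James"],
                    "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name": ["Bob", "Mary", "James", "Andy"],
                    "income": [8000, 8000, 4000, 6000]})

# Keep every row of df2; Tom (left only) is dropped, Andy gets NaN for age.
right_joined = pd.merge(df1, df2, on="name", how="right")
```

As with the left join, the column that picks up missing values (age here) is converted to float so that it can hold NaN.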
Sometimes the join key has a different name in each DataFrame. In that case, use left_on and right_on to specify the key on each side.
    df1.rename(columns={"name": "name1"}, inplace=True)
    df1
    Out[94]:
       name1  age        city
    0    Tom   18    Bei Jing
    1    Bob   30   Shang Hai
    2   Mary   35  Guang Zhou
    3  James   18   Shen Zhen

    df2.rename(columns={"name": "name2"}, inplace=True)
    df2
    Out[95]:
       name2     sex  income
    0    Bob    male    8000
    1   Mary  female    8000
    2  James    male    4000
    3   Andy     NaN    6000

    pd.merge(df1, df2, left_on="name1", right_on="name2")
    Out[96]:
       name1  age        city  name2     sex  income
    0    Bob   30   Shang Hai    Bob    male    8000
    1   Mary   35  Guang Zhou   Mary  female    8000
    2  James   18   Shen Zhen  James    male    4000
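With left_on and right_on, both key columns (name1 and name2) are kept in the result. For an inner join they hold identical values, so one of them can be dropped afterwards. The sketch below shows one possible cleanup step, not the only way to do it. It assumes trimmed-down frames with only the columns needed here.

```python
import pandas as pd

df1 = pd.DataFrame({"name1": ["Tom", "Bob", "Mary", "James"],
                    "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name2": ["Bob", "Mary", "James", "Andy"],
                    "income": [8000, 8000, 4000, 6000]})

merged = pd.merge(df1, df2, left_on="name1", right_on="name2")

# name1 and name2 are equal on every row of an inner join, so keep just one.
tidy = merged.drop(columns="name2")
```

For outer, left, or right joins the two key columns can differ (one side may be NaN), so dropping one is only safe after deciding which side's key to keep.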
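Sometimes both DataFrames contain a column with the same name. How is that handled?
Use the suffixes parameter. The default, suffixes=('_x', '_y'), appends _x to the duplicated column name from the left DataFrame and _y to the one from the right DataFrame.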
    df1["sex"] = "male"
    df1
    Out[97]:
       name1  age        city   sex
    0    Tom   18    Bei Jing  male
    1    Bob   30   Shang Hai  male
    2   Mary   35  Guang Zhou  male
    3  James   18   Shen Zhen  male

    pd.merge(df1, df2, left_on="name1", right_on="name2")
    Out[98]:
       name1  age        city sex_x  name2   sex_y  income
    0    Bob   30   Shang Hai  male    Bob    male    8000
    1   Mary   35  Guang Zhou  male   Mary  female    8000
    2  James   18   Shen Zhen  male  James    male    4000

    pd.merge(df1, df2, left_on="name1", right_on="name2", suffixes=("_left", "_right"))
    Out[99]:
       name1  age        city sex_left  name2 sex_right  income
    0    Bob   30   Shang Hai     male    Bob      male    8000
    1   Mary   35  Guang Zhou     male   Mary    female    8000
    2  James   18   Shen Zhen     male  James      male    4000
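Two details of suffixes are easy to miss. A key column shared through on is never suffixed, and either suffix may be an empty string, which leaves that side's column name unchanged. A minimal sketch follows, using two small hypothetical frames (left and right) that share both a key and a non-key column.

```python
import pandas as pd

left = pd.DataFrame({"name": ["Bob", "Mary"], "sex": ["male", "male"]})
right = pd.DataFrame({"name": ["Bob", "Mary"], "sex": ["male", "female"]})

# The key column "name" appears once; only the overlapping non-key
# column "sex" is suffixed, and only on the right-hand side.
merged = pd.merge(left, right, on="name", suffixes=("", "_right"))
```

Leaving the left suffix empty is convenient when the left frame is the primary table and its column names should survive the merge unchanged.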
def join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False):
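Besides merge, join can also be used to combine DataFrames. Compared with merge, join differs in the following ways:
(1) The default, on=None, uses the indexes of both the left and right DataFrames as the join key. Setting on names the column of the left DataFrame to use as the key; the right DataFrame is still matched on its index.
(2) When the left and right DataFrames have columns with the same name, the conflict is resolved with the lsuffix and rsuffix parameters.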
    df1.join(df2.set_index("name2"), on="name1", lsuffix="_left")
    Out[100]:
       name1  age        city sex_left     sex  income
    0    Tom   18    Bei Jing     male     NaN     NaN
    1    Bob   30   Shang Hai     male    male  8000.0
    2   Mary   35  Guang Zhou     male  female  8000.0
    3  James   18   Shen Zhen     male    male  4000.0
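The example above covers the on="name1" case. The default index-to-index behaviour from point (1) can be sketched as follows, along with the equivalent pd.merge call. This assumes trimmed-down frames with only the columns needed here. Note that join defaults to how='left', whereas merge defaults to how='inner', as the two signatures show.

```python
import pandas as pd

df1 = pd.DataFrame({"name1": ["Tom", "Bob", "Mary", "James"],
                    "age": [18, 30, 35, 18]})
df2 = pd.DataFrame({"name2": ["Bob", "Mary", "James", "Andy"],
                    "income": [8000, 8000, 4000, 6000]})

# Default on=None: match the left index against the right index (left join).
by_index = df1.set_index("name1").join(df2.set_index("name2"))

# join(on=...) matches a left column against the right index; the same
# result can be written with merge using left_on and right_index.
via_join = df1.join(df2.set_index("name2"), on="name1")
via_merge = pd.merge(df1, df2.set_index("name2"),
                     left_on="name1", right_index=True, how="left")
```

In both forms Tom is kept with a missing income, and Andy, who appears only on the right, is dropped.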