Pandas | 19 Merging/Joining

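Pandas offers full-featured, high-performance in-memory join operations that closely resemble those of relational databases such as SQL.
Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects: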

 
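pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False)
  • left - A DataFrame object.
  • right - Another DataFrame object.
  • on - Column name(s) to join on. They must be found in both the left and the right DataFrame.
  • left_on - Columns from the left DataFrame to use as keys. Can be column names or arrays whose length equals the length of the DataFrame.
  • right_on - Columns from the right DataFrame to use as keys. Can be column names or arrays whose length equals the length of the DataFrame.
  • left_index - If True, use the index (row labels) of the left DataFrame as its join key(s). For a DataFrame with a MultiIndex (hierarchical index), the number of levels must match the number of join keys from the right DataFrame.
  • right_index - Same usage as left_index, applied to the right DataFrame.
  • how - One of left, right, outer, or inner. Defaults to inner. Each method is described below.
  • sort - Sort the result DataFrame by the join keys in lexicographical order. Defaults to False; leaving it False substantially improves performance in many cases.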
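
The left_on, right_on and right_index parameters are easier to follow with a small example. The sketch below uses hypothetical data (the students and scores frames and their column names are assumptions made for illustration, not part of the original tutorial) to join on differently named key columns, and then on the index of the right frame.

```python
import pandas as pd

# Hypothetical illustration data, not taken from the tutorial.
students = pd.DataFrame({'student_id': [1, 2, 3],
                         'Name': ['Alex', 'Amy', 'Allen']})
scores = pd.DataFrame({'sid': [2, 3, 4],
                       'score': [88, 92, 75]})

# The key columns have different names, so name one per side.
by_column = pd.merge(students, scores, left_on='student_id', right_on='sid')

# Use the index of the right frame as its join key instead of a column.
scores_indexed = scores.set_index('sid')
by_index = pd.merge(students, scores_indexed,
                    left_on='student_id', right_index=True)

print(by_column)
print(by_index)
```

Both calls use the default inner join, so only the students with a matching score row remain. With left_on/right_on both key columns are kept in the result, whereas the index used through right_index=True is not added as a column.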

Now let's create two different DataFrames and perform merge operations on them.

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

print(left)
print("========================================")
print(right)

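Output (as printed by the older pandas version used here; recent versions keep the dictionary's insertion order, so the columns appear as id, Name, subject_id):

     Name  id subject_id
0    Alex   1       sub1
1     Amy   2       sub2
2   Allen   3       sub4
3   Alice   4       sub6
4  Ayoung   5       sub5
========================================
    Name  id subject_id
0  Billy   1       sub2
1  Brian   2       sub4
2   Bran   3       sub3
3  Bryce   4       sub6
4  Betty   5       sub5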
 

Merging two DataFrames on a single key

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left,right,on='id')
print(rs)

Output:

   Name_x  id subject_id_x Name_y subject_id_y
0    Alex   1         sub1  Billy         sub2
1     Amy   2         sub2  Brian         sub4
2   Allen   3         sub4   Bran         sub3
3   Alice   4         sub6  Bryce         sub6
4  Ayoung   5         sub5  Betty         sub5
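
The _x and _y endings in this output are the default suffixes that merge() appends to non-key columns sharing a name in both frames. As a minimal sketch, the suffixes parameter of pd.merge can replace them with more descriptive labels; the two-row frames and the '_left'/'_right' labels below are chosen only for illustration.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'id': [1, 2], 'Name': ['Billy', 'Brian']})

# 'Name' exists on both sides, so each copy gets its own suffix.
rs = pd.merge(left, right, on='id', suffixes=('_left', '_right'))
print(rs.columns.tolist())
```

The key column id is shared by both sides and therefore appears once, without a suffix.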
 

Merging two DataFrames on multiple keys

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left,right,on=['id','subject_id'])
print(rs)

Output:

   Name_x  id subject_id Name_y
0   Alice   4       sub6  Bryce
1  Ayoung   5       sub5  Betty
 

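Merging with the "how" parameter

The how argument of merge() specifies how to determine which keys are included in the resulting table. If a key combination does not appear in either the left or the right table, the corresponding values in the joined table will be NA.

Here is a summary of the how options and their SQL equivalents: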

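Merge method   SQL equivalent     Description
left           LEFT OUTER JOIN    Use keys from the left object
right          RIGHT OUTER JOIN   Use keys from the right object
outer          FULL OUTER JOIN    Use the union of keys
inner          INNER JOIN         Use the intersection of keys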
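
As a quick way to compare the four options side by side, the following sketch runs each how value against the subject_id column of the tutorial's two frames and records how many rows come back. The row_counts dictionary is a helper introduced here only for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

row_counts = {}
for how in ['left', 'right', 'outer', 'inner']:
    rs = pd.merge(left, right, on='subject_id', how=how)
    row_counts[how] = len(rs)
    # Show which keys survive under each join type.
    print(how, len(rs), sorted(rs['subject_id']))
```

sub1 exists only on the left and sub3 only on the right, so the inner join keeps the four shared keys, the left and right joins keep five each, and the outer join keeps all six.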

Left Join example

import pandas as pd
left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='left')
print (rs)

Output:

   Name_x  id_x subject_id Name_y  id_y
0    Alex     1       sub1    NaN   NaN
1     Amy     2       sub2  Billy   1.0
2   Allen     3       sub4  Brian   2.0
3   Alice     4       sub6  Bryce   4.0
4  Ayoung     5       sub5  Betty   5.0
 

Right Join example

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='right')
print (rs)

Output (row order may differ in recent pandas versions):

   Name_x  id_x subject_id Name_y  id_y
0     Amy   2.0       sub2  Billy     1
1   Allen   3.0       sub4  Brian     2
2   Alice   4.0       sub6  Bryce     4
3  Ayoung   5.0       sub5  Betty     5
4     NaN   NaN       sub3   Bran     3
 

Outer Join example

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, how='outer', on='subject_id')
print (rs)

Output (row order may differ in recent pandas versions):

   Name_x  id_x subject_id Name_y  id_y
0    Alex   1.0       sub1    NaN   NaN
1     Amy   2.0       sub2  Billy   1.0
2   Allen   3.0       sub4  Brian   2.0
3   Alice   4.0       sub6  Bryce   4.0
4  Ayoung   5.0       sub5  Betty   5.0
5     NaN   NaN       sub3   Bran   3.0
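
In an outer join it is not always obvious which side each row came from. One way to make this visible, sketched below, is the indicator parameter of pd.merge, which adds a _merge column with the values left_only, right_only, or both. The origin dictionary is a helper introduced here only for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# indicator=True adds a '_merge' column describing each row's origin.
rs = pd.merge(left, right, on='subject_id', how='outer', indicator=True)

# Map each key to the side(s) it was found on.
origin = dict(zip(rs['subject_id'], rs['_merge'].astype(str)))
print(rs[['subject_id', '_merge']])
```

Here sub1 is reported as left_only and sub3 as right_only, which matches the rows that carry NaN values in the output above.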
 

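Inner Join example

The example below uses merge() on the subject_id column. By contrast, the DataFrame.join method joins on the index and honors the object on which it is called, so a.join(b) is not equal to b.join(a).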

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='inner')
print (rs)

Output:

   Name_x  id_x subject_id Name_y  id_y
0     Amy     2       sub2  Billy     1
1   Allen     3       sub4  Brian     2
2   Alice     4       sub6  Bryce     4
3  Ayoung     5       sub5  Betty     5
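
The remark in this section that a.join(b) is not equal to b.join(a) can be checked with a short sketch. The frames a and b and their index labels are hypothetical and chosen only to make the asymmetry visible; DataFrame.join defaults to a left join on the index.

```python
import pandas as pd

# Hypothetical frames with partly overlapping index labels.
a = pd.DataFrame({'x': [1, 2, 3]}, index=['k1', 'k2', 'k3'])
b = pd.DataFrame({'y': [10, 20]}, index=['k2', 'k4'])

ab = a.join(b)  # keeps every row of a; y is NaN where b has no match
ba = b.join(a)  # keeps every row of b; x is NaN where a has no match

print(ab)
print(ba)
```

The result follows the index of the calling object, so ab has the rows k1, k2, k3 while ba has only k2 and k4.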