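Pandas | 20 Concatenation

Pandas provides a variety of tools for easily combining Series, DataFrame, and Panel objects. (Panel exists only in old pandas releases; it was removed in pandas 0.25.)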

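pd.concat(objs,axis=0,join='outer',join_axes=None,ignore_index=False)
  • objs - A sequence or mapping of Series, DataFrame, or Panel objects.
  • axis - {0,1,...}, default 0. The axis to concatenate along.
  • join - {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axes: 'outer' takes their union, 'inner' their intersection.
  • ignore_index - Boolean, default False. If True, the index values on the concatenation axis are not used, and the resulting axis is labeled 0,...,n-1.
  • join_axes - A list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter was deprecated in pandas 0.25 and removed in 1.0, so omit it on current versions.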
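
To make the join parameter concrete, here is a minimal sketch using two small frames whose columns only partly overlap. The frames left and right and their column names are made up for illustration and do not appear elsewhere in this tutorial.

```python
import pandas as pd

# Hypothetical frames: only column "b" is shared.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# 'outer' keeps the union of the columns and fills the gaps with NaN.
outer = pd.concat([left, right], join="outer", ignore_index=True)

# 'inner' keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(sorted(outer.columns))  # ['a', 'b', 'c']
print(list(inner.columns))    # ['b']
print(list(outer.index))      # [0, 1, 2, 3]
```

Because ignore_index=True is passed, both results get a fresh 0,...,n-1 index rather than the repeated 0, 1 labels of the inputs.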

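Concatenating objects

The concat() function does all of the heavy lifting of concatenating along an axis. The code below creates two objects and concatenates them.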

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two])
print(rs)

Output:

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
 

Suppose you want to associate a specific key with each of the DataFrame pieces. You can do this with the keys argument -

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],keys=['x','y'])
print(rs)

Output:

     Marks_scored    Name subject_id
x 1            98    Alex       sub1
  2            90     Amy       sub2
  3            87   Allen       sub4
  4            69   Alice       sub6
  5            78  Ayoung       sub5
y 1            89   Billy       sub2
  2            80   Brian       sub4
  3            79    Bran       sub3
  4            97   Bryce       sub6
  5            88   Betty       sub5
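
The keys become the outer level of a MultiIndex, which makes it possible to pull each piece back out of the result. A minimal sketch, using shortened two-row versions of the frames above (the shortening is only to keep the example small):

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

rs = pd.concat([one, two], keys=["x", "y"])

# The result has a two-level index: (key, original index label).
print(rs.index.nlevels)          # 2

# Select everything that came from the second frame.
print(rs.loc["y"])

# Select a single cell by (key, label) and column.
print(rs.loc[("x", 2), "Name"])  # Amy
```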
 

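The index of the result contains duplicates: every index value appears more than once. If the resulting object should get its own fresh index instead, set ignore_index to True. See the following example -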

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],keys=['x','y'],ignore_index=True)
print(rs)

Output:

   Marks_scored    Name subject_id
0            98    Alex       sub1
1            90     Amy       sub2
2            87   Allen       sub4
3            69   Alice       sub6
4            78  Ayoung       sub5
5            89   Billy       sub2
6            80   Brian       sub4
7            79    Bran       sub3
8            97   Bryce       sub6
9            88   Betty       sub5
 

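Note that the index has changed completely and the keys have been discarded as well. If two objects are concatenated along axis=1, new columns are added instead of new rows.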

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],axis=1)
print(rs)

Output:

   Marks_scored    Name subject_id  Marks_scored   Name subject_id
1            98    Alex       sub1            89  Billy       sub2
2            90     Amy       sub2            80  Brian       sub4
3            87   Allen       sub4            79   Bran       sub3
4            69   Alice       sub6            97  Bryce       sub6
5            78  Ayoung       sub5            88  Betty       sub5
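
In the example above both frames share the same index, so the rows line up exactly. When the indexes differ, concatenation along axis=1 aligns on the index labels. The following sketch uses two hypothetical frames, left and right, with partly overlapping indexes to show the effect of join:

```python
import pandas as pd

# Hypothetical frames: only index label 2 is shared.
left = pd.DataFrame({"Name": ["Alex", "Amy"]}, index=[1, 2])
right = pd.DataFrame({"Marks_scored": [89, 80]}, index=[2, 3])

# Default join='outer': union of the index labels, NaN where a frame has no row.
outer = pd.concat([left, right], axis=1)

# join='inner': only the labels present in both frames survive.
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```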
 

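Concatenating with append

A useful shortcut for concatenation is the append method on Series and DataFrame instances. These methods actually predate concat(). They concatenate along axis=0, i.e. the index. Note that append was deprecated in pandas 1.4 and removed in 2.0, so the examples in this section run only on older versions -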

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = one.append(two)
print(rs)

Output:

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
 

The append() function can also take multiple objects -

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = one.append([two,one,two])
print(rs)

Output:

   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
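
On pandas 2.0 and later, where DataFrame.append no longer exists, the same results can be obtained with pd.concat. A minimal sketch, using shortened two-row versions of the frames above:

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

# Equivalent of one.append(two)
rs = pd.concat([one, two])

# Equivalent of one.append([two, one, two])
rs_many = pd.concat([one, two, one, two])

print(rs)
print(len(rs_many))  # 8
```

As with append, the original index labels are kept, so they repeat in the result unless ignore_index=True is passed.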
 

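Time series

Pandas provides powerful tools for working with time series data, especially in the financial domain. When working with time series data, we frequently need to -

  • Generate time series
  • Convert time series to a different frequency

Pandas provides a relatively compact and self-contained set of tools for performing these tasks.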
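
As a rough illustration of both tasks, the sketch below generates a daily series with date_range and then converts it to a two-day frequency with resample. The dates, values, and the choice of sum() as the aggregation are arbitrary assumptions made for the example.

```python
import pandas as pd

# Generate a time series: six consecutive days with values 0..5.
idx = pd.date_range("2018-11-01", periods=6, freq="D")
daily = pd.Series(range(6), index=idx)

# Convert to a lower frequency: group into 2-day bins and sum each bin.
two_day = daily.resample("2D").sum()

print(daily)
print(two_day)
```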

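Getting the current time

pd.Timestamp.now() returns the current date and time. Older tutorials use pd.datetime.now(), but the pd.datetime alias was deprecated in pandas 1.0 and removed in 2.0.

import pandas as pd

# pd.Timestamp.now() replaces the removed pd.datetime.now()
print(pd.Timestamp.now())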

Output:

2017-11-03 02:17:45.997992
 

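Creating a timestamp

Timestamped data is the most basic type of time series data; it associates values with points in time. For pandas objects, this means using points in time. For example -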

import pandas as pd

time = pd.Timestamp('2018-11-01')
print(time)

Output:

2018-11-01 00:00:00
 

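Integer or float epoch values can also be converted. The default unit for these is nanoseconds (since that is how timestamps are stored). However, epochs are often stored in another unit, which can be specified. Another example -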

import pandas as pd

time = pd.Timestamp(1588686880,unit='s')
print(time)

Output:

2020-05-05 13:54:40
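
To show how the unit argument changes the interpretation of the number, the following sketch builds the same instant from seconds, milliseconds, and the default nanoseconds. The variable names are illustrative only.

```python
import pandas as pd

t_s = pd.Timestamp(1588686880, unit="s")        # seconds since the epoch
t_ms = pd.Timestamp(1588686880000, unit="ms")   # milliseconds since the epoch
t_ns = pd.Timestamp(1588686880 * 10**9)         # no unit given: nanoseconds

print(t_s)                  # 2020-05-05 13:54:40
print(t_s == t_ms == t_ns)  # True
```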
 

Creating a time range

import pandas as pd

time = pd.date_range("12:00", "23:59", freq="30min").time
print(time)

Output:

[datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0)
 datetime.time(13, 30) datetime.time(14, 0) datetime.time(14, 30)
 datetime.time(15, 0) datetime.time(15, 30) datetime.time(16, 0)
 datetime.time(16, 30) datetime.time(17, 0) datetime.time(17, 30)
 datetime.time(18, 0) datetime.time(18, 30) datetime.time(19, 0)
 datetime.time(19, 30) datetime.time(20, 0) datetime.time(20, 30)
 datetime.time(21, 0) datetime.time(21, 30) datetime.time(22, 0)
 datetime.time(22, 30) datetime.time(23, 0) datetime.time(23, 30)]
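
Instead of giving both a start and an end, date_range also accepts a start plus a number of periods. A small sketch, with an arbitrary start date chosen for illustration:

```python
import pandas as pd

# Four timestamps, 30 minutes apart, starting at a fixed date and time.
idx = pd.date_range("2018-11-01 12:00", periods=4, freq="30min")

print(idx)
print(idx[-1])  # 2018-11-01 13:30:00
```

Passing a full date, as here, keeps the result reproducible; a bare time such as "12:00" is attached to the current day.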
 

Changing the frequency

import pandas as pd

time = pd.date_range("12:00", "23:59", freq="H").time
print(time)

Output:

[datetime.time(12, 0) datetime.time(13, 0) datetime.time(14, 0)
 datetime.time(15, 0) datetime.time(16, 0) datetime.time(17, 0)
 datetime.time(18, 0) datetime.time(19, 0) datetime.time(20, 0)
 datetime.time(21, 0) datetime.time(22, 0) datetime.time(23, 0)]
 

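Converting to timestamps

To convert a Series or list-like object of date-like values (for example strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series (with the same index), while a list-like is converted to a DatetimeIndex. Take a look at the following example -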

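import pandas as pd

# The two strings use different date formats; on pandas 2.0+ this call
# additionally needs format='mixed' to parse them element by element.
time = pd.to_datetime(pd.Series(['Jul 31, 2009','2019-10-10', None]))
print(time)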

Output:

0   2009-07-31
1   2019-10-10
2          NaT
dtype: datetime64[ns]
 

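NaT means "Not a Time" (the equivalent of NaN for datetime values).

import pandas as pd

# The two strings use different date formats; on pandas 2.0+ this call
# additionally needs format='mixed' to parse them element by element.
time = pd.to_datetime(['2009/11/23', '2019.12.31', None])
print(time)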

Output:

DatetimeIndex(['2009-11-23', '2019-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
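
When some of the input values cannot be parsed at all, to_datetime raises an error by default. One way to handle this, sketched below with made-up input values, is errors='coerce', which turns unparseable entries into NaT:

```python
import pandas as pd

# Hypothetical input: one valid date, one junk string, one missing value.
raw = pd.Series(["2019-10-10", "not a date", None])

# errors='coerce' replaces anything that cannot be parsed with NaT.
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna().tolist())  # [False, True, True]
```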