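Pandas provides a range of tools for easily combining Series, DataFrame, and Panel objects. (Panel belongs to older pandas releases; it was removed in pandas 0.25.)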
    pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)
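axis: {0, 1, ...}, default 0. The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' takes their union, 'inner' takes their intersection.
ignore_index: boolean, default False. If True, the index values on the concatenation axis are not used, and the resulting axis is labelled 0, ..., n-1.
join_axes: specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter was removed in pandas 1.0.

The concat() function does all the heavy lifting of concatenating along an axis. The code below creates two objects and concatenates them.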
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # Stack the two frames on top of each other (axis=0 by default)
    rs = pd.concat([one, two])
    print(rs)

Output:

       Marks_scored    Name subject_id
    1            98    Alex       sub1
    2            90     Amy       sub2
    3            87   Allen       sub4
    4            69   Alice       sub6
    5            78  Ayoung       sub5
    1            89   Billy       sub2
    2            80   Brian       sub4
    3            79    Bran       sub3
    4            97   Bryce       sub6
    5            88   Betty       sub5
Suppose we want to associate a specific key with each of the DataFrame pieces being combined. This can be done with the keys argument:
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # Label the rows from each input frame with its own key
    rs = pd.concat([one, two], keys=['x', 'y'])
    print(rs)

Output:

         Marks_scored    Name subject_id
    x 1            98    Alex       sub1
      2            90     Amy       sub2
      3            87   Allen       sub4
      4            69   Alice       sub6
      5            78  Ayoung       sub5
    y 1            89   Billy       sub2
      2            80   Brian       sub4
      3            79    Bran       sub3
      4            97   Bryce       sub6
      5            88   Betty       sub5
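The keys become the outer level of a hierarchical index, so each original piece can be pulled back out by its key. The following is a minimal sketch of that lookup; the two small frames are made up for illustration and are not the frames used above.

```python
import pandas as pd

# Small illustrative frames (hypothetical data)
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

rs = pd.concat([one, two], keys=['x', 'y'])

# The result has a two-level index: (key, original label)
levels = rs.index.nlevels
print(levels)

# Select all rows that came from the first frame
first = rs.loc['x']
print(first)

# Select one row by key and original label
row = rs.loc[('y', 2)]
print(row['Name'])
```

Selecting with a single key returns a frame indexed by the original labels, while a (key, label) tuple returns a single row.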
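The row labels in the result are duplicated: each label from 1 to 5 appears once per input frame. If the resulting object should get a fresh index of its own, set ignore_index to True, as in the following example: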
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # ignore_index=True discards the original labels (and the keys)
    rs = pd.concat([one, two], keys=['x', 'y'], ignore_index=True)
    print(rs)

Output:

       Marks_scored    Name subject_id
    0            98    Alex       sub1
    1            90     Amy       sub2
    2            87   Allen       sub4
    3            69   Alice       sub6
    4            78  Ayoung       sub5
    5            89   Billy       sub2
    6            80   Brian       sub4
    7            79    Bran       sub3
    8            97   Bryce       sub6
    9            88   Betty       sub5
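Note that the index has been replaced entirely and the keys have been discarded as well. If the two objects are combined along axis=1 instead, the second object's columns are added as new columns.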
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # Place the frames side by side, aligning rows on the index
    rs = pd.concat([one, two], axis=1)
    print(rs)

Output:

       Marks_scored    Name subject_id  Marks_scored   Name subject_id
    1            98    Alex       sub1            89  Billy       sub2
    2            90     Amy       sub2            80  Brian       sub4
    3            87   Allen       sub4            79   Bran       sub3
    4            69   Alice       sub6            97  Bryce       sub6
    5            78  Ayoung       sub5            88  Betty       sub5
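In the example above both frames share the same index, so the join parameter makes no visible difference. The sketch below uses two made-up frames whose indexes only partly overlap, to show how 'outer' and 'inner' treat the other axis; the frame and column names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frames with partly overlapping row labels
left = pd.DataFrame({'a': [1, 2, 3]}, index=[1, 2, 3])
right = pd.DataFrame({'b': [10, 20, 30]}, index=[2, 3, 4])

# join='outer' (the default) keeps the union of the row labels;
# cells with no source value are filled with NaN
outer = pd.concat([left, right], axis=1)
print(outer)

# join='inner' keeps only the labels present in both frames
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)
```

With 'outer' the result has rows 1 to 4, with missing values where a frame has no such label; with 'inner' only rows 2 and 3 remain.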
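A useful shortcut for concatenation is the append instance method on Series and DataFrame. These methods actually predate concat(). They concatenate along axis=0, that is, along the index. Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples in this part need an older pandas.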
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # DataFrame.append requires pandas older than 2.0
    rs = one.append(two)
    print(rs)

Output:

       Marks_scored    Name subject_id
    1            98    Alex       sub1
    2            90     Amy       sub2
    3            87   Allen       sub4
    4            69   Alice       sub6
    5            78  Ayoung       sub5
    1            89   Billy       sub2
    2            80   Brian       sub4
    3            79    Bran       sub3
    4            97   Bryce       sub6
    5            88   Betty       sub5
The append() function can also take multiple objects:
    import pandas as pd

    one = pd.DataFrame({
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
        'Marks_scored': [98, 90, 87, 69, 78]},
        index=[1, 2, 3, 4, 5])
    two = pd.DataFrame({
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
        'Marks_scored': [89, 80, 79, 97, 88]},
        index=[1, 2, 3, 4, 5])

    # Append a list of frames in one call (pandas older than 2.0)
    rs = one.append([two, one, two])
    print(rs)

Output:

       Marks_scored    Name subject_id
    1            98    Alex       sub1
    2            90     Amy       sub2
    3            87   Allen       sub4
    4            69   Alice       sub6
    5            78  Ayoung       sub5
    1            89   Billy       sub2
    2            80   Brian       sub4
    3            79    Bran       sub3
    4            97   Bryce       sub6
    5            88   Betty       sub5
    1            98    Alex       sub1
    2            90     Amy       sub2
    3            87   Allen       sub4
    4            69   Alice       sub6
    5            78  Ayoung       sub5
    1            89   Billy       sub2
    2            80   Brian       sub4
    3            79    Bran       sub3
    4            97   Bryce       sub6
    5            88   Betty       sub5
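On pandas versions that no longer ship append, both calls can be written with concat() by passing the calling frame as the first element of the list. This is a minimal sketch with smaller, made-up frames rather than the ones used above.

```python
import pandas as pd

# Small illustrative frames (hypothetical data)
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

# Same rows as one.append(two)
rs_two = pd.concat([one, two])
print(rs_two)

# Same rows as one.append([two, one, two])
rs_many = pd.concat([one, two, one, two])
print(rs_many)
```

As with append, the original row labels are kept, so they repeat in the result unless ignore_index=True is passed.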
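Pandas also provides powerful tools for working with time series data, which is especially common in finance. These tools are relatively compact and self-contained; the examples below cover getting the current time, creating timestamps, generating ranges of times, and converting values to datetimes.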
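datetime.now() returns the current date and time. Older pandas releases exposed it as pd.datetime, but that alias was deprecated in pandas 1.0 and later removed, so it is imported from the standard library here.

    from datetime import datetime

    print(datetime.now())

Output:

    2017-11-03 02:17:45.997992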
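Timestamped data is the most basic type of time series data: it associates values with points in time. In pandas, a point in time is represented by a Timestamp. For example: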
    import pandas as pd

    time = pd.Timestamp('2018-11-01')
    print(time)

Output:

    2018-11-01 00:00:00
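A Timestamp exposes its components as attributes and supports arithmetic with Timedelta. The short sketch below shows a few of these on the same date; the variable names are illustrative.

```python
import pandas as pd

ts = pd.Timestamp('2018-11-01')

# Individual components of the point in time
parts = (ts.year, ts.month, ts.day)
print(parts)

# Name of the weekday
weekday = ts.day_name()
print(weekday)

# Shift the timestamp by one day
next_day = ts + pd.Timedelta(days=1)
print(next_day)
```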
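Integer or float epoch times can be converted as well. Their default unit is nanoseconds, because that is how timestamps are stored. Epoch times are often kept in another unit, however, which can be specified with the unit argument. Here is another example: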
    import pandas as pd

    # Interpret the integer as seconds since 1970-01-01
    time = pd.Timestamp(1588686880, unit='s')
    print(time)

Output:

    2020-05-05 13:54:40
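To make the role of unit concrete, the sketch below builds the same instant from seconds, milliseconds, and nanoseconds, then recovers the epoch seconds by subtracting the epoch. The variable names are illustrative.

```python
import pandas as pd

seconds = 1588686880

from_s = pd.Timestamp(seconds, unit='s')
from_ms = pd.Timestamp(seconds * 1000, unit='ms')
from_ns = pd.Timestamp(seconds * 10**9)  # no unit given: nanoseconds

print(from_s)
print(from_s == from_ms == from_ns)

# Elapsed seconds since the epoch, recovered from the Timestamp
elapsed = (from_s - pd.Timestamp('1970-01-01')).total_seconds()
print(elapsed)
```

Passing the wrong unit does not raise an error; it silently produces a different instant, so the unit should match how the values were stored.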
A range of times can be generated with date_range. The following produces the times from 12:00 to 23:59 at 30-minute intervals:

    import pandas as pd

    time = pd.date_range("12:00", "23:59", freq="30min").time
    print(time)

Output:

    [datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0)
     datetime.time(13, 30) datetime.time(14, 0) datetime.time(14, 30)
     datetime.time(15, 0) datetime.time(15, 30) datetime.time(16, 0)
     datetime.time(16, 30) datetime.time(17, 0) datetime.time(17, 30)
     datetime.time(18, 0) datetime.time(18, 30) datetime.time(19, 0)
     datetime.time(19, 30) datetime.time(20, 0) datetime.time(20, 30)
     datetime.time(21, 0) datetime.time(21, 30) datetime.time(22, 0)
     datetime.time(22, 30) datetime.time(23, 0) datetime.time(23, 30)]
Changing freq changes the step between values. Here the step is one hour:

    import pandas as pd

    # "H" means hourly; pandas 2.2 and later prefer the lowercase alias "h"
    time = pd.date_range("12:00", "23:59", freq="H").time
    print(time)

Output:

    [datetime.time(12, 0) datetime.time(13, 0) datetime.time(14, 0)
     datetime.time(15, 0) datetime.time(16, 0) datetime.time(17, 0)
     datetime.time(18, 0) datetime.time(19, 0) datetime.time(20, 0)
     datetime.time(21, 0) datetime.time(22, 0) datetime.time(23, 0)]
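Besides a start and an end, date_range also accepts a start together with a number of periods. The sketch below shows both a daily and a 30-minute range built that way; the dates are arbitrary examples.

```python
import pandas as pd

# Five consecutive days starting from a given date
days = pd.date_range('2018-11-01', periods=5, freq='D')
print(days)

# Four values, 30 minutes apart, starting at noon
half_hours = pd.date_range('2018-11-01 12:00', periods=4, freq='30min')
labels = [t.strftime('%H:%M') for t in half_hours.time]
print(labels)
```

Using periods avoids having to work out the last value by hand, which is why the earlier ranges ending at "23:59" stop at the last step that still fits before that bound.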
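To convert a Series or list-like object of date-like values (for example strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series with the same index, while a list-like object is converted to a DatetimeIndex. Consider the following example: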
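    import pandas as pd

    # The two strings use different formats; on pandas 2.0 and later
    # this call needs format='mixed' to parse them
    time = pd.to_datetime(pd.Series(['Jul 31, 2009', '2019-10-10', None]))
    print(time)

Output:

    0   2009-07-31
    1   2019-10-10
    2          NaT
    dtype: datetime64[ns]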
NaT means "Not a Time"; it is the datetime equivalent of NaN.
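    import pandas as pd

    # A list-like input yields a DatetimeIndex; as above, pandas 2.0 and
    # later needs format='mixed' because the two strings differ in format
    time = pd.to_datetime(['2009/11/23', '2019.12.31', None])
    print(time)

Output:

    DatetimeIndex(['2009-11-23', '2019-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)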
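When all values are expected to share one layout, passing an explicit format makes parsing predictable, and errors='coerce' turns anything that does not match into NaT instead of raising. The following is a minimal sketch with made-up input values.

```python
import pandas as pd

# Hypothetical input: two valid dates, one bad string, one missing value
raw = pd.Series(['2009-11-23', '2019-12-31', 'not a date', None])

# Parse with a fixed format; unparseable entries become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)

# Count how many entries could not be parsed
missing = int(parsed.isna().sum())
print(missing)
```

Checking the NaT count after a coerced parse is a simple way to notice bad input that would otherwise pass silently.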