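To work with numerical data, pandas provides several kinds of window statistics: rolling, expanding, and exponentially weighted. The available statistics include sum, mean, median, variance, covariance, correlation, and so on. This chapter shows how to apply these methods to DataFrame objects.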
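The .rolling() function
This function can be applied to a series of data. Specify the window=n parameter and apply the appropriate statistical function to the result.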
import pandas as pd
import numpy as np

# 10 days of random values in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2020', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print('\n')
# Mean over a sliding window of 3 rows
print(df.rolling(window=3).mean())
Output:
A B C D
2020-01-01 0.517788 0.524324 0.723912 -0.316153
2020-01-02 0.553257 -0.489424 -0.942906 -0.002625
2020-01-03 -0.113628 0.291778 -1.216583 1.192297
2020-01-04 0.549378 2.526383 -1.006843 -0.209177
2020-01-05 0.646680 0.249695 -0.420748 0.502700
2020-01-06 -0.323045 -0.962962 0.035932 -1.342486
2020-01-07 -1.209534 0.138791 0.756402 0.229242
2020-01-08 -0.473912 -1.734865 0.269594 -0.293566
2020-01-09 2.144167 0.508603 0.076023 -0.246540
2020-01-10 -0.199808 0.887562 0.196244 0.831584
A B C D
2020-01-01 NaN NaN NaN NaN
2020-01-02 NaN NaN NaN NaN
2020-01-03 0.319139 0.108892 -0.478526 0.291173
2020-01-04 0.329669 0.776245 -1.055444 0.326831
2020-01-05 0.360810 1.022618 -0.881391 0.495273
2020-01-06 0.291004 0.604372 -0.463887 -0.349655
2020-01-07 -0.295300 -0.191492 0.123862 -0.203515
2020-01-08 -0.668830 -0.853012 0.353976 -0.468937
2020-01-09 0.153574 -0.362490 0.367340 -0.103622
2020-01-10 0.490149 -0.112900 0.180621 0.097159
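Note: because the window size (window) is 3, the first two rows are NaN, and the value at the third row is the mean of elements n, n-1, and n-2. The other statistical functions mentioned above can be applied in the same way.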
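The note above can be checked by hand. The following is a minimal sketch that reuses the first four values of column A from the output above; the variable names (a, rolled, manual) are made up for illustration.

```python
import numpy as np
import pandas as pd

# First four values of column A, copied from the output above.
a = pd.Series([0.517788, 0.553257, -0.113628, 0.549378])

rolled = a.rolling(window=3).mean()

# Positions 0 and 1 have fewer than 3 observations, so the result is NaN.
assert rolled.iloc[:2].isna().all()

# Position 2 is the mean of positions 0, 1 and 2.
manual = (a.iloc[0] + a.iloc[1] + a.iloc[2]) / 3
assert np.isclose(rolled.iloc[2], manual)

# Other statistics use the same rolling object.
rolling_sum = a.rolling(window=3).sum()
rolling_median = a.rolling(window=3).median()

print(rolled)
```

If NaN values at the start are not wanted, rolling() also accepts a min_periods argument, for example a.rolling(window=3, min_periods=1).mean(), which produces a value as soon as one observation is available.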
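The .expanding() function
This function can be applied to a series of data. Specify the min_periods=n parameter (the minimum number of observations required before a value is produced) and apply the appropriate statistical function to the result.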
import pandas as pd
import numpy as np

# 10 days of random values in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2018', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print('\n')
# Mean of all rows seen so far, once at least 3 rows are available
print(df.expanding(min_periods=3).mean())
Output:
A B C D
2018-01-01 0.246692 0.511610 -0.440860 -0.241488
2018-01-02 -0.287958 1.554392 -0.870998 -0.141933
2018-01-03 -0.219975 -0.217251 3.032686 -0.800669
2018-01-04 -0.297885 0.336629 -0.313112 -0.633826
2018-01-05 -0.226151 -0.266663 0.988562 -0.424164
2018-01-06 -0.641176 -2.556270 1.907479 0.779536
2018-01-07 0.022907 -0.333231 1.784900 1.075321
2018-01-08 -1.045178 0.295636 0.127447 -1.417171
2018-01-09 1.048741 0.841395 0.104583 1.015302
2018-01-10 -0.209738 0.333223 -1.279857 -0.380164
A B C D
2018-01-01 NaN NaN NaN NaN
2018-01-02 NaN NaN NaN NaN
2018-01-03 -0.087081 0.616250 0.573609 -0.394697
2018-01-04 -0.139782 0.546345 0.351929 -0.454479
2018-01-05 -0.157056 0.383744 0.479256 -0.448416
2018-01-06 -0.237742 -0.106259 0.717293 -0.243757
2018-01-07 -0.200507 -0.138683 0.869808 -0.055318
2018-01-08 -0.306091 -0.084393 0.777013 -0.225549
2018-01-09 -0.155554 0.018472 0.702299 -0.087677
2018-01-10 -0.160972 0.049947 0.504083 -0.116926
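An expanding mean is a running mean over everything seen so far. The sketch below compares it with a cumulative sum divided by the running count, using the first four values of column A from the output above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# First four values of column A, copied from the output above.
a = pd.Series([0.246692, -0.287958, -0.219975, -0.297885])

expanded = a.expanding(min_periods=3).mean()

# Running mean computed by hand: cumulative sum / number of rows so far.
cumulative = a.cumsum() / np.arange(1, len(a) + 1)

# min_periods=3 masks the first two positions with NaN.
assert expanded.iloc[:2].isna().all()

# From the third position on, both computations agree.
assert np.allclose(expanded.iloc[2:], cumulative.iloc[2:])

print(expanded)
```

Unlike rolling(), the window here never drops old rows: each result uses every row from the start of the data up to the current one.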
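The .ewm() function
ewm() can be applied to a series of data. Specify one of the com, span, or halflife parameters and apply the appropriate statistical function to the result. It assigns weights that decay exponentially, so recent observations count more than older ones.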
import pandas as pd
import numpy as np

# 10 days of random values in four columns
df = pd.DataFrame(np.random.randn(10, 4),
                  index=pd.date_range('1/1/2019', periods=10),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print('\n')
# Exponentially weighted mean with center of mass 0.5
print(df.ewm(com=0.5).mean())
Output:
A B C D
2019-01-01 1.204552 -0.936226 0.629811 -0.424075
2019-01-02 0.593300 -0.356715 0.313949 0.547324
2019-01-03 0.545719 -1.061298 0.578605 -0.290907
2019-01-04 -1.146018 1.585733 0.520032 -0.705019
2019-01-05 -0.773724 0.907562 0.948446 -0.427746
2019-01-06 -0.033501 -1.787833 -1.978037 0.304845
2019-01-07 0.689540 -0.457179 1.584107 1.932602
2019-01-08 1.052232 0.135262 0.246501 0.698567
2019-01-09 0.124396 -1.289378 0.279960 -0.896865
2019-01-10 -1.083088 0.399733 0.903997 -0.738203
A B C D
2019-01-01 1.204552 -0.936226 0.629811 -0.424075
2019-01-02 0.746113 -0.501593 0.392915 0.304474
2019-01-03 0.607378 -0.889081 0.521470 -0.107713
2019-01-04 -0.576164 0.781418 0.520499 -0.510895
2019-01-05 -0.708415 0.865861 0.806976 -0.455233
2019-01-06 -0.257855 -0.905698 -1.052250 0.052182
2019-01-07 0.374031 -0.606548 0.706125 1.306369
2019-01-08 0.826234 -0.111933 0.399662 0.901106
2019-01-09 0.358318 -0.896936 0.319857 -0.297602
2019-01-10 -0.602636 -0.032475 0.709290 -0.591341
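To see how the weights are assigned, here is a minimal sketch that recomputes the exponentially weighted mean by hand for the first three values of column A from the output above. It assumes the pandas default adjust=True, under which com maps to a smoothing factor alpha = 1 / (1 + com) and an observation i steps in the past gets weight (1 - alpha) ** i. The helper manual_ewm is hypothetical and written only for this illustration.

```python
import numpy as np
import pandas as pd

# First three values of column A, copied from the output above.
x = pd.Series([1.204552, 0.593300, 0.545719])

com = 0.5
alpha = 1 / (1 + com)  # smoothing factor implied by com

ewm_mean = x.ewm(com=com).mean()  # adjust=True is the default


def manual_ewm(values, alpha):
    """Weighted mean at each step, with weights (1 - alpha) ** age."""
    out = []
    for t in range(len(values)):
        # Oldest value gets the highest power, newest gets weight 1.
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.sum(weights * values[: t + 1]) / np.sum(weights))
    return np.array(out)


manual = manual_ewm(x.to_numpy(), alpha)
assert np.allclose(ewm_mean.to_numpy(), manual)

# span is another way to express the same decay: alpha = 2 / (span + 1),
# so span=2 corresponds to com=0.5.
assert np.allclose(x.ewm(span=2).mean(), ewm_mean)

print(ewm_mean)
```

The first result equals the first observation because it is the only value available at that point, which matches the first row of the output above.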
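Window functions are mainly used to reveal trends in data by smoothing the curve. When daily data varies a lot and many data points are available, one approach is to sample the data and plot the samples; another is to apply a window computation and plot the result. Either approach smooths the curve or trend.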
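The two approaches can be sketched side by side. The example below is an illustration under stated assumptions: the data is synthetic (a linear trend plus random noise with a fixed seed), "sampling" is interpreted as weekly resampling with resample("W"), and the window length of 7 days is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic daily data: a rising trend plus noise (illustration only).
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=120, freq="D")
daily = pd.Series(np.linspace(0, 10, 120) + rng.normal(0, 1, 120), index=idx)

# Approach 1: sample the data down to one averaged point per week.
weekly = daily.resample("W").mean()

# Approach 2: keep daily resolution, but average over a 7-day window.
smoothed = daily.rolling(window=7).mean()

# Day-to-day jitter, measured as the standard deviation of the differences.
raw_jitter = daily.diff().std()
smooth_jitter = smoothed.diff().std()

print(len(daily), len(weekly))
print(raw_jitter, smooth_jitter)
```

Resampling reduces the number of points, while the rolling mean keeps one value per day and reduces the jitter between neighbouring values. With matplotlib installed, daily.plot() and smoothed.plot() can be used to compare the two curves visually.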