[TOC]python
程序簡介
<font size=3>調用statsmodels模塊對上證指數的收盤價進行ARIMA模型動態建模,ARIMA適合短時間預測,所以輸入爲15個數據,輸出爲1個數據 程序輸入:原序列,須要日後預測的個數 程序輸出:預測序列,模型結構(白噪聲檢驗、單根檢驗、一階差分自相關圖、一階差分偏自相關圖)api
<b>差分整合移動平均自迴歸模型(ARIMA)</b>,ARIMA(p,d,q)中,AR是」自迴歸」,p爲自迴歸項數;MA爲」滑動平均」,q爲滑動平均項數,d爲使之成爲平穩序列所作的差分次數(階數)。</font>app
程序/數據集下載
<font size=3>點擊進入下載地址</font>函數
代碼分析
<font size=3>導入模塊、路徑<br></font>測試
# -*- coding: utf-8 -*- from Module.BuildModel import ARIMA from sklearn.metrics import mean_absolute_error import pandas as pd import numpy as np import matplotlib.pyplot as plt import os #路徑目錄 baseDir = ''#當前目錄 staticDir = os.path.join(baseDir,'Static')#靜態文件目錄 resultDir = os.path.join(baseDir,'Result')#結果文件目錄
<font size=3>讀取數據、分紅訓練集和測試集,查看前5行<br></font>ui
#讀取數據 data = pd.read_csv(staticDir+'/000001.csv',encoding='gbk') train = data['收盤價'].values[-20:-10]#訓練數據 test = data['收盤價'].values[-10:]#測試數據 data.head()
<div> <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }url
.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; }
</style> <table border="1" class="dataframe"> <thead> <tr style="text-align: right;"> <th></th> <th>日期</th> <th>股票代碼</th> <th>名稱</th> <th>收盤價</th> <th>最高價</th> <th>最低價</th> <th>開盤價</th> <th>前收盤</th> <th>漲跌額</th> <th>漲跌幅</th> <th>成交量</th> <th>成交金額</th> </tr> </thead> <tbody> <tr> <th>0</th> <td>2020-02-18</td> <td>'000001</td> <td>上證指數</td> <td>2984.9716</td> <td>2990.6003</td> <td>2960.7751</td> <td>2981.4097</td> <td>2983.6224</td> <td>1.3492</td> <td>0.0452</td> <td>311665913</td> <td>3.74998562648e+11</td> </tr> <tr> <th>1</th> <td>2020-02-17</td> <td>'000001</td> <td>上證指數</td> <td>2983.6224</td> <td>2983.6371</td> <td>2924.9913</td> <td>2924.9913</td> <td>2917.0077</td> <td>66.6147</td> <td>2.2837</td> <td>313198007</td> <td>3.67014340129e+11</td> </tr> <tr> <th>2</th> <td>2020-02-14</td> <td>'000001</td> <td>上證指數</td> <td>2917.0077</td> <td>2926.9427</td> <td>2899.5739</td> <td>2899.8659</td> <td>2906.0735</td> <td>10.9342</td> <td>0.3763</td> <td>250650627</td> <td>3.08080368726e+11</td> </tr> <tr> <th>3</th> <td>2020-02-13</td> <td>'000001</td> <td>上證指數</td> <td>2906.0735</td> <td>2935.4060</td> <td>2901.2425</td> <td>2927.1443</td> <td>2926.8991</td> <td>-20.8256</td> <td>-0.7115</td> <td>274804844</td> <td>3.34526327364e+11</td> </tr> <tr> <th>4</th> <td>2020-02-12</td> <td>'000001</td> <td>上證指數</td> <td>2926.8991</td> <td>2926.8991</td> <td>2892.4240</td> <td>2895.5561</td> <td>2901.6744</td> <td>25.2247</td> <td>0.8693</td> <td>248733429</td> <td>2.97534420493e+11</td> </tr> </tbody> </table> </div>spa
<font size=3>調用ARIMA函數進行動態建模,函數自動繪製一階差分自相關圖和一階差分偏自相關圖,即每次預測1個值,持續屢次,計算並打印MAE.net
其中ARIMA函數位置在Module/BuildModel.py,代碼會在下文給出 </font>code
#ARIMA動態建模 yPre = [] for i in range(test.shape[0]): #只預測1個數 result = ARIMA(train,1) yPre.append(result['predict']['value'][0]) #更新訓練集 train = train.tolist()[:-1] train.append(test[i]) train = np.array(train).reshape(-1) #計算MAE MAE = mean_absolute_error(test,yPre) print('偏差MAE',MAE)
<font size=3>這裏給出上文用到的ARIMA函數,直接運行是直接驗證代碼有效性的另外一個簡單實驗,程序會保留白噪聲檢驗、單根檢驗、一階差分自相關圖、一階差分偏自相圖</font>
# -*- coding: utf-8 -*- import os import pandas as pd import numpy as np import statsmodels.api as sm from statsmodels.graphics.tsaplots import plot_acf, plot_pacf from statsmodels.tsa.stattools import adfuller from statsmodels.tsa import arima_model from statsmodels.stats.diagnostic import acorr_ljungbox import warnings warnings.filterwarnings("ignore") def ARIMA(series,n): ''' 只討論一階差分的ARIMA模型,預測,數字索引從1開始 series:時間序列 n:須要日後預測的個數 ''' series = np.array(series) series = pd.Series(series.reshape(-1)) currentDir = os.getcwd()#當前工做路徑 #一階差分數據 fd = series.diff(1)[1:] plot_acf(fd).savefig(currentDir+'/一階差分自相關圖.png') plot_pacf(fd).savefig(currentDir+'/一階差分偏自相關圖.png') #一階差分單位根檢驗 unitP = adfuller(fd)[1] if unitP>0.05: unitAssess = '單位根檢驗中p值爲%.2f,大於0.05,該一階差分序列可能爲非平穩序列'%(unitP) #print('單位根檢驗中p值爲%.2f,大於0.05,認爲該一階差分序列判斷爲非平穩序列'%(unitP)) else: unitAssess = '單位根檢驗中p值爲%.2f,小於0.05,認爲該一階差分序列爲平穩序列'%(unitP) #print('單位根檢驗中p值爲%.2f,小於0.05,認爲該一階差分序列判斷爲平穩序列'%(unitP)) #白噪聲檢驗 noiseP = acorr_ljungbox(fd, lags=1)[-1] if noiseP<=0.05: noiseAssess = '白噪聲檢驗中p值爲%.2f,小於0.05,認爲該一階差分序列爲非白噪聲'%noiseP #print('白噪聲檢驗中p值爲%.2f,小於0.05,認爲該一階差分序列爲非白噪聲'%noiseP) else: noiseAssess = '白噪聲檢驗中p值%.2f,大於0.05,該一階差分序列可能爲白噪聲'%noiseP #print('白噪聲檢驗中%.2f,大於0.05,認爲該一階差分序列爲白噪聲'%noiseP) #BIC準則肯定p、q值 pMax = int(series.shape[0]/10)# 通常階數不超過length/10 qMax = pMax# 通常階數不超過length/10 bics = list() for p in range(pMax + 1): tmp = list() for q in range(qMax + 1): try: tmp.append(arima_model.ARIMA(series, (p, 1, q)).fit().bic) except Exception as e: #print(str(e)) tmp.append(1e+10)#加入一個很大的數 bics.append(tmp) bics = pd.DataFrame(bics) p, q = bics.stack().idxmin() #print('BIC準則下肯定p,q爲%s,%s'%(p,q)) #建模 model = arima_model.ARIMA(series,order=(p, 1, q)).fit() predict = model.forecast(n)[0] return { 'model':{'value':model,'desc':'模型'}, 'unitP':{'value':unitP,'desc':unitAssess}, 'noiseP':{'value':noiseP[0],'desc':noiseAssess}, 'p':{'value':p,'desc':'AR模型階數'}, 'q':{'value':q,'desc':'MA模型階數'}, 'params':{'value':model.params,'desc':'模型係數'}, 'predict':{'value':predict,'desc':'日後預測%d個的序列'%(n)} } if __name__ == "__main__": data = data = np.array([1.2,2.2,3.1,4.5,5.6,6.7,7.1,8.2,9.6,10.6,11,12.4,13.5,14.7,15.2]) x = data[0:10]#輸入數據 y = data[10:]#須要預測的數據 result = ARIMA(x,len(y))#預測結果,一階差分偏自相關圖,一階差分自相關圖 predict = result['predict']['value'] predict = np.round(predict,1) print('真實值:',y) print('預測值:',predict) print(result)
<font size=3>打印創建的股票模型假設檢驗結論,能夠看到序列可能爲白噪聲,可是仍然能夠強行建模</font>
#打印模型結構 print(result['noiseP']['desc']) print(result['unitP']['desc'])
白噪聲檢驗中p值0.98,大於0.05,該一階差分序列可能爲白噪聲 單位根檢驗中p值爲0.99,大於0.05,該一階差分序列可能爲非平穩序列
<font size=3>觀測值預測值對比可視化</font>
#可視化 plt.clf()#清理歷史繪圖 #用來正常顯示中文標籤 plt.rcParams['font.sans-serif']=['SimHei'] #用來正常顯示負號 plt.rcParams['axes.unicode_minus']=False plt.plot(range(test.shape[0]),yPre,label="預測值") plt.plot(range(test.shape[0]),test,label="觀測值") plt.legend() plt.title('ARIMA預測效果,MAE:%2f'%MAE) plt.savefig(resultDir+'/ARIMA預測效果.png',dpi=100,bbox_inches='tight')