TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp' when predicting with statsmodels.tsa.arima_model

Date: 2019-11-06



I built an ARIMA-type model in Python with statsmodels to forecast a time series:

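import pandas as pd
import statsmodels.api as sm

# Read the data; the first column (dates) becomes a DatetimeIndex
df = pd.read_csv("data.csv", index_col=0, parse_dates=True)

# SARIMAX with its default order (1, 0, 0); stationarity/invertibility constraints disabled
mod = sm.tsa.statespace.SARIMAX(df['price'], enforce_stationarity=False, enforce_invertibility=False)

res = mod.fit()
# Request predictions starting at 2018-01-01
res.get_prediction(start=pd.to_datetime('2018-1-1'))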

Running it raises:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'Timestamp'

The cause is that the dates in the loaded time series are not evenly spaced. For example, printing mod._index gives:

DatetimeIndex(['2016-01-01', '2016-01-08', '2016-01-15', '2016-01-22',
               '2016-01-30'],
              dtype='datetime64[ns]', name='date', freq=None)

Here 2016-01-30 is 8 days after the previous date, while every other gap is 7 days, so the DatetimeIndex has freq=None.
If the last date is changed to 2016-01-29 instead, mod._index becomes:
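You can spot this kind of irregularity before fitting anything. The sketch below rebuilds the five dates from the output above and uses pandas' `pd.infer_freq`, which returns None when no single frequency fits every step. The `regular` index with 2016-01-29 is included for comparison.

```python
import pandas as pd

# The five dates from the example; 2016-01-30 breaks the 7-day rhythm.
irregular = pd.DatetimeIndex(
    ["2016-01-01", "2016-01-08", "2016-01-15", "2016-01-22", "2016-01-30"]
)
regular = pd.DatetimeIndex(
    ["2016-01-01", "2016-01-08", "2016-01-15", "2016-01-22", "2016-01-29"]
)

# Gaps between consecutive dates, in days.
gaps = irregular.to_series().diff().dropna()
print(gaps.dt.days.tolist())  # [7, 7, 7, 8]

# infer_freq returns None when no single frequency fits all gaps.
print(pd.infer_freq(irregular))  # None
print(pd.infer_freq(regular))    # W-FRI (2016-01-01 is a Friday)
```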

DatetimeIndex(['2016-01-01', '2016-01-08', '2016-01-15', '2016-01-22',
               '2016-01-29'],
              dtype='datetime64[ns]', freq='W-FRI')
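A DatetimeIndex parsed by `read_csv` has freq=None even when its dates are evenly spaced. The W-FRI above is therefore inferred by statsmodels itself, usually with a warning. A minimal sketch of attaching the frequency explicitly before building the model follows. The price values are made up, and the series only stands in for `df['price']`.

```python
import pandas as pd

# Hypothetical stand-in for df['price'] after read_csv(..., parse_dates=True);
# the price values are invented for illustration.
dates = pd.to_datetime(
    ["2016-01-01", "2016-01-08", "2016-01-15", "2016-01-22", "2016-01-29"]
)
price = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0], index=dates, name="price")
print(price.index.freq)  # None: parsed dates carry no frequency

# Infer the rhythm once and attach it explicitly. asfreq keeps every row
# here because all dates already lie on the W-FRI grid.
freq = pd.infer_freq(price.index)
if freq is None:
    raise ValueError("index is not evenly spaced; fix the dates first")
price = price.asfreq(freq)
print(price.index.freqstr)  # W-FRI
```

The resulting series can be passed to `sm.tsa.statespace.SARIMAX` as in the original code. Because its index already has a frequency, statsmodels should not need to infer one.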

But now a different error is raised:

KeyError: 'The `start` argument could not be matched to a location related to the index of the data.'

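This is because the start argument of get_prediction must map to a position in the series' index. An in-sample date has to be one of the index entries, and 2018-1-1 does not line up with the weekly W-FRI dates (it is a Monday).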
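One way to avoid the KeyError is to map the requested date onto a date that actually exists in the index before calling `get_prediction`. The helper `snap_start` below is hypothetical, not part of statsmodels or pandas. It relies on pandas' `Index.asof`, which returns the exact label if present and otherwise the last label before it.

```python
import pandas as pd

def snap_start(index: pd.DatetimeIndex, requested) -> pd.Timestamp:
    """Hypothetical helper: return `requested` if it is in `index`,
    otherwise the latest index date before it."""
    ts = pd.Timestamp(requested)
    if ts in index:
        return ts
    # Index.asof gives the last label <= ts, or NaT if ts precedes the index.
    snapped = index.asof(ts)
    if pd.isna(snapped):
        raise KeyError(f"{ts.date()} is earlier than the first date in the index")
    return snapped

idx = pd.date_range("2016-01-01", periods=5, freq="W-FRI")
print(pd.Timestamp("2018-1-1") in idx)  # False
print(snap_start(idx, "2016-01-20"))    # 2016-01-15 00:00:00
print(snap_start(idx, "2016-01-08"))    # 2016-01-08 00:00:00
```

The returned date can then be used as `res.get_prediction(start=...)`. Snapping backwards is only one design choice. If the goal is predictions past the end of the data, `res.get_forecast(steps=...)` avoids dates altogether.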

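Debugging lesson: library calls go through many layers, so sometimes the error message alone is not enough to solve the problem. If your calling code is correct, the problem is quite likely in the data. Search engines are useful, but for small problems like this one a search can turn into hunting for a needle in a haystack. For open-source libraries, check whether someone has already filed a similar issue; sometimes it really is a bug in the library. Another way to locate the problem yourself is to compare your code against a correct, complete example and look for the differences.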
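Since the problem here turned out to be in the data, a small pre-flight check on the date index can catch it before the model is built. The function `check_time_index` and its messages are hypothetical. The checks only use standard pandas index attributes and `pd.infer_freq`.

```python
import pandas as pd

def check_time_index(index: pd.DatetimeIndex) -> list[str]:
    """Hypothetical pre-flight check: list index problems that tend to
    surface later as hard-to-read errors inside statsmodels."""
    problems = []
    if not index.is_monotonic_increasing:
        problems.append("dates are not sorted in increasing order")
    if index.has_duplicates:
        problems.append("some dates appear more than once")
    # pd.infer_freq needs at least 3 dates and returns None when no single
    # frequency (daily, weekly on a fixed weekday, monthly, ...) fits.
    if len(index) >= 3 and pd.infer_freq(index) is None:
        gaps = sorted({int(d) for d in index.to_series().diff().dropna().dt.days})
        problems.append(f"no single frequency fits; gaps in days: {gaps}")
    return problems

regular = pd.date_range("2016-01-01", periods=5, freq="W-FRI")
irregular = pd.DatetimeIndex(
    ["2016-01-01", "2016-01-08", "2016-01-15", "2016-01-22", "2016-01-30"]
)
print(check_time_index(regular))    # []
print(check_time_index(irregular))  # ['no single frequency fits; gaps in days: [7, 8]']
```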
