時間序列分解算法:STL

1. 詳解

STL (Seasonal-Trend decomposition procedure based on Loess) [1] 爲時序分解中一種常見的算法,基於LOESS將某時刻的數據\(Y_v\)分解爲趨勢份量(trend component)、週期份量(seasonal component)和餘項(remainder component):
\[ Y_v = T _v + S_v + R_v \quad v= 1, \cdots, N \]html

STL分爲內循環(inner loop)與外循環(outer loop),其中內循環主要作了趨勢擬合與週期份量的計算。假定\(T_v^{(k)}\)\(S_v{(k)}\)爲內循環中第k-1次pass結束時的趨勢份量、週期份量,初始時\(T_v^{(k)} = 0\);並有如下參數:python

  • \(n_{(i)}\)內層循環數,
  • \(n_{(o)}\)外層循環數,
  • \(n_{(p)}\)爲一個週期的樣本數,
  • \(n_{(s)}\)爲Step 2中LOESS平滑參數,
  • \(n_{(l)}\)爲Step 3中LOESS平滑參數,
  • \(n_{(t)}\)爲Step 6中LOESS平滑參數。

每一個週期相同位置的樣本點組成一個子序列(subseries),容易知道這樣的子序列共有共有\(n_(p)\)個,咱們稱其爲cycle-subseries。內循環主要分爲如下6個步驟:git

  • Step 1: 去趨勢(Detrending),減去上一輪結果的趨勢份量,\(Y_v - T_v^{(k)}\)
  • Step 2: 週期子序列平滑(Cycle-subseries smoothing),用LOESS (\(q=n_{n(s)}\), \(d=1\))對每一個子序列作迴歸,並向前向後各延展一個週期;平滑結果組成temporary seasonal series,記爲$C_v^{(k+1)}, \quad v = -n_{(p)} + 1, \cdots, -N + n_{(p)} $;
  • Step 3: 週期子序列的低通量過濾(Low-Pass Filtering),對上一個步驟的結果序列\(C_v^{(k+1)}\)依次作長度爲\(n_(p)\)\(n_(p)\)\(3\)的滑動平均(moving average),而後作LOESS (\(q=n_{n(l)}\), \(d=1\))迴歸,獲得結果序列\(L_v^{(k+1)}, \quad v = 1, \cdots, N\);至關於提取週期子序列的低通量;
  • Step 4: 去除平滑週期子序列趨勢(Detrending of Smoothed Cycle-subseries),\(S_v^{(k+1)} = C_v^{(k+1)} - L_v^{(k+1)}\)
  • Step 5: 去週期(Deseasonalizing),減去週期份量,\(Y_v - S_v^{(k+1)}\)
  • Step 6: 趨勢平滑(Trend Smoothing),對於去除週期以後的序列作LOESS (\(q=n_{n(t)}\), \(d=1\))迴歸,獲得趨勢份量\(T_v^{(k+1)}\)

外層循環主要用於調節robustness weight。若是數據序列中有outlier,則餘項會較大。定義
\[ h = 6 * median(|R_v|) \]github

對於位置爲\(v\)的數據點,其robustness weight爲
\[ \rho_v = B(|R_v|/h) \]
其中\(B\)函數爲bisquare函數:
\[ B(u) = \left \{ { \matrix { {(1-u^2)^2 } & {for \quad 0 \le u < 1} \cr { 0} & {for \quad u \ge 1} \cr } } \right. \]
而後每一次迭代的內循環中,在Step 2與Step 6中作LOESS迴歸時,鄰域權重(neighborhood weight)須要乘以\(\rho_v\),以減小outlier對迴歸的影響。STL的具體流程以下:算法

outer loop:
    計算robustness weight;
    inner loop:
        Step 1 去趨勢;
        Step 2 週期子序列平滑;
        Step 3 週期子序列的低通量過濾;
        Step 4 去除平滑週期子序列趨勢;
        Step 5 去週期;
        Step 6 趨勢平滑;

爲了使得算法具備足夠的robustness,因此設計了內循環與外循環。特別地,當\(n_{(i)}\)足夠大時,內循環結束時趨勢份量與週期份量已收斂;若時序數據中沒有明顯的outlier,能夠將\(n_{(o)}\)設爲0。api

R提供STL函數,底層爲做者Cleveland的Fortran實現。Python的statsmodels實現了一個簡單版的時序分解,經過加權滑動平均提取趨勢份量,而後對cycle-subseries每一個時間點數據求平均組成周期份量:app

def seasonal_decompose(x, model="additive", filt=None, freq=None, two_sided=True):
    _pandas_wrapper, pfreq = _maybe_get_pandas_wrapper_freq(x)
    x = np.asanyarray(x).squeeze()
    nobs = len(x)
    ...
    if filt is None:
      if freq % 2 == 0:  # split weights at ends
        filt = np.array([.5] + [1] * (freq - 1) + [.5]) / freq
        else:
          filt = np.repeat(1./freq, freq)

    nsides = int(two_sided) + 1
    # Linear filtering via convolution. Centered and backward displaced moving weighted average.
    trend = convolution_filter(x, filt, nsides)
    if model.startswith('m'):
        detrended = x / trend
    else:
        detrended = x - trend

    period_averages = seasonal_mean(detrended, freq)

    if model.startswith('m'):
        period_averages /= np.mean(period_averages)
    else:
        period_averages -= np.mean(period_averages)

    seasonal = np.tile(period_averages, nobs // freq + 1)[:nobs]

    if model.startswith('m'):
        resid = x / seasonal / trend
    else:
        resid = detrended - seasonal

    results = lmap(_pandas_wrapper, [seasonal, trend, resid, x])
    return DecomposeResult(seasonal=results[0], trend=results[1],
                           resid=results[2], observed=results[3])

R版STL分解帶噪音點數據的結果以下圖:ide

data = read.csv("artificialWithAnomaly/art_daily_flatmiddle.csv")
View(data)
data_decomp <- stl(ts(data[[2]], frequency = 1440/5), s.window = "periodic", robust = TRUE)
plot(data_decomp)

statsmodels模塊的時序分解的結果以下圖:
函數

import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
from date_utils import get_gran, format_timestamp


dta = pd.read_csv('artificialWithAnomaly/art_daily_flatmiddle.csv',
                                   usecols=['timestamp', 'value'])
dta = format_timestamp(dta)
dta = dta.set_index('timestamp')
dta['value'] = dta['value'].apply(pd.to_numeric, errors='ignore')
dta.value.interpolate(inplace=True)
res = sm.tsa.seasonal_decompose(dta.value, freq=288)
res.plot()
plt.show()

2. 參考資料

[1] Cleveland, Robert B., William S. Cleveland, and Irma Terpenning. "STL: A seasonal-trend decomposition procedure based on loess." Journal of Official Statistics 6.1 (1990): 3.oop

相關文章
相關標籤/搜索