Time Series 3

statart 2019. 3. 14. 00:42

< ARIMA model >

- Exponential smoothing models are built on a description of the trend and seasonality in the data.

- ARIMA models aim to describe the autocorrelations in the data.

→ The two approaches are complementary.

□ Stationarity and differencing

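- stationary time series : a series whose statistical properties do not depend on the time at which it is observed.

→ e.g. white noise (no trend, no seasonality)

A series with cyclic behaviour is still stationary as long as it has no trend or seasonality.

∵ Cycles do not have a fixed length (before observing the series, we cannot be sure where the peaks and troughs will be).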

- differencing : non-stationary → stationary

→ Compute the differences between consecutive observations.

A log transformation stabilises the variance of a time series.

Differencing stabilises the mean (it removes or reduces trend and seasonality).

- Random walk model

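ㆍdifferenced series : $y'_t = y_t - y_{t-1}$

ㆍrandom walk model : $y_t = y_{t-1} + \varepsilon_t$ ($\varepsilon_t$ : white noise)

→ long periods of apparent upward or downward trend

sudden and unpredictable changes in direction

Because future movements cannot be predicted, the forecast equals the last observation.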

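- Second-order differencing : used when one round of differencing does not give a stationary series and a second round is needed. $y''_t = y'_t - y'_{t-1} = y_t - 2y_{t-1} + y_{t-2}$

- Seasonal differencing : $y'_t = y_t - y_{t-m}$ (lag-m differences, m = number of seasons)

- When differencing is used, the differences themselves should be interpretable.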

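- Unit root test : an objective way to decide whether differencing is required

ㆍH0 (KPSS test) : the data are stationary.

cf) ur.kpss( ) / library(urca)

ndiffs( ) : how many first differences are recommended?

nsdiffs( ) : the same question for seasonal differences (seasonal data)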
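
The differencing operations in this section can be sketched in a few lines. The block below is plain Python rather than the R functions named in these notes; the helper `difference` and the two example series are made up for illustration.

```python
import math

def difference(series, lag=1):
    """Return the lag-`lag` differences y[t] - y[t-lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: linear trend plus a repeating seasonal pattern.
m = 4
seasonal = [5.0, -2.0, 3.0, -6.0]
y = [10.0 + 2.0 * t + seasonal[t % m] for t in range(16)]

first_diff = difference(y)               # removes the trend, keeps the seasonal pattern
seasonal_diff = difference(y, lag=m)     # removes the seasonal pattern, leaves the constant 2 * m
second_diff = difference(difference(y))  # second-order differencing

# Log transformation followed by differencing: for a series that grows by 5%
# per period, the log differences are all (approximately) log(1.05).
growth = [100.0 * 1.05 ** t for t in range(8)]
log_diff = difference([math.log(v) for v in growth])

print(seasonal_diff)
print(log_diff)
```

Here the seasonal difference is constant because the example has a fixed trend and a fixed seasonal pattern; real data would leave some remaining variation.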

□ Backshift notation

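- $B y_t = y_{t-1}$ (B operating on $y_t$)

- back two periods : $B(B y_t) = B^2 y_t = y_{t-2}$

- describing the process of first differencing : $y'_t = y_t - y_{t-1} = (1 - B) y_t$

- second order differences : $y''_t = y_t - 2y_{t-1} + y_{t-2} = (1 - B)^2 y_t$

- a seasonal difference followed by a first difference can be written as : $(1 - B)(1 - B^m) y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1}$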
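
As a quick numerical check of the backshift algebra, the sketch below (plain Python; the helpers `backshift` and `difference` and the test series are illustrative assumptions) confirms that a seasonal difference followed by a first difference equals the expanded form $y_t - y_{t-1} - y_{t-m} + y_{t-m-1}$, and that the order of the two differences does not matter.

```python
def backshift(series, k=1):
    """Apply B^k: position t holds y[t-k]; the first k positions are undefined (None)."""
    return [None] * k + list(series[:len(series) - k])

def difference(series, lag=1):
    """Apply (1 - B^lag), dropping the first `lag` undefined values."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

m = 12
y = [float((t * t) % 17 + 3 * t) for t in range(40)]  # arbitrary illustrative series

# Seasonal difference followed by a first difference: (1 - B)(1 - B^m) y_t
two_step = difference(difference(y, lag=m), lag=1)

# The same quantity from the expanded form
expanded = [y[t] - y[t - 1] - y[t - m] + y[t - m - 1] for t in range(m + 1, len(y))]

# First difference followed by a seasonal difference: (1 - B^m)(1 - B) y_t
reverse_order = difference(difference(y, lag=1), lag=m)

print(two_step == expanded, reverse_order == expanded)
```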

□ Autoregressive models

- In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable.

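: the term autoregression is used because it is a regression of the variable against itself

- an autoregressive model of order p

: $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$ ($\varepsilon_t$ : white noise)

: AR(p) model

ㆍremarkably flexible at handling a wide range of different time series patterns.

ㆍChanging the coefficients $\phi_1, \dots, \phi_p$ produces different patterns.

ㆍThe variance of the error term $\varepsilon_t$ changes only the scale of the series, not the pattern itself.

- The coefficients are normally restricted so that the model is stationary (for AR(1), $-1 < \phi_1 < 1$; the constraints become more complicated as p increases).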
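
A small simulation can make the AR(p) remarks concrete. This is a sketch in plain Python, not R; the function `simulate_ar`, the zero start-up values and the coefficient choices are assumptions made for illustration. With c = 0, doubling the error standard deviation simply doubles every value of the series, which is the sense in which the errors change only the scale.

```python
import random

def simulate_ar(phi, c, eps):
    """Simulate y_t = c + sum_i phi[i] * y_{t-1-i} + eps[t], starting from zeros."""
    p = len(phi)
    y = [0.0] * p                      # p start-up values (an assumption of this sketch)
    for e in eps:
        past = y[-1:-p - 1:-1]         # y_{t-1}, y_{t-2}, ..., y_{t-p}
        y.append(c + sum(a * b for a, b in zip(phi, past)) + e)
    return y[p:]

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Same errors, different coefficients: the patterns differ.
ar1 = simulate_ar([0.8], c=0.0, eps=eps)
ar1_neg = simulate_ar([-0.8], c=0.0, eps=eps)
ar2 = simulate_ar([1.3, -0.7], c=0.0, eps=eps)

# Same coefficients, errors with twice the standard deviation: only the scale changes.
ar1_scaled = simulate_ar([0.8], c=0.0, eps=[2.0 * e for e in eps])
print(max(abs(b - 2.0 * a) for a, b in zip(ar1, ar1_scaled)))
```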

□ Moving Average models

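- Rather than using past values of the forecast variable, the model uses past forecast errors.

: $y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$

: MA(q) model

- The variance of $\varepsilon_t$ changes only the scale of the series, not the pattern itself.

- Any stationary AR(p) model can be written as an MA(∞) model.

- An MA(q) model can likewise be written as an AR(∞) model, provided it is invertible. For MA(1), the weight on the observation j periods back has size $|\theta|^j$ :

1) abs(theta) > 1 : observations further in the past get more weight.

2) abs(theta) = 1 : all weights have the same size.

3) abs(theta) < 1 : more recent observations get more weight. (invertible)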
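
The three cases can be checked directly. For the MA(1) model $y_t = \varepsilon_t + \theta \varepsilon_{t-1}$, repeated substitution gives $\varepsilon_t = \sum_{j \ge 0} (-\theta)^j y_{t-j}$. The sketch below (plain Python; `ar_weights` and the small error series are hypothetical, and the error before the first observation is assumed to be zero) lists the weight sizes and recovers the last error from the observations.

```python
def ar_weights(theta, n_lags):
    """Weights pi_j in eps_t = sum_j pi_j * y_{t-j} for y_t = eps_t + theta * eps_{t-1}."""
    return [(-theta) ** j for j in range(n_lags + 1)]

sizes = {theta: [abs(w) for w in ar_weights(theta, 4)] for theta in (0.5, 1.0, 1.5)}
# sizes[0.5] -> [1.0, 0.5, 0.25, 0.125, 0.0625]   recent observations dominate (invertible)
# sizes[1.0] -> [1.0, 1.0, 1.0, 1.0, 1.0]         all weights the same size
# sizes[1.5] -> [1.0, 1.5, 2.25, 3.375, 5.0625]   distant observations dominate

# Recover the most recent error from the observations of an invertible MA(1).
theta = 0.5
eps = [0.3, -1.2, 0.7, 0.1, -0.4, 0.9]
y = [eps[0]] + [eps[t] + theta * eps[t - 1] for t in range(1, len(eps))]  # assumes eps_{-1} = 0
w = ar_weights(theta, len(y) - 1)
recovered_last = sum(w[j] * y[-1 - j] for j in range(len(y)))
print(recovered_last)  # approximately eps[-1]
```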

□ Non-seasonal ARIMA models (Integration)

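- differencing + AR + MA : Non-seasonal ARIMA models

- ARIMA(p, d, q) model : $y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$ ($y'_t$ : the differenced series)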

ㆍp = order of the AR part

ㆍd = degree of first differencing involved

ㆍq = order of the MA part

1) ARIMA(0, 0, 0) : White noise

2) ARIMA(0, 1, 0) with no constant : Random walk

3) ARIMA(0, 1, 0) with constant : Random walk with drift

4) ARIMA(p, 0, 0) : AR(p)

5) ARIMA(0, 0, q) : MA(q)

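- Backshift notation : $(1 - \phi_1 B - \dots - \phi_p B^p)(1 - B)^d y_t = c + (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t$

→ R uses a slightly different parameterisation, written in terms of the mean $\mu$ of the differenced series $y'_t = (1 - B)^d y_t$ : $(1 - \phi_1 B - \dots - \phi_p B^p)(y'_t - \mu) = (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t$, with $c = \mu(1 - \phi_1 - \dots - \phi_p)$

Choosing appropriate values of p, d and q can be difficult, but auto.arima( ) in R selects them automatically.

- c : affects the long-term forecasts.

- d : affects the prediction intervals. (the higher d, the more rapidly the PI increase in size)

ㆍd=0 → the long-term forecast sd converges to the sd of the historical data, so the PIs all become essentially the same width.

- p : important when the data show cycles.

ㆍTo obtain cyclic forecasts, p must be at least 2 (along with some additional conditions on the coefficients).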
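
To illustrate why cyclic forecasts need p ≥ 2, the sketch below uses a zero-mean AR(2) model in plain Python. The condition $\phi_1^2 + 4\phi_2 < 0$ (complex roots) and the cycle length $2\pi / \arccos(\phi_1 / (2\sqrt{-\phi_2}))$ of the forecast function are standard AR(2) results that are not stated in these notes; the function names and the coefficients (1.0, -0.5) are illustrative assumptions.

```python
import math

def ar2_forecast_cycle_length(phi1, phi2):
    """Cycle length of the AR(2) forecast function, or None if it does not oscillate in cycles."""
    if phi1 ** 2 + 4 * phi2 >= 0:
        return None                      # real roots: no cyclic forecasts
    return 2 * math.pi / math.acos(phi1 / (2 * math.sqrt(-phi2)))

def ar2_forecasts(phi1, phi2, last, second_last, horizon):
    """Point forecasts of a zero-mean AR(2): future errors are replaced by zero."""
    path = [second_last, last]
    for _ in range(horizon):
        path.append(phi1 * path[-1] + phi2 * path[-2])
    return path[2:]

period = ar2_forecast_cycle_length(1.0, -0.5)    # about 8 periods per cycle
fc = ar2_forecasts(1.0, -0.5, last=1.0, second_last=0.0, horizon=8)
# fc -> [1.0, 0.5, 0.0, -0.25, -0.25, -0.125, 0.0, 0.0625]

no_cycle = ar2_forecast_cycle_length(0.5, 0.2)   # None: real roots
print(period, fc, no_cycle)
```

The forecasts cross zero every four steps, which matches a damped cycle of length eight. An AR(1) model corresponds to $\phi_2 = 0$, so the condition can never hold.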

□ ACF and PACF plots

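- partial autocorrelations : measure the relationship between $y_t$ and $y_{t-k}$ after removing the effects of the intermediate lags 1, 2, ..., k-1 (so that the effect of those lags is not counted twice)

- The k-th partial autocorrelation equals the estimate of $\phi_k$ in an AR(k) model.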
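
One way to compute partial autocorrelations without fitting a separate AR(k) model for every k is the Durbin-Levinson recursion, which turns autocorrelations into the last coefficient of each AR(k). The sketch below is plain Python; the function names and the short example series are assumptions, and the recursion is named here as one standard method, not necessarily what any particular R function uses.

```python
def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..K from rho[0..K] via Durbin-Levinson."""
    pacf = []
    prev = []                                   # coefficients of the AR(k-1) fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den                      # last coefficient of the AR(k) fit
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) with phi = 0.6 is 0.6 ** k:
# the PACF is 0.6 at lag 1 and (numerically) zero afterwards.
ar1_pacf = pacf_from_acf([0.6 ** k for k in range(6)])

# Applied to data (hypothetical series):
y = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
r = sample_acf(y, 3)
p = pacf_from_acf(r)
print(ar1_pacf, p)
```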

□ Estimation and order selection

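- MLE : once the orders are chosen, the coefficients are estimated by maximum likelihood estimation.

- AIC and BIC are not a good guide for choosing the order of differencing d, because differencing changes the data on which the likelihood is computed.

They only help in choosing p and q.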

□ Point forecast

□ Prediction Interval

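- If d = 0 and the assumptions hold (stationary), the PIs converge, so the PIs for long horizons are all essentially the same.

- If d ≥ 1, the PIs keep growing as the forecast horizon increases.

- PIs from ARIMA models tend to be too narrow, because they account only for the variation in the errors.

The uncertainty in the parameter estimates and in the model order is not taken into account.

They also assume that the historical patterns will continue into the future.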
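
The contrast between d = 0 and d ≥ 1 can be seen from the forecast variances of two simple models: a stationary AR(1), where the h-step variance is $\sigma^2 (1 - \phi^{2h}) / (1 - \phi^2)$, and a random walk, ARIMA(0,1,0), where it is $h\sigma^2$. The sketch below is plain Python with illustrative parameter values; like the intervals described above, it ignores parameter uncertainty.

```python
import math

def ar1_forecast_sd(phi, sigma, h):
    """h-step forecast standard deviation of a stationary AR(1) model (d = 0)."""
    return sigma * math.sqrt((1 - phi ** (2 * h)) / (1 - phi ** 2))

def random_walk_forecast_sd(sigma, h):
    """h-step forecast standard deviation of a random walk, ARIMA(0,1,0) (d = 1)."""
    return sigma * math.sqrt(h)

z = 1.96  # multiplier for an approximate 95% interval under normal errors
for h in (1, 5, 20, 100):
    stationary_half_width = z * ar1_forecast_sd(0.7, 1.0, h)
    random_walk_half_width = z * random_walk_forecast_sd(1.0, h)
    print(h, round(stationary_half_width, 3), round(random_walk_half_width, 3))
```

The AR(1) half-width levels off near $1.96\,\sigma / \sqrt{1 - \phi^2}$, while the random walk half-width keeps growing in proportion to $\sqrt{h}$.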

□ Seasonal ARIMA models

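- including additional seasonal terms in ARIMA models : ARIMA(p,d,q)(P,D,Q)_m, where m is the number of observations per year

- A model that includes differencing is likely to have PIs that keep widening over long horizons (i.e. long-range forecasts are less accurate).

→ For short-horizon forecasts this matters much less, so such a model can still be used.

- When the variance tends to increase with the level of the series, a log transformation is appropriate.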

□ Test set evaluation

- When comparing models by AICc, it is important that they have the same orders of differencing.

When comparing models on a test set, this does not matter; such comparisons are always valid.
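
A minimal sketch of a test-set comparison, in plain Python with a made-up series: the last four observations are held out, and two simple candidates with different orders of differencing are compared by RMSE. The mean forecast corresponds to ARIMA(0,0,0) with a constant (d = 0) and the drift forecast to ARIMA(0,1,0) with a constant (d = 1).

```python
import math

def rmse(actual, forecast):
    """Root mean squared error over the test period."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical series; the last 4 observations form the test set.
y = [12.0, 13.5, 13.0, 14.2, 15.1, 15.0, 16.3, 17.0, 17.4, 18.1, 18.0, 19.2]
train, test = y[:-4], y[-4:]
h = len(test)

# Candidate 1 (d = 0): forecast every future value by the training mean.
mean_fc = [sum(train) / len(train)] * h

# Candidate 2 (d = 1): random walk with drift; the drift is the average
# change per period over the training data.
drift = (train[-1] - train[0]) / (len(train) - 1)
drift_fc = [train[-1] + drift * (i + 1) for i in range(h)]

scores = {"mean": rmse(test, mean_fc), "drift": rmse(test, drift_fc)}
best = min(scores, key=scores.get)
print(scores, best)
```

Because both candidates are scored on the same held-out observations, the comparison is fair even though their differencing orders differ.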

□ ARIMA vs ETS

- linear exponential smoothing models : special cases of ARIMA models

- non-linear exponential smoothing models : have no equivalent ARIMA counterpart

All ETS models are non-stationary, whereas some ARIMA models are stationary.

< Dynamic regression models >

: we consider how to extend ARIMA models in order to allow other information to be included in the models.

we will allow the errors from a regression to contain autocorrelation.

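$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t$, error terms $\eta_t$ ~ ARIMA

→ The model therefore has two error terms: the regression errors $\eta_t$ and the ARIMA errors $\varepsilon_t$ (only the ARIMA errors are assumed to be white noise).

- Problems with ordinary least squares estimation (minimising the sum of squared $\eta_t$) when the errors are autocorrelated

1) The estimated coefficients are no longer the best estimates (some information is ignored in the calculation).

2) The results of statistical tests on the model are no longer reliable.

3) The AIC values are no longer a good guide to which model is best for forecasting.

4) spurious regression : the p-values for the coefficients are often too small, so unimportant predictors can appear to be important.

→ Minimising the sum of squared ARIMA errors $\varepsilon_t$ instead avoids these problems.

Alternatively, MLE can be used.

★ The most important consideration is that all the variables in the model must be stationary. (otherwise the estimated coefficients are not consistent and not meaningful)

The exception is when the non-stationary variables are co-integrated, i.e. a linear combination of them is stationary.

→ So first difference the non-stationary variables in the model.

(If any variable needs differencing, it is common to difference all the variables, which keeps the relationship between them in the same form.)

Once all the variables in the model are stationary, only ARMA errors need to be considered for the residuals.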
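
To show what "minimising the ARIMA errors" means in the simplest case, the sketch below fits a regression with AR(1) errors by a Cochrane-Orcutt style iteration. This is a swapped-in, simpler technique chosen for illustration, not the maximum likelihood estimation mentioned above; the code is plain Python, and the function names, the simulated data and the true values (intercept 1, slope 2, $\phi$ = 0.7) are all assumptions.

```python
import random

def ols(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def fit_regression_ar1(x, y, n_iter=20):
    """Estimate y_t = b0 + b1 x_t + eta_t with eta_t = phi eta_{t-1} + eps_t."""
    b0, b1 = ols(x, y)                      # start from ordinary least squares
    phi = 0.0
    for _ in range(n_iter):
        eta = [b - b0 - b1 * a for a, b in zip(x, y)]
        # AR(1) coefficient of the current regression residuals
        phi = (sum(eta[t] * eta[t - 1] for t in range(1, len(eta)))
               / sum(e * e for e in eta[:-1]))
        # Quasi-difference both variables so the remaining errors are eps_t
        xs = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
        a0, b1 = ols(xs, ys)
        b0 = a0 / (1 - phi)                 # transformed intercept is b0 * (1 - phi)
    return b0, b1, phi

# Simulated data with known parameters.
rng = random.Random(42)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
eta = [0.0]
for _ in range(n - 1):
    eta.append(0.7 * eta[-1] + rng.gauss(0.0, 1.0))
y = [1.0 + 2.0 * a + e for a, e in zip(x, eta)]

b0, b1, phi = fit_regression_ar1(x, y)
print(b0, b1, phi)
```

The slope and $\phi$ estimates should land close to the values used in the simulation; the intercept is less precisely estimated because the errors are strongly autocorrelated.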

- Regression with ARIMA errors in R

□ Stochastic and deterministic trends (linear trend)

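- Stochastic : assumes the trend can change over time.

requires differencing

wider PIs (more conservative)

- deterministic : assumes the trend will not change.

no differencing needed

narrower PIs (wrong if the trend does change)

- The estimated coefficients (and point forecasts) of the two can be similar while the PIs are very different.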

□ Dynamic harmonic regression

- Long seasonal periods : a dynamic regression with Fourier terms is often better than other models.

- The drawback is that the seasonality is assumed to be fixed (in practice it usually changes very little).
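
The Fourier terms used as regressors are pairs of sine and cosine waves at multiples of the seasonal frequency. The sketch below builds them in plain Python; `fourier_terms` and its arguments (n observations, period m, K harmonics) are a hypothetical helper written for these notes. K can be at most m / 2.

```python
import math

def fourier_terms(n, m, K):
    """Return n rows of [sin(2*pi*k*t/m), cos(2*pi*k*t/m)] for k = 1..K and t = 1..n."""
    rows = []
    for t in range(1, n + 1):
        row = []
        for k in range(1, K + 1):
            angle = 2 * math.pi * k * t / m
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

# Two years of weekly data (m = 52) with 3 harmonics: 6 regressors per observation.
X = fourier_terms(n=104, m=52, K=3)
print(len(X), len(X[0]))
```

Every column repeats exactly every m observations, which is the fixed-seasonality assumption noted above. A smaller K gives a smoother seasonal pattern.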

□ Lagged predictors

: for cases where the effect of a predictor is not immediate but shows up gradually over later periods

stats::lag( ) can be used to shift (lag) a variable
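
The idea of shifting a predictor can be sketched as follows in plain Python. The `lag` helper and the `advert` series are hypothetical and are not the R function named above; the point is only that each row of the design matrix pairs the current value of the predictor with its values one and two periods earlier.

```python
def lag(series, k):
    """Shift a series back by k periods: position t holds the value from t - k."""
    if k == 0:
        return list(series)
    return [None] * k + list(series[:-k])

advert = [10, 12, 9, 15, 11, 14]          # hypothetical predictor

# Columns: x_t, x_{t-1}, x_{t-2}
X = list(zip(advert, lag(advert, 1), lag(advert, 2)))

# Rows with an undefined lag are dropped before fitting.
complete = [row for row in X if None not in row]
print(complete)
```

Dropping the incomplete rows means models with different numbers of lags are fitted to different numbers of observations, so it is safer to compare them on the same trimmed sample.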