Guided tour on ARIMA model selection via information criteria


ARIMA stands for AutoRegressive Integrated Moving Average. The ARIMA modeling and forecasting approach is also known as the Box-Jenkins approach.

Non-seasonal ARMA(p,q) processes

I will not discuss here the "I" in ARIMA. This is done in the guided tour on ARIMA estimation and forecasting.

In this guided tour I will explain how to select the orders p and q of an ARIMA(p,d,q) process, given the value of d. Recall that an ARIMA(p,0,q) process is the same as an ARMA(p,q) process.

As is well known (if not to you, stop here and don't use module ARIMAMODSEL!), the general form of an ARMA(p,q) process y(t) is:

$$y(t) = \alpha_1 y(t-1) + \dots + \alpha_p y(t-p) + \mu + \varepsilon(t) - \beta_1 \varepsilon(t-1) - \dots - \beta_q \varepsilon(t-q)$$

where the $\varepsilon(t)$'s are independently distributed with zero expectation and variance $\sigma^2$, and $\mu$ is a constant. Thus, the p in "ARMA(p,q)" is the maximum lag of the AR part, and q is the maximum lag of the MA part.

This model can be written more compactly in terms of lag polynomials and lag operators. Define the lag operator L as:

$$L\,y(t) = y(t-1)$$
$$L^2 y(t) = y(t-2)$$
$$L^p y(t) = y(t-p)$$

etcetera. Then we can write:

$$y(t) - \alpha_1 y(t-1) - \dots - \alpha_p y(t-p) = y(t) - \alpha_1 L\,y(t) - \dots - \alpha_p L^p y(t) = \alpha_p(L)\,y(t)$$

say, where

$$\alpha_p(L) = 1 - \alpha_1 L - \dots - \alpha_p L^p$$

and similarly

$$\varepsilon(t) - \beta_1 \varepsilon(t-1) - \dots - \beta_q \varepsilon(t-q) = \varepsilon(t) - \beta_1 L\,\varepsilon(t) - \dots - \beta_q L^q \varepsilon(t) = \beta_q(L)\,\varepsilon(t)$$

say, where

$$\beta_q(L) = 1 - \beta_1 L - \dots - \beta_q L^q$$

Thus, the ARMA(p,q) model involved can now be written as:

$$\alpha_p(L)\,y(t) = \mu + \beta_q(L)\,\varepsilon(t).$$
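
To make the lag-polynomial notation concrete, here is a minimal sketch of what applying a polynomial of the form 1 - c1 L - .... - ck L^k to a series means. The helper name apply_lag_poly and the numbers are illustrative assumptions of mine, not part of EasyReg.

```python
def apply_lag_poly(coeffs, x):
    """Return [x(t) - c1*x(t-1) - ... - ck*x(t-k)] for t = k, ..., len(x)-1.

    coeffs = [c1, ..., ck] are the coefficients of the lag polynomial
    1 - c1*L - ... - ck*L^k (the sign convention used in the text).
    """
    k = len(coeffs)
    out = []
    for t in range(k, len(x)):
        value = x[t]
        for j, c in enumerate(coeffs, start=1):
            value -= c * x[t - j]   # subtract c_j times the j-th lag
        out.append(value)
    return out


# Made-up series and the AR(1) polynomial 1 - 0.5L.
y = [1.0, 2.0, 3.0, 4.0]
print(apply_lag_poly([0.5], y))  # [1.5, 2.0, 2.5]
```

The first k observations are dropped because their lags are not available; other start-up conventions are possible.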

Seasonal ARMA processes

In the case of seasonal time series, for example quarterly time series, there may be seasonal effects in the AR and/or MA lag polynomials. The model then becomes, for example,

$$\alpha_p(L)\,\chi_{s_1}(L^4)\,y(t) = \mu + \beta_q(L)\,\delta_{s_2}(L^4)\,\varepsilon(t),$$

where

$$\chi_{s_1}(L^4) = 1 - \gamma_1 L^4 - \gamma_2 L^8 - \dots - \gamma_{s_1} L^{4 s_1}$$

is the seasonal AR lag polynomial, and

$$\delta_{s_2}(L^4) = 1 - \delta_1 L^4 - \delta_2 L^8 - \dots - \delta_{s_2} L^{4 s_2}$$

is the seasonal MA lag polynomial.
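
The product of a non-seasonal and a seasonal lag polynomial is itself an ordinary lag polynomial with more lags. The following sketch multiplies two polynomials given as coefficient lists; the function name and the coefficient values are hypothetical and only serve as an illustration.

```python
def multiply_lag_polys(a, b):
    """Multiply two lag polynomials given as coefficient lists.

    a[i] is the coefficient of L^i, so a[0] is normally 1.
    """
    product = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            product[i + j] += ai * bj   # L^i times L^j gives L^(i+j)
    return product


# Hypothetical quarterly example: (1 - 0.5L)(1 - 0.25L^4).
ar = [1.0, -0.5]                            # 1 - 0.5L
seasonal_ar = [1.0, 0.0, 0.0, 0.0, -0.25]   # 1 - 0.25L^4
print(multiply_lag_polys(ar, seasonal_ar))
# [1.0, -0.5, 0.0, 0.0, -0.25, 0.125]
```

In this example the product is 1 - 0.5L - 0.25L^4 + 0.125L^5, so the seasonal factor adds lags 4 and 5 while introducing only one extra parameter.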

ARMA model selection on the basis of information criteria

As explained here and in my lecture notes on forecasting, the AR and MA orders p and q, respectively, and in the case of seasonal time series the seasonal AR and MA orders s1 and s2 as well, can be estimated consistently on the basis of the Hannan-Quinn and Schwarz information criteria. However, this is not a foolproof method, in particular in the presence of seasonal AR and MA polynomials, because it only works well for long time series.
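
For orientation, the three criteria can be sketched in one common textbook form: the log of the estimated error variance plus a penalty that grows with the number of estimated parameters. The exact normalization used by EasyReg is not stated in this guided tour, so the formulas below are an assumption, and the input values are made up.

```python
import math


def information_criteria(sigma2, k, n):
    """Return (AIC, Hannan-Quinn, Schwarz) in a common textbook form.

    sigma2: estimated error variance, k: number of estimated
    parameters, n: number of observations used in estimation.
    """
    log_var = math.log(sigma2)
    aic = log_var + 2.0 * k / n
    hq = log_var + 2.0 * k * math.log(math.log(n)) / n
    sc = log_var + k * math.log(n) / n
    return aic, hq, sc


# Made-up inputs: unit error variance, 3 parameters, 500 observations.
aic, hq, sc = information_criteria(1.0, 3, 500)
print(aic < hq < sc)  # True
```

Because ln(n) > 2 ln(ln(n)) > 2 for n = 500, the Schwarz penalty per parameter is the largest and the Akaike penalty the smallest in this form, which is why the Akaike criterion tends to favor larger models.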

A non-seasonal ARMA example

To demonstrate how this approach works, I have generated 500 observations on the stationary ARMA(1,1) process

$$y_t = 0.5\,y_{t-1} + \varepsilon_t + 0.5\,\varepsilon_{t-1}$$

where the errors $\varepsilon_t$ are i.i.d. standard normally distributed. The data involved is available in (US style) CSV format, as ARIMAMODSELDATA.CSV, which should be interpreted as an "annual" time series, starting from "year" 1.
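
A series of this type can be generated along the following lines. This is only a sketch: the seed, the burn-in length and the zero start-up values are my own assumptions, so it will not reproduce the numbers in ARIMAMODSELDATA.CSV.

```python
import random


def simulate_arma11(n, ar=0.5, ma=0.5, seed=0, burn_in=100):
    """Simulate y_t = ar*y_{t-1} + e_t + ma*e_{t-1} with N(0,1) errors."""
    rng = random.Random(seed)
    y_prev, e_prev = 0.0, 0.0       # assumed start-up values
    series = []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)
        y = ar * y_prev + e + ma * e_prev
        if t >= burn_in:            # drop the start-up observations
            series.append(y)
        y_prev, e_prev = y, e
    return series


y = simulate_arma11(500)
print(len(y))  # 500
```

The burn-in observations are discarded so that the arbitrary zero start-up values have a negligible effect on the retained sample.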

Once you have imported this data file, open Menu > Single equation models > ARIMA model selection via information criteria. Then EasyReg opens with the following window:


Click "Continue":


We are not going to select a subsample. Thus, click "No", "Confirm" and "Continue":


Since the time series is stationary, there is no need for differencing. Of course, you need to verify this by conducting unit root tests first.

Click "0 times OK":


You do not know in advance whether an intercept is needed or not. Therefore, leave the intercept, and click "Continue":


Now you have to specify the maximum values of p and q. I will choose max p = a(1,3) = 3 and max q = a(2,3) = 3.

Click "Specification O.K.":


Click "Continue":


Click "Start". Then the Akaike, Hannan-Quinn and Schwarz information criteria will be computed for all combinations of p = 0,1,2,3 and q = 0,1,2,3:


All three criteria select the true ARMA(1,1) model:
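
The search over all sixteen combinations of p and q can be sketched as follows. The estimation step itself is left to a user-supplied function, here called fit_variance, which is hypothetical; the penalty terms follow a common textbook form and are not necessarily the ones EasyReg uses. The toy variance function merely stands in for real estimation results.

```python
import math
from itertools import product


def select_orders(fit_variance, n, max_p=3, max_q=3, intercept=True):
    """Return the (p, q) minimizing AIC, Hannan-Quinn and Schwarz.

    fit_variance(p, q) is a user-supplied (hypothetical) function that
    estimates the ARMA(p, q) model and returns its error variance.
    """
    penalties = {
        "AIC": 2.0 / n,
        "HQ": 2.0 * math.log(math.log(n)) / n,
        "SC": math.log(n) / n,
    }
    best = {}
    for p, q in product(range(max_p + 1), range(max_q + 1)):
        k = p + q + (1 if intercept else 0)     # number of parameters
        log_var = math.log(fit_variance(p, q))
        for name, penalty in penalties.items():
            value = log_var + k * penalty
            if name not in best or value < best[name][0]:
                best[name] = (value, (p, q))
    return {name: orders for name, (value, orders) in best.items()}


# Toy stand-in for estimation: the variance stops improving at (1, 1).
def toy_variance(p, q):
    return 1.0 + (0.5 if p < 1 else 0.0) + (0.5 if q < 1 else 0.0)


print(select_orders(toy_variance, n=500))
# {'AIC': (1, 1), 'HQ': (1, 1), 'SC': (1, 1)}
```

In the toy example extra lags beyond (1,1) do not reduce the variance at all, so every criterion picks (1,1). With real estimates, extra lags always reduce the variance a little, and the criteria differ in how strongly they penalize them.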


A seasonal ARIMA example

To demonstrate that this approach is not foolproof, let us consider the data used in the guided tour on ARIMA estimation and forecasting. Recall that this data was generated as

$$(1 - 0.7L)(1 - L)\,y(t) = (1 - 0.7L)\,\Delta y(t) = (1 + 0.5L)(1 - 0.25L^4)\,\varepsilon(t),$$

where $\varepsilon(t)$ is i.i.d. standard normally distributed, and t = 1,2,....,200. This is a seasonal ARMA process in $\Delta y(t)$, with p = q = 1, $s_1$ = 0 and $s_2$ = 1. The maximum values of these orders have been chosen as follows: max p = a(1,2) = 2, max q = a(2,2) = 2, max $s_1$ = c(1,1) = 1, max $s_2$ = c(2,1) = 1:
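
For readers who want to experiment with a process of this type, the data-generating equation can be sketched as below. The seed and the zero start-up values are my own assumptions, so the output will differ from the data set used in this guided tour.

```python
import random


def simulate_seasonal_arima(n=200, seed=0):
    """Simulate (1 - 0.7L)(1 - L)y(t) = (1 + 0.5L)(1 - 0.25L^4)e(t)."""
    rng = random.Random(seed)
    # Expanded MA side: e(t) + 0.5e(t-1) - 0.25e(t-4) - 0.125e(t-5).
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 5)]
    y, dy_prev, level = [], 0.0, 0.0    # assumed start-up values
    for t in range(5, n + 5):
        u = e[t] + 0.5 * e[t - 1] - 0.25 * e[t - 4] - 0.125 * e[t - 5]
        dy = 0.7 * dy_prev + u          # AR(1) in the first difference
        level += dy                     # y(t) = y(t-1) + dy(t)
        y.append(level)
        dy_prev = dy
    return y


y = simulate_seasonal_arima()
print(len(y))  # 200
```

The MA side is expanded once by hand, and the level of y(t) is recovered by cumulating the simulated first differences.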


Now follow the same steps as before:




Model 8 is the optimal model indicated by the Hannan-Quinn and Schwarz information criteria, which is an ARMA(2,1) process in $\Delta y(t)$, without seasonal effects. Although these two criteria generate consistent estimates of the true orders, the sample size of 200 is too small for the consistency to kick in.


Model 26 is the optimal model indicated by the Akaike information criterion, which corresponds to a seasonal ARMA model for $\Delta y(t)$, with p = 2, q = 1, $s_1$ = 0 and $s_2$ = 1. This is close to the truth: p = q = 1, $s_1$ = 0 and $s_2$ = 1. This result is what we should expect, because the Akaike information criterion is known to overshoot the true orders.

This example shows that if the time series is rather short, one should not rely too much on the consistency of the Hannan-Quinn and Schwarz information criteria.

This is the end of the guided tour on ARIMA model selection via information criteria