
ARIMA stands for **A**uto**R**egressive **I**ntegrated **M**oving **A**verage.
The ARIMA modeling and forecasting approach is also known as the Box-Jenkins approach.

I will discuss the "**I**" in ARIMA later. For the time being it suffices to note that an ARIMA(*p*,0,*q*) process is the same as an ARMA(*p,q*) process.

As is well known (if not to you, stop here and don't use module ARIMA!), the general form of an ARMA(*p,q*) process *y*(*t*) is:

*y*(*t*) = μ + α_{1}*y*(*t*-1) + .... + α_{p}*y*(*t*-*p*) + *e*(*t*) + β_{1}*e*(*t*-1) + .... + β_{q}*e*(*t*-*q*),

where the *e*(*t*)'s are independently distributed with zero expectation and
variance σ^{2}, and μ is a constant. Thus, the *p*
in "ARMA(*p,q*)" is the maximum lag of the AR part, and *q*
is the maximum lag of the MA part.
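The ARMA recursion above can be sketched in a few lines of Python; the ARMA(1,1) coefficient values below are illustrative, not taken from this tour.

```python
import numpy as np

# A minimal sketch of simulating an ARMA(p,q) process
#   y(t) = mu + a_1 y(t-1) + ... + a_p y(t-p)
#             + e(t) + b_1 e(t-1) + ... + b_q e(t-q)
# (illustrative ARMA(1,1) coefficients; not the tour's data).
rng = np.random.default_rng(0)
mu, a, b = 0.0, [0.5], [0.3]
n, p, q = 500, len(a), len(b)
burn = 50                                # discard start-up transient
e = rng.standard_normal(n + burn)
y = np.zeros(n + burn)
for t in range(max(p, q), n + burn):
    y[t] = (mu
            + sum(a[j] * y[t - 1 - j] for j in range(p))
            + e[t]
            + sum(b[j] * e[t - 1 - j] for j in range(q)))
y = y[burn:]
```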

This model can be written more compactly in terms of lag polynomials and lag operators. Define the lag operator *L* as:

*L.y*(*t*) = *y*(*t*-1), *L*^{2}*y*(*t*) = *y*(*t*-2),

etcetera. Then we can write:

*y*(*t*) - α_{1}*y*(*t*-1) - .... - α_{p}*y*(*t*-*p*) = α_{p}(*L*)*y*(*t*),

say, where

α_{p}(*L*) = 1 - α_{1}*L* - .... - α_{p}*L*^{p},

and similarly

*e*(*t*) + β_{1}*e*(*t*-1) + .... + β_{q}*e*(*t*-*q*) = β_{q}(*L*)*e*(*t*),

say, where

β_{q}(*L*) = 1 + β_{1}*L* + .... + β_{q}*L*^{q}.

Thus, the ARMA(*p,q*) model involved can now be written as:

α_{p}(*L*)*y*(*t*) = μ + β_{q}(*L*)*e*(*t*).
If the roots of α_{p}(*x*) all lie outside the complex unit circle,

then [α_{p}(*L*)]^{-1} exists as an infinite-order lag polynomial:

[α_{p}(*L*)]^{-1} = Σ_{j=0}^{∞} r_{j}*L*^{j},

where
|r_{j}| < *c*ρ^{j} for some constant *c* and a ρ ∈ (0,1).
If so, we can write the ARMA model as a stationary MA(∞) process:

*y*(*t*) = μ/α_{p}(1) + [α_{p}(*L*)]^{-1}β_{q}(*L*)*e*(*t*).
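The geometric decay of the inverse lag polynomial's coefficients can be checked numerically; the helper below is a hypothetical illustration (long division of lag polynomials), not an EasyReg function.

```python
import numpy as np

# Sketch: coefficients of [1 - phi_1 L - ... - phi_p L^p]^{-1},
# obtained by long division of lag polynomials. For the AR(1)
# polynomial (1 - 0.7 L) the j-th coefficient is 0.7**j, which
# decays geometrically, as the bound above requires.
def invert_ar_poly(phi, n_terms):
    psi = np.zeros(n_terms)
    psi[0] = 1.0
    for j in range(1, n_terms):
        psi[j] = sum(phi[k] * psi[j - 1 - k]
                     for k in range(min(len(phi), j)))
    return psi

psi = invert_ar_poly([0.7], 10)
```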
Similarly, if the roots of β_{q}(*x*) all lie outside the complex unit circle, then [β_{q}(*L*)]^{-1} exists as an infinite-order lag polynomial,

where [β_{q}(*L*)]^{-1}α_{p}(*L*) = 1 - Σ_{j=1}^{∞} γ_{j}*L*^{j},

say, so that the ARMA model can be written as a stationary AR(∞) process:

*y*(*t*) = δ + Σ_{j=1}^{∞} γ_{j}*y*(*t*-*j*) + *e*(*t*),

with δ = μ/β_{q}(1).
Thus

*E*_{t-1}[*y*(*t*)] = δ + Σ_{j=1}^{∞} γ_{j}*y*(*t*-*j*),

which is the best one-step-ahead forecast of *y*(*t*).

A time series process is called I(*d*) if we need to apply the first difference operator

Δ = 1 - *L*, i.e., Δ*y*(*t*) = *y*(*t*) - *y*(*t*-1),

at least *d* times to make the process stationary. Now a time series process *x*(*t*) is an
ARIMA(*p,d,q*) process if

*y*(*t*) = Δ^{d}*x*(*t*),

where *y*(*t*) is a stationary ARMA(*p,q*) process.
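As a quick numerical illustration (a random walk is the simplest I(1) process):

```python
import numpy as np

# Sketch: a random walk x(t) = x(t-1) + e(t) is I(1): applying the
# first difference operator once recovers the stationary series e(t).
rng = np.random.default_rng(1)
e = rng.standard_normal(200)
x = np.cumsum(e)            # random walk, nonstationary
dx = np.diff(x)             # first difference: x(t) - x(t-1)
```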

The time series *y*(*t*) I shall use has been artificially generated, as follows:

(1 - 0.7*L*)Δ*y*(*t*) = (1 + 0.5*L*)(1 - 0.25*L*^{4})*e*(*t*),

where *e*(*t*) is i.i.d. standard normally distributed. This is an
ARIMA(1,1,5) process:

(1 - 0.7*L*)Δ*y*(*t*) = (1 + 0.5*L* - 0.25*L*^{4} - 0.125*L*^{5})*e*(*t*).

The data involved is available as CSV file ARIMADATA.CSV,
with *y*(*t*) = "ARIMA test data". The data should be interpreted as quarterly time series,
starting from quarter 1950.1. Note that this data file has been created under US number setting.
Hence, if your Windows uses a comma as decimal delimiter you have to convert it to your local number setting.
See the guided tour on importing Excel files in CSV format.
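For readers without the data file, a series following the same recipe can be simulated as below; this sketch uses its own random draws, so it does not reproduce ARIMADATA.CSV itself.

```python
import numpy as np

# Sketch: generating a series by the same recipe as the tour's data,
#   (1 - 0.7 L) Dy(t) = (1 + 0.5 L)(1 - 0.25 L^4) e(t),  e(t) ~ N(0,1),
# then integrating once. This is NOT the actual ARIMADATA.CSV series.
rng = np.random.default_rng(2)
n, burn = 200, 10
e = rng.standard_normal(n + burn)
u = np.zeros(n + burn)                   # u(t) = Dy(t)
for t in range(5, n + burn):
    u[t] = (0.7 * u[t - 1] + e[t] + 0.5 * e[t - 1]
            - 0.25 * e[t - 4] - 0.125 * e[t - 5])
y = np.cumsum(u[burn:])                  # integrate once: y(t) is I(1)
```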

Note that the MA lag polynomial β_{5}(*L*) = (1 + 0.5*L*)(1 - 0.25*L*^{4})
is specified as the product of a **non-seasonal** lag polynomial
β_{ns}(*L*) = 1 + 0.5*L* and a **seasonal** lag polynomial β_{s}(*L*) = 1 - 0.25*L*^{4}.
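Multiplying lag polynomials amounts to convolving their coefficient arrays (in ascending powers of *L*), which makes the MA(5) expansion easy to verify:

```python
import numpy as np

# Sketch: the MA(5) polynomial as the product of the non-seasonal and
# seasonal factors; polynomial multiplication = coefficient convolution.
b_ns = np.array([1.0, 0.5])                    # 1 + 0.5 L
b_s = np.array([1.0, 0.0, 0.0, 0.0, -0.25])    # 1 - 0.25 L^4
b5 = np.convolve(b_ns, b_s)                    # 1 + 0.5L - 0.25L^4 - 0.125L^5
```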

Import the data file ARIMADATA.CSV in EasyReg, and declare it quarterly time series, with first year 1950 and first quarter 1.

Next, open "Menu > Single equation models > ARIMA estimation and forecasting". Then the following window appears.

Click "Continue":

In order to conduct out of sample forecasting, either append the data with missing values (via "Menu > Input > Prepare time series for forecasting"), or select a subset of observations. I will choose the latter. Thus, click "Yes":

Choose the subsample 1950.1 through 1997.1. Then the ARIMA model will be fitted to this subsample, and the observations after 1997.1 will be used to compare forecasts and realizations.

Click "Bounds OK", and then "Confirm" and "Continue" (in the next window). Then the following window appears:

You now have to tell EasyReg what the order of integration ("*d*") is. Usually
you do not know this in advance. If so, test the unit root hypothesis, via "Menu > Data analysis > Unit root tests (root 1)".
If you don't know what a unit root is, please read my lecture
notes on unit roots. If after reading these lecture notes you still
don't understand what a unit root is and how to test for it, click "Don't know".

In our case *d* = 1, as indicated. Thus click "1 times OK":

Although in our case the process Δ*y*(*t*) has zero expectation, so that there is no need for an intercept, in practice this is rare. Therefore, include an intercept in your model at first, and test afterwards whether the parameter involved is zero. Thus, click "Continue":

Now you have to specify the ARMA process for
*u*(*t*) = Δ*y*(*t*) - *E*[Δ*y*(*t*)].
The coefficients a(1,*i*) are the non-zero coefficients of the **non-seasonal** AR lag polynomial, and the coefficients c(1,*i*) are the non-zero coefficients of the **seasonal** AR lag polynomial. Similarly, the coefficients a(2,*i*) are the non-zero coefficients of the **non-seasonal** MA lag polynomial, and the coefficients c(2,*i*) are the non-zero coefficients of the **seasonal** MA lag polynomial. If your data consist of annual time series the option of specifying seasonal lag polynomials is not available.

In our case (1 - 0.7*L*)*u*(*t*) = (1 + 0.5*L*)(1 - 0.25*L*^{4})*e*(*t*), hence

- a(1,1) = 0.7
- a(2,1) = -0.5
- c(2,1) = 0.25

For more advanced time series analysis, see for example:

- James D. Hamilton:
*Time Series Analysis*, Princeton University Press, 1994.

Now click "Specification OK":

The only action required is to click "Continue":

The model parameters will be estimated by minimizing
Σ_{t}*e*(*t*)^{2},
using the simplex method of Nelder and Mead. First click "Simplex method: How it works, and stopping rules".
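The estimation principle can be sketched with scipy's Nelder-Mead implementation; EasyReg's internals differ, and the ARMA(1,1) below is a simplified stand-in.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: fitting an ARMA(1,1) by minimizing the sum of squared
# residuals with the Nelder-Mead simplex method. EasyReg's internals
# differ; this only illustrates the estimation principle.
rng = np.random.default_rng(3)
n = 400
e = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):                     # true model: a = 0.7, b = 0.5
    y[t] = 0.7 * y[t - 1] + e[t] + 0.5 * e[t - 1]

def ssr(theta, y):
    a, b = theta
    res = np.zeros(len(y))
    for t in range(1, len(y)):            # e(t) = y(t) - a y(t-1) - b e(t-1)
        res[t] = y[t] - a * y[t - 1] - b * res[t - 1]
    return np.sum(res ** 2)

fit = minimize(ssr, x0=[0.0, 0.0], args=(y,), method="Nelder-Mead")
a_hat, b_hat = fit.x
```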

I recommend using the default stopping rules at first. After completing the first iteration round you may wish to decrease the value of "r". Thus click "Stopping rules OK". Then the previous window reappears. Click "Start SIMPLEX iteration":

In the current version of module ARIMA the simplex method is restarted until the parameters do not change anymore. As a double check, check "Auto restart" and restart the simplex iteration. Then click "Simplex method: How it works, and stopping rules" again, and decrease the value of "r":

Click "Stopping rules OK":

Check "Auto restart" and click "Restart SIMPLEX iteration", and then click "Done with SIMPLEX iteration":

Click "Continue":

This window is similar to the "What to do next" window when you run an OLS regression. See the guided tour on OLS estimation. It contains the estimation results, and an options menu.

Click the "Options" button:

Recall that the true values of the parameters are:

- b(1) = 0
- a(1,1) = 0.7
- a(2,1) = -0.5
- c(2,1) = 0.25

In order to see whether the estimates are significantly different from the true values, test the null hypothesis involved using the "Test parameter restrictions" option. The procedure is the same as for OLS (see the guided tour on OLS estimation), and therefore I will not demonstrate again how to conduct this test, but only show the results:

The test result is as expected: The parameter estimates are not significantly different from the true values, at any reasonable significance level.

This window does not need explanation.

Recall that the best one-step ahead forecast of Δ*y*(*t*) takes the form

*E*_{t-1}[Δ*y*(*t*)] = δ + Σ_{j=1}^{∞} γ_{j}Δ*y*(*t*-*j*),

where the parameters γ_{j} and δ can be derived from the parameters of the ARMA model for
Δ*y*(*t*). See my
Lecture notes on forecasting.
Therefore, the best one-step ahead forecast of *y*(*t*) itself takes the form

*E*_{t-1}[*y*(*t*)] = *y*(*t*-1) + *E*_{t-1}[Δ*y*(*t*)].
Thus both forecast schemes use all the data up to time *t*-1. The option
"One-step ahead forecasts" generates these forecasts:

Click "Continue":

This picture displays *E*_{t-1}[Δ*y*(*t*)] on the vertical axis and its realisation Δ*y*(*t*) on the horizontal axis, for *t* = 1997.2 to 1999.4. The closer the points
(Δ*y*(*t*), *E*_{t-1}[Δ*y*(*t*)]) are to the (45 degrees) line, the better the forecasts.

Click "Continue":

The top panel displays the plots of Δ*y*(*t*) (solid line) and its forecasts *E*_{t-1}[Δ*y*(*t*)]
(dotted red line) for *t* = 1997.2 to 1999.4. The bottom panel plots the forecast errors
Δ*y*(*t*) - *E*_{t-1}[Δ*y*(*t*)].

Click "Continue":

This picture displays *E*_{t-1}[*y*(*t*)] on the vertical axis and its realisation *y*(*t*) on the horizontal axis, for *t* = 1997.2 to 1999.4. The closer the points
(*y*(*t*), *E*_{t-1}[*y*(*t*)]) are to the (45 degrees) line, the better the forecasts.

Click "Continue":

The top panel displays the plots of *y*(*t*) (solid line) and its forecasts *E*_{t-1}[*y*(*t*)]
(dotted red line) for *t* = 1997.2 to 1999.4. The bottom panel plots the forecast errors
*y*(*t*) - *E*_{t-1}[*y*(*t*)].

Click "Continue". Then you will return to the "What to do next?" window.

In recursive forecasting, the unknown Δ*y*(*t+h-j*)'s
in the one-step ahead forecast scheme

*E*_{t+h-1}[Δ*y*(*t+h*)] = δ + Σ_{j=1}^{∞} γ_{j}Δ*y*(*t+h-j*)

are replaced recursively by forecasts, which then yields the best *h*-step ahead forecast:

*E*_{t}[Δ*y*(*t+h*)] = δ + Σ_{j=1}^{h-1} γ_{j}*E*_{t}[Δ*y*(*t+h-j*)] + Σ_{j=h}^{∞} γ_{j}Δ*y*(*t+h-j*).
See my
Lecture notes on forecasting.
Thus, these forecasts only use the information up to time *t* = 1997.1.
The corresponding recursive level forecast of *y*(*t+h*) is then

*E*_{t}[*y*(*t+h*)] = *y*(*t*) + Σ_{m=1}^{h} *E*_{t}[Δ*y*(*t+m*)].
The results in the above windows indicate that recursive *h* step ahead forecasting only yields reasonable forecasts for modest values of *h*. This corresponds to the fact that

*E*_{t}[Δ*y*(*t+h*)] → *E*[Δ*y*(*t*)] as *h* → ∞,

so that the recursive forecasts converge to the unconditional expectation.
This is what you see happening in the last window.
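This convergence can be seen in a toy calculation, again assuming a zero-mean AR(1) for Δ*y*(*t*) with illustrative last observations:

```python
import numpy as np

# Sketch: recursive h-step forecasts when Dy(t) = 0.7 Dy(t-1) + e(t):
# replacing unknown future differences by their forecasts gives
# E_t[Dy(t+h)] = 0.7**h * Dy(t), which decays to E[Dy(t)] = 0,
# so the level forecast flattens out for larger h.
a, dy_t, y_t = 0.7, 1.5, 10.0             # illustrative last observations
h_max = 12
dy_fc = np.array([a ** h * dy_t for h in range(1, h_max + 1)])
y_fc = y_t + np.cumsum(dy_fc)             # E_t[y(t+h)] = y(t) + sum of E_t[Dy]
```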

Recall that the best one-step ahead forecast of Δ*y*(*t*+1) takes the form

*E*_{t}[Δ*y*(*t*+1)] = δ + Σ_{j=1}^{∞} γ_{j}Δ*y*(*t*+1-*j*).

In the next window the coefficients a(*j*) = γ_{j+1} are plotted, and their values displayed.

This picture compares the nonparametric kernel estimator of the density of *e*(*t*) with the corresponding normal density. Note that nonparametric kernel density estimation lacks "parametric backbone" and therefore needs much more data than parametric density estimation. The effective sample size in this case is too small to do reliable nonparametric estimation.
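A kernel-versus-normal comparison of the kind shown here can be sketched with scipy; the residual series below is simulated, not EasyReg's.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Sketch: comparing a nonparametric kernel density estimate with a
# fitted normal density (simulated residuals; the small-sample noise
# in the kernel estimate is exactly the caveat raised in the text).
rng = np.random.default_rng(5)
res = rng.standard_normal(150)             # stand-in for the residuals
kde = gaussian_kde(res)                    # Gaussian kernel, Scott bandwidth
x = np.linspace(-3.0, 3.0, 61)
f_kde = kde(x)
f_norm = norm.pdf(x, loc=res.mean(), scale=res.std())
```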

An ARMA model can also be estimated
via the linear regression (OLS) module, using the option "Re-estimate the model with ARMA errors".
See the guided tour on OLS estimation. This option produces the same
parameter estimates as the ARIMA module. However, if you estimate an AR model directly via
the linear regression (OLS) module, without using the option
"Re-estimate the model with ARMA errors", the estimation results for the intercept and, where applicable,
the time trend and seasonal dummy parameters will differ from the corresponding results obtained
via the ARIMA module under review. The reason is the following.
The ARIMA module estimates an AR(*p*) model with intercept in the form

*y _{t}* = μ + *u _{t}*,

*u _{t}* = α_{1}*u*_{t-1} + .... + α_{p}*u*_{t-p} + *e _{t}*,

where μ = *E*[*y _{t}*], whereas the OLS module estimates the model in the form

*y _{t}* = α_{0} + α_{1}*y*_{t-1} + .... + α_{p}*y*_{t-p} + *e _{t}*.

The intercept α_{0} will be different from
μ, because taking expectations in the latter case yields
μ = α_{0} + α_{1}μ + .... + α_{p}μ,
hence

μ = (1 - α_{1} - .... - α_{p})^{-1}α_{0}.
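This relation between α_{0} and μ is easy to verify numerically, with illustrative AR(2) values (not estimates from this tour):

```python
# Sketch: checking mu = (1 - a1 - ... - ap)^{-1} * a0 with
# illustrative AR(2) values (not estimates from the tour).
a0, a1, a2 = 1.2, 0.5, 0.2
mu = a0 / (1.0 - a1 - a2)
# Taking expectations in y_t = a0 + a1 y_{t-1} + a2 y_{t-2} + e_t:
check = a0 + (a1 + a2) * mu            # should reproduce mu
```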