18 Forecast

Two types of forecasts:

In sample (prediction): The expected value of the random variable (in-sample), given the estimates of the parameters.
Out of sample (forecasting): The value of a future random variable that is not observed by the sample.

18.1 One-Step-Ahead Forecasts

Consider a model

\[ \begin{split} & y_t = \delta_0 + \alpha_1y_{t-1}+ \gamma_1z_{t-1} + u_t \\ & \E(u_t\mid \Ical_{t-1}) = 0 \\ & \var(u_t\mid \Ical_{t-1}) = \sigma^2 , \end{split} \]

Given information at time \(t,\) a one-step-ahead forecast of \(y_{t+1}\) is denoted as \(f_{t}\)

The forecast error is

\[ e_{t+1} = y_{t+1}-f_t. \]

To get a point estimate, \(\hat y_{t+1},\) we need a cost function to judge various alternatives. This cost function is called the loss function. A popular loss function is the mean squared error (MSE), which is quadratic and symmetric. We can alternatively use asymmetric functions, for example, functions that penalize positive errors more than negative errors.

The MSE, given \(\Ical_t\) is

\[ MSE(u_{t+1}) = \E(u_{t+1}^2\mid \Ical_t) = \E\left[(y_{t+1}-f_t)^2\mid \Ical_t\right] . \] An optimal choice of \(f_t\) which minimizes the expected squared forecast error is given by:

\[ f_{t} = \underset{f_t}{\text{argmin}}\; \E\left[(y_{t+1}-f_t)^2\mid \Ical_t\right] \] The FOC implies

\[ \begin{split} &\E[2(y_{t+1}-f_t)\cdot (-1)] = 0 \\ \Rightarrow \; &f_{t} = \E(y_{t+1}\mid \Ical_{t}) . \end{split} \]

The optimal point forecast under MSE is the conditional mean.

\[ \hat y_{t+1} = \E(y_{t+1}\mid \Ical_{t}) . \]

Different loss functions lead to different optimal forecast. For example, for the MAE, the optimal point forecast is the median.

Suppose that the parameters are known, then

\[ \E(y_{t+1}\mid \Ical_{t}) = \delta_0 + \alpha_1y_{t}+ \gamma_1z_{t} + \E(u_{t+1}\mid \Ical_t) , \] since \(\E(u_{t+1}\mid \Ical_t) = 0,\) the forecast of \(y_{t+1}\) at time \(t\) is given by:

\[ \E(y_{t+1}\mid \Ical_{t}) = \delta_0 + \alpha_1y_{t} + \gamma_1z_{t} . \] Plug in the OLS estimators for parameters, we have

\[ \hat{f}_{t} = \hat\delta_0 + \hat\alpha_1 y_{t} + \hat\gamma_1z_{t} . \]

\(\hat{f}_{t}\) is called a point forecast.

Forecasting steps:

Estimation of parameters

Estimate the following model

\[ y_t = \delta_0 + \alpha_1y_{t-1}+ \gamma_1z_{t-1} + u_t , \]

we get \(\hat \delta_0,\) \(\hat \alpha_1,\) and \(\hat \gamma_1.\)
Evaluation in-sample (Prediction)

For \(t=1,2,\ldots,T\) \[ \hat y_t = \hat \delta_0 + \hat \alpha_1y_{t-1}+ \hat \gamma_1z_{t-1} \]
Evaluation out-of-sample (Forecast)

Forecast starts from \(T+1.\)

\[ \hat f_T = \hat y_{T+1} = \hat\delta_0 + \hat\alpha_1 \hat y_{T} + \hat\gamma_1z_{T} \]

The forecast error is

\[ e_{T+1} = y_{T+1} - \hat f_T. \]

The variance of the 1-step forecast error is

\[ \var(e_{T+1}) = \sigma^2 + \var(\hat f_T) , \] where

\(\sigma^2\) is the variance of the regression

It comes from the sampling errors in the estimators \(\hat \delta_0,\) \(\hat \alpha_1,\) and \(\hat \gamma_1,\) and if there were exogeneous variables, they are forecasted perfectly.
\(\var(\hat f_T)\) is the variance of the forecast.

A 95% forecast interval is

\[ \hat f_T \pm 1.96 \cdot \text{se}(e_{T+1}) , \] where \[ \text{se}(e_{T+1}) = \left[\sigma^2 + \var(\hat f_T) \right]^{1/2} . \]

\(\var(\hat f_T)\) is roughly proportional to \(1/T,\) and is usually small relative to the uncertainty in the error, as measured by \(\sigma^2.\)

18.2 Multiple-Step-Ahead Forecasts

Consider the model

\[ \begin{align} & y_t = \alpha + \rho y_{t-1} + u_t \label{eq-AR} \\ & \E(u_t\mid \Ical_{t-1}) = 0 \nonumber , \end{align} \] where \(\Ical_{t-1}=\lbrace y_{t-1}, y_{t-2}, \ldots\rbrace.\)

Plug in \(y_{t-1}\) into \(\eqref{eq-AR}\).

\[ y_{t-1} = \alpha + \rho y_{t-2} + u_{t-1} \]

We have

\[ y_t = (1+\rho)\alpha + \rho^2 y_{t-2} + u_t + \rho u_{t-1} + \rho^2 u_{t-2} . \]

By \(h\) repeated substitutions, the MA representation of \(y_t\) is \[ y_t = \alpha \cdot \frac{1-\rho^h}{1-\rho} + \rho^h y_{t-h} + \sum_{j=0}^{h-1}\rho^ju_{t-j} \]

Step	Forecast	Forecast Error
\(h=1\)	\(f_T = \E\left(y_T(1)\mid \Ical_T\right) = \alpha + \rho\, y_{T}\)	\(e_T(1)=u_{T+1}\)
\(h=2\)	\(f_{T}(1) = \E\left(y_T(2)\mid \Ical_T\right) = \alpha \frac{1-\rho^2}{1-\rho} + \rho^2\, y_{T}\)	\(e_T(2)=u_{T+2} + \rho u_{T+1}\)
\(h=3\)	\(f_{T}(2) = \E\left(y_T(3)\mid \Ical_T\right) = \alpha \frac{1-\rho^3}{1-\rho} + \rho^3\, y_{T}\)	\(e_T(2) = u_{T+3} + \rho u_{T+2} + \rho^2 u_{T+1}\)

where

\(\E[y_T(h)\mid \Ical_T]\) is the \(h\)-step-ahead forecast at time \(T\) and

\[ \E[y_T(h)\mid \Ical_T] = \alpha \frac{1-\rho^h}{1-\rho} + \rho^h y_{T} \]
\(e_T(h)\) is the forecast error.

\[ e_T(h) = \sum_{\ell=0}^{h-1} \rho^\ell\, u_{T+h-\ell} \]
Variance of the forecast error

Given that \(u_t\) are uncorrelated, \[ \var\left[e_T(h)\right] = \sum_{\ell=0}^{h-1} \rho^{2\ell}\, \var(u_t) . \]

Because \(\rho^2>0,\) the forecase error variance increases with \(h.\)

Note that \(\sum_{\ell=0}^{h-1} \rho^{2\ell}\) is a geometric series with the common ratio \(q=\rho^2\).
- When \(\rho^2<1,\)
  
  \[ \sum_{\ell=0}^{h-1} \rho^{2\ell} = 1+\rho^2+\rho^4+\cdots+\rho^{h-1}= \frac{1-\rho^{2h}}{1-\rho^2} \]
  
  as \(h\) gets large the forecast error variance converges to
  
  \[ \lim_{h\to\infty} \var\left[e_T(h)\right] = \sigma^2 \lim_{h\to\infty}\sum_{\ell=0}^{h-1} \rho^{2\ell} = \frac{\sigma^2}{1-\rho^2} , \]
  
  which is the unconditional variance of \(y_t.\)
- When \(\rho=1\), in case of a random walk,
  
  \[ \begin{align}\label{eq-RW} y_t = \alpha + y_{t-1} + u_t \end{align} \]
  
  Plug in \(y_{t-1}\), we have
  
  \[ y_t = 2\alpha + y_{t-2} + u_{t-1} + u_t . \]
  
  By \(h\) substituion, \(\eqref{eq-RW}\) can be written as
  
  \[ y_t = \alpha h + y_{t-h} + \sum_{j=0}^{h-1} u_{t-j} . \]
  
  The \(h\)-step-ahead forecast is
  
  \[ \E[y_T(h)\mid \Ical_T] = \alpha h + y_{T} , \]
  
  and the forecast error is
  
  \[ e_T(h) = \sum_{\ell=0}^{h-1} u_{T+h-\ell} . \]
  
  The variance of the forecast error is
  
  \[ \var\left(e_T(h)\right) = \sigma^2 h . \]
  
  That is, the forecast variance grows without bound as the horizon \(h\) increases.
  
  This demonstrates that it is very difficult to forecast a random walk, with or without drift, far out into the future. For example, forecasts of interest rates farther into the future become dramatically less precise.

Once we have estimated the parameter, we can obtain the \(h\)-step-ahead forecast at time \(T\)

\[ \hat{f}_T(h) = \hat{\alpha} \frac{1-\hat\rho^h}{1-\hat\rho} + \hat\rho^h y_{T} . \]

The standard error of \(\hat{f}_T(h)\) is small compared with the standard error of the error term.

For \(h=2\), we approximate the 95% confidence interval (for large \(T\)) as

\[ \hat{f}_T(2) \pm 1.96\cdot \hat{\sigma} (1+\hat\rho^2)^{1/2} . \]

By doing so, we are underestimating the standard deviation of \(y_T(h),\) this interval is too narrow, but perhaps not by much, especially when \(T\) is large.

References

Chapter 18.5 Forecasting, Wooldridge, Introductory Econometrics.
Chapter 12.11 Forecast In the Presence of Autocorrelations, pp.279, Greene, Econometric Analysis.