14  Highly Persistent Time Series

Provided the time series we use are weakly dependent, the usual OLS inference procedures are valid under assumptions weaker than the classical linear model assumptions. Unfortunately, many economic time series cannot be characterized by weak dependence.

Using time series with strong dependence in regression analysis poses no problem if the classical linear model (CLM) assumptions hold in finite samples. But the usual inference procedures are very susceptible to violations of these assumptions when the data are not weakly dependent, because we can then no longer appeal to the law of large numbers and the central limit theorem. In this section, we provide some examples of highly persistent (or strongly dependent) time series and show how they can be transformed for use in regression analysis.

When the time series are highly persistent (they have unit roots), we must exercise extreme caution in using them directly in regression models. An alternative to using the levels is to use the first differences of the variables. For most highly persistent economic time series, the first difference is weakly dependent. Using first differences changes the nature of the model, but this method is often as informative as a model in levels. When data are highly persistent, we usually have more faith in first-difference results.

14.1 Testing for Unit Roots

We begin with an AR(1) model:

\[ u_t = \alpha + \rho u_{t-1} + \varepsilon_t, \; t = 2,3,\ldots, \tag{14.1}\] where \(\{\varepsilon_t\}\) has zero mean, given past observed \(u\):

\[ \E[\varepsilon_t\mid u_{t-1}, u_{t-2}, \ldots] = 0, \tag{14.2}\] that is, \(\{\varepsilon_t\}\) is a martingale difference sequence with respect to \(\{u_{t-1}, u_{t-2}, \ldots\}\). If \(\{\varepsilon_t\}\) is i.i.d. with zero mean, then it also satisfies assumption (14.2).

  • When \(\alpha=0\) and \(\rho=1\), then \(\{u_t\}\) follows a random walk without drift.
  • When \(\alpha\ne 0\) and \(\rho=1\), then \(\{u_t\}\) follows a random walk with drift. We say that the model allows for a deterministic drift.
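
The difference between a stable AR(1) process and a random walk (with or without drift) is easy to see in simulated data. Below is a minimal Python sketch, assuming numpy is available; the autoregressive coefficient 0.5, the drift 0.1, and the sample size 200 are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 200
eps = rng.standard_normal(n_obs)

# Stable AR(1): u_t = 0.5 * u_{t-1} + eps_t  (|rho| < 1, weakly dependent)
ar1 = np.zeros(n_obs)
for t in range(1, n_obs):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]

# Random walk without drift: u_t = u_{t-1} + eps_t  (rho = 1, alpha = 0)
rw = np.cumsum(eps)

# Random walk with drift: u_t = 0.1 + u_{t-1} + eps_t  (rho = 1, alpha = 0.1)
rw_drift = np.cumsum(0.1 + eps)

print(ar1[-5:], rw[-5:], rw_drift[-5:])
```

The stable AR(1) series keeps returning to its mean, while the two random walks wander; the drift adds an upward tendency on top of the stochastic trend.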

Therefore, the null hypothesis is that \(\{u_t\}\) has a unit root:

\[ \begin{split} \mathrm H_0: \rho=1 \\ \mathrm H_1: \rho<1 \\ \end{split} \] Note that \(\mathrm H_1\) is a one-sided alternative. We do not consider \(\rho>1,\) as it implies that \(u_t\) is explosive.

When \(\abs{\rho}<1,\) \(\{u_t\}\) is a stable AR(1) process, which means it is weakly dependent or asymptotically uncorrelated.

14.1.1 Dickey-Fuller (DF) Test

To test for a unit root, we reparameterize the AR(1) model.

Subtract \(u_{t-1}\) from both sides of (14.1):

\[ \begin{split} u_t - u_{t-1} &= \alpha + \rho u_{t-1} - u_{t-1} + \varepsilon_t \\ \Rightarrow \Delta u_t &= \alpha + (\rho - 1)u_{t-1} + \varepsilon_t \end{split} \] Define \(\theta = \rho - 1\):

\[ \Delta u_t = \alpha + \theta u_{t-1} + \varepsilon_t \]

Testing Hypotheses

  • \(\mathrm H_0: \theta = 0\) (i.e., \(\rho = 1\), unit root)
  • \(\mathrm H_1: \theta < 0\) (i.e., \(\rho < 1\), stationary)

Testing Procedure

  1. Estimate:

    \[ \Delta u_t = \alpha + \theta u_{t-1} + \varepsilon_t \]

  2. Compute the t-statistic for \(\hat{\theta}\).

  3. Compare it to the Dickey-Fuller critical values for the case with intercept (no trend).

Asymptotic Critical Values for Unit Root t Test: No Time Trend

  Significance level    1%      2.5%    5%      10%
  Critical value        −3.43   −3.12   −2.86   −2.57
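
The procedure above amounts to a single OLS regression plus a comparison with the table. Here is a minimal Python sketch under stated assumptions: the series u is simulated (a random walk, so the null is true in this example), statsmodels provides the OLS fit, and the regression t statistic on \(u_{t-1}\) is compared with the 5% Dickey-Fuller critical value rather than with standard normal or t critical values.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
u = np.cumsum(rng.standard_normal(200))  # simulated random walk, so H0 is true

du = np.diff(u)                          # Delta u_t, t = 2, ..., n
u_lag = u[:-1]                           # u_{t-1}
X = sm.add_constant(u_lag)               # intercept plus u_{t-1}

res = sm.OLS(du, X).fit()
t_theta = res.tvalues[1]                 # t statistic for theta = rho - 1
print(f"DF t statistic: {t_theta:.2f}")
print("Reject unit root at 5%?", t_theta < -2.86)
```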

14.1.2 Augmented Dickey-Fuller (ADF) Test

To account for serial correlation in the error term \(\varepsilon_t\), we augment the regression with lagged differences:

\[ \Delta u_t = \alpha + \theta u_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta u_{t-j} + \varepsilon_t . \tag{14.3}\]

Using \(p=1\) as an example:

\[ \Delta u_t = \alpha + \theta u_{t-1} + \gamma_1 \Delta u_{t-1} + \varepsilon_t . \] where \(\abs{\gamma_1}<1\). This ensures that, under \(\mathrm H_0: \theta=0,\) \(\{\Delta u_{t}\}\) follows a stable AR(1) model. Under the alternative \(\mathrm H_1: \theta<0,\) \(\{u_t\}\) follows a stable AR(2) model.

This extended version of the Dickey-Fuller test is usually called the augmented Dickey-Fuller test because the regression has been augmented with the lagged changes, \(\Delta u_{t-j}.\) The critical values and rejection rule are the same as before. The inclusion of the lagged changes in (14.3) is intended to clean up any serial correlation in \(\Delta u_{t}.\)

Often, the lag length is dictated by the frequency of the data (as well as the sample size). For annual data, one or two lags usually suffice. For monthly data, we might include 12 lags. But there are no hard rules to follow in any case.
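
In practice the ADF regression (14.3) is rarely coded by hand; statsmodels bundles it in adfuller. Below is a minimal sketch, assuming a simulated series u and the no-trend specification; maxlag and the autolag option control how the lag length p is chosen.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
u = np.cumsum(rng.standard_normal(200))  # simulated random walk

# Fixed lag length p = 2 (autolag=None keeps exactly maxlag lagged differences)
stat, pval, usedlag, nobs, crit = adfuller(u, maxlag=2, regression="c", autolag=None)
print(f"ADF statistic: {stat:.2f}, 5% critical value: {crit['5%']:.2f}")

# Alternatively, let an information criterion choose the lag length
stat_aic, *_ = adfuller(u, regression="c", autolag="AIC")
print(f"ADF statistic (AIC-chosen lags): {stat_aic:.2f}")
```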

14.1.3 Deterministic Trend

For series that have clear time trends, we need to modify the test for unit roots. A trend-stationary process, which has a linear trend in its mean but is \(\mathrm I(0)\) about its trend, can be mistaken for a unit root process if we do not control for a time trend in the Dickey-Fuller regression. In other words, if we carry out the usual DF or augmented DF test on a trending but \(\mathrm I(0)\) series, we will probably have little power for rejecting a unit root.

If a deterministic trend is expected in the data, the corresponding model in levels (before differencing) is:

\[ u_t = \alpha + \beta t + \rho u_{t-1} + \varepsilon_t, \quad t = 2,3,\ldots \]

This allows the test to account for both a deterministic linear trend and a possible unit root.

We change the ADF regression to:

\[ \Delta u_t = \alpha + \delta t + \theta u_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta u_{t-j} + \varepsilon_t \] where again the testing hypotheses are

  • \(\mathrm H_0: \theta = 0\)

    If \(\{u_t\}\) has a unit root, then \[ \Delta u_t = \alpha + \delta t + \sum_{j=1}^{p} \gamma_j \Delta u_{t-j} + \varepsilon_t , \] so the change in \(u_t\) has a mean linear in \(t\) unless \(\delta=0.\)

    It is unusual for the first difference of an economic series to have a linear trend, so a more appropriate null hypothesis is probably

    \[ \mathrm H_0: \theta = 0, \delta = 0 \]

  • \(\mathrm H_1: \theta < 0\) (trend-stationary process)

When we include a time trend in the regression, the critical values of the test change. Intuitively, this occurs because detrending a unit root process tends to make it look more like an \(\mathrm I(0)\) process. Therefore, we require a larger magnitude for the \(t\) statistic in order to reject \(\mathrm H_0.\) The Dickey-Fuller critical values for the \(t\) test that includes a time trend are given in the following table.

Asymptotic Critical Values for Unit Root t Test: Linear Time Trend

  Significance level    1%      2.5%    5%      10%
  Critical value        −3.96   −3.66   −3.41   −3.12
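
With statsmodels' adfuller, the trend case only requires switching the regression option from "c" to "ct"; the function then reports critical values corresponding to the table above. A minimal sketch, with a hypothetical trending series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
t = np.arange(200)
y = 0.05 * t + np.cumsum(rng.standard_normal(200))  # trend plus random walk

stat, pval, usedlag, nobs, crit = adfuller(y, maxlag=2, regression="ct", autolag=None)
print(f"ADF (with trend) statistic: {stat:.2f}, 5% critical value: {crit['5%']:.2f}")
```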

14.2 Cointegration

The notion of cointegration, which was given a formal treatment in Engle and Granger (1987), makes regressions involving \(\mathrm I(1)\) variables potentially meaningful.

If \(\{y_t: t=0,1,\ldots\}\) and \(\{x_t: t=0,1,\ldots\}\) are two \(\mathrm I(1)\) processes, then, in general,

\[ y_t - \beta x_t \]

is an \(\mathrm I(1)\) process for any number \(\beta.\) Nevertheless, it is possible that for some \(\beta \ne 0,\) \(y_t - \beta x_t\) is an \(\mathrm I(0)\) process, which means it has constant mean, constant variance, and autocorrelations that depend only on the time distance between any two variables in the series, and it is asymptotically uncorrelated.

If such a \(\beta\) exists, we say that \(y\) and \(x\) are cointegrated, and we call \(\beta\) the cointegration parameter.

14.2.1 Engle-Granger Test

The Engle-Granger test is used for testing the existence of a long-run equilibrium relationship (cointegration) between two or more non-stationary time series.

It is a two-step procedure:

  1. Estimate the Cointegrating Regression

    Suppose \(y_t\) and \(x_t\) are non-stationary (typically \(\mathrm I(1)\)) series, neither of which has a drift. Regress: \[ y_t = \alpha + \beta x_t + u_t \]
    using OLS, and save the residuals \(\hat{u}_t\).

  2. Test the Residuals for Stationarity

    Apply the Dickey-Fuller (DF) test or Augmented Dickey-Fuller (ADF) test to the residuals:
    \[ \Delta \hat{u}_t = \alpha + \theta \hat{u}_{t-1} + \sum_{i=1}^p \gamma_i \Delta \hat{u}_{t-i} + \varepsilon_t \]
    If the residuals are stationary (i.e., reject the null of a unit root), the variables are cointegrated.

    We compare the \(t\) statistic of \(\hat{\theta}\) to the critical value in the following table. If the \(t\) statistic is below the critical value, we have evidence that \(y_t - \beta x_t\) is \(\mathrm I(0)\) for some \(\beta;\) that is, \(y_t\) and \(x_t\) are cointegrated.

Asymptotic Critical Values for Cointegration t Test: No Time Trend

  Significance level    1%      2.5%    5%      10%
  Critical value        −3.90   −3.59   −3.34   −3.04

Note that we must get a \(t\) statistic much larger in magnitude to find cointegration than if we used the usual DF critical values. This happens because OLS, which minimizes the sum of squared residuals, tends to produce residuals that look like an \(\mathrm I(0)\) sequence even if \(y_t\) and \(x_t\) are not cointegrated.
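
A minimal Python sketch of the two-step procedure, under stated assumptions: x_t is a simulated random walk and y_t = 1 + 2 x_t plus \(\mathrm I(0)\) noise, so the pair is cointegrated by construction. The residual-based t statistic is compared with the Engle-Granger critical values above, not the DF ones; statsmodels' coint function packages the same idea.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(4)
x = np.cumsum(rng.standard_normal(300))
y = 1.0 + 2.0 * x + rng.standard_normal(300)

# Step 1: cointegrating regression and residuals
step1 = sm.OLS(y, sm.add_constant(x)).fit()
u_hat = step1.resid

# Step 2: DF regression on the residuals (no extra lagged differences here)
du = np.diff(u_hat)
res = sm.OLS(du, sm.add_constant(u_hat[:-1])).fit()
print(f"Residual DF t statistic: {res.tvalues[1]:.2f}  (5% EG critical value: -3.34)")

# statsmodels bundles the same idea: coint() returns the EG t statistic
# and critical values directly
eg_stat, eg_pval, eg_crit = coint(y, x, trend="c")
print(f"coint() statistic: {eg_stat:.2f}, critical values: {eg_crit}")
```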

If \(y_t\) and/or \(x_t\) contain a drift term, we change the regression to

\[ y_t = \alpha + \delta t + \beta x_t + u_t , \] and apply the usual DF or ADF test to the residuals \(\hat{u}_t.\) The asymptotic critical values are given as follows.

Asymptotic Critical Values for Cointegration t Test: Linear Time Trend

  Significance level    1%      2.5%    5%      10%
  Critical value        −4.32   −4.03   −3.78   −3.50

14.3 Cointegrated Series

14.3.1 Leads and Lags Estimator

When \(y_t\) and \(x_t\) are \(\mathrm I(1)\) and cointegrated, we can write

\[ y_t = \alpha + \beta x_t + u_t , \] where \(\{u_t\}\) is a zero mean, \(\mathrm I(0)\) process.

We want to test hypotheses about the cointegrating parameter \(\beta.\)

Generally, \(\{u_t\}\) contains serial correlation, but this does NOT affect consistency of OLS. That is, OLS yields consistent estimates for \(\beta\) and \(\alpha.\)

The issue is that, because \(x_t\) is \(\mathrm I(1),\) the usual inference procedures do NOT necessarily apply: the OLS estimator is not asymptotically normally distributed, and the \(t\) statistic for \(\hat{\beta}\) does not necessarily have an approximate \(t\) distribution.

If \(\{x_t\}\) is strictly exogenous, and the errors are homoskedastic, serially uncorrelated, and normally distributed, the OLS estimator is also normally distributed (conditional on the explanatory variables) and the \(t\) statistic has an exact \(t\) distribution.

Unfortunately, these assumptions are too strong to apply to most situations. The notion of cointegration implies nothing about the relationship between \(\{x_t\}\) and \(\{u_t\}\)—indeed, they can be arbitrarily correlated.

The lack of strict exogeneity of \(\{x_t\}\) can be fixed by writing \(u_t\) as a function of the \(\Delta x_s\) for all \(s\) close to \(t.\)

For example,

\[ u_t = \eta + \phi_0 \Delta x_t + \phi_1 \Delta x_{t-1} + \phi_2 \Delta x_{t-2} + \gamma_1 \Delta x_{t+1} + \gamma_2 \Delta x_{t+2} + e_t , \]

where, by construction, \(e_t\) is uncorrelated with each of \(\Delta x_s\) appearing in the equation.

Plugging this expression for \(u_t\) into the original regression, we get

\[ \begin{split} y_t &= \alpha_0 + \beta x_t + \phi_0 \Delta x_t + \phi_1 \Delta x_{t-1} + \phi_2 \Delta x_{t-2} \\ &\phantom{=} \;+ \gamma_1 \Delta x_{t+1} + \gamma_2 \Delta x_{t+2} + e_t . \end{split} \tag{14.4}\]

The OLS estimator of \(\beta\) from (14.4) is called the leads and lags estimator because it employs leads and lags of \(\Delta x_t.\)

This equation looks a bit strange because the future changes \(\Delta x_{t+1}\) and \(\Delta x_{t+2}\) appear along with the current and lagged changes. The key is that the coefficient on \(x_t\) is still \(\beta,\) and, by construction, \(x_t\) is now strictly exogenous in this equation.

The strict exogeneity assumption is the important condition needed to obtain an approximately normal \(t\) statistic for \(\hat\beta.\) If \(u_t\) is uncorrelated with all \(\Delta x_{s},\) \(s\ne t,\) then we can drop the leads and lags of the changes and simply include the contemporaneous change, \(\Delta x_{t}.\) Then, the equation we estimate looks more standard but still includes the first difference of \(x_t\) along with its level: \[ y_t = \alpha_0 + \beta x_t + \phi_0 \Delta x_t + e_t . \] In effect, adding \(\Delta x_{t}\) solves any contemporaneous endogeneity between \(x_t\) and \(u_t.\)

The only issue we must worry about in (14.4) is the possibility of serial correlation in \(\{e_t\}.\) This can be dealt with by

  • using a serial correlation-robust standard error for \(\hat\beta;\) or
  • using a standard AR(1) correction, e.g., Cochrane-Orcutt (CO) or Prais-Winsten (PW) estimation (see Chapter 12.3 of Wooldridge, Introductory Econometrics).
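
Following the first option above, the sketch below estimates the leads and lags regression (14.4) by OLS with a Newey-West (HAC) covariance matrix. The data-generating process (random walk x_t, AR(1) errors) and the choice of two leads and two lags are illustrative assumptions, not part of the text.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 400
x = np.cumsum(rng.standard_normal(n))
u = np.zeros(n)
for t in range(1, n):                       # AR(1) errors: I(0) but autocorrelated
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

df = pd.DataFrame({"y": y, "x": x})
df["dx"] = df["x"].diff()
for j in (1, 2):
    df[f"dx_lag{j}"] = df["dx"].shift(j)    # Delta x_{t-j}
    df[f"dx_lead{j}"] = df["dx"].shift(-j)  # Delta x_{t+j}
df = df.dropna()

X = sm.add_constant(df[["x", "dx", "dx_lag1", "dx_lag2", "dx_lead1", "dx_lead2"]])
res = sm.OLS(df["y"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(res.params["x"], res.bse["x"])        # beta-hat and its HAC standard error
```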

14.4 Error Correction Models

Consider the model

\[ \Delta y_t = \alpha_0 + \alpha_1 \Delta y_{t-1} + \gamma_0 \Delta x_t + \gamma_1\Delta x_{t-1} + u_t, \]

where \(u_t\) has zero mean given \(\Delta y_{t-1},\) \(\Delta x_t,\) \(\Delta x_{t-1},\) and further lags.

If \(y_t\) and \(x_t\) are cointegrated with parameter \(\beta,\) then we can include lags of

\[ s_t = y_t - \beta x_t \] in the equation.

In the simplest case, we include one lag of \(s_t:\)

\[ \Delta y_t = \alpha_0 + \alpha_1 \Delta y_{t-1} + \gamma_0 \Delta x_t + \gamma_1\Delta x_{t-1} + \delta s_{t-1} + u_t, \tag{14.5}\] where \(\E[u_t\mid \Ical_{t-1}] =0,\) and \(\Ical_{t-1}\) contains information on all past values of \(x\) and \(y.\)

The term \(\delta s_{t-1} = \delta (y_{t-1} - \beta x_{t-1})\) is called the error correction term, and model (14.5) is called the error correction model.

For simplicity, consider the model without lags of \(\Delta y_t\) and \(\Delta x_t:\)

\[ \Delta y_t = \alpha_0 + \gamma_0 \Delta x_t + \delta (y_{t-1} - \beta x_{t-1}) + u_t, \] where \(\delta<0.\)

  • If \(y_{t-1} > \beta x_{t-1},\) then \(y\) in the previous period has overshot the equilibrium; because \(\delta<0,\) the error correction term works to push \(y\) back toward the equilibrium.
  • If \(y_{t-1} < \beta x_{t-1},\) then \(y\) in the previous period has undershot the equilibrium; the error correction term induces a positive change in \(y,\) pushing it back toward the equilibrium.

The cointegrating parameter \(\beta\) can be estimated using the OLS estimator or the leads and lags estimator. Engle and Granger (1987) show that the asymptotic efficiency of the estimators of the parameters in the error correction model is unaffected by the preliminary estimation of \(\beta.\)

The choice of \(\hat\beta\) will generally have an effect on the estimated error correction parameters in any particular sample, but we have no systematic way of deciding which preliminary estimator of \(\beta\) to use. The procedure of replacing \(\beta\) with \(\hat\beta\) is called the Engle-Granger two-step procedure.
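
A minimal sketch of the Engle-Granger two-step procedure for (14.5), under stated assumptions: \(\beta\) is estimated by a preliminary OLS regression in levels (the leads and lags estimator could be substituted), and the simulated data are cointegrated by construction.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 400
x = np.cumsum(rng.standard_normal(n))
y = 1.0 + 2.0 * x + rng.standard_normal(n)

# Step 1: estimate the cointegrating parameter beta (OLS in levels)
step1 = sm.OLS(y, sm.add_constant(x)).fit()
beta_hat = step1.params[1]

df = pd.DataFrame({"y": y, "x": x})
df["s"] = df["y"] - beta_hat * df["x"]      # s_t = y_t - beta*x_t
df["dy"] = df["y"].diff()
df["dx"] = df["x"].diff()
df["dy_lag1"] = df["dy"].shift(1)
df["dx_lag1"] = df["dx"].shift(1)
df["s_lag1"] = df["s"].shift(1)             # error correction term
df = df.dropna()

# Step 2: estimate the error correction model (14.5)
X = sm.add_constant(df[["dy_lag1", "dx", "dx_lag1", "s_lag1"]])
ecm = sm.OLS(df["dy"], X).fit()
print(ecm.params["s_lag1"])                 # delta, expected to be negative
```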

14.5 Noncointegrated Series

If \(y_t\) and \(x_t\) are \(\mathrm I(1)\) and are NOT cointegrated, we can estimate a dynamic model in first differences.

As an example,

\[ \Delta y_t = \alpha_0 + \alpha_1 \Delta y_{t-1} + \gamma_0 \Delta x_t + \gamma_1\Delta x_{t-1} + u_t, \]

where \(u_t\) has zero mean given \(\Delta y_{t-1},\) \(\Delta x_t,\) \(\Delta x_{t-1},\) and further lags.
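
A minimal sketch of estimating this model by OLS, assuming y_t and x_t are two independent simulated random walks (so they are \(\mathrm I(1)\) and not cointegrated):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
df = pd.DataFrame({"y": np.cumsum(rng.standard_normal(300)),
                   "x": np.cumsum(rng.standard_normal(300))})
df["dy"] = df["y"].diff()
df["dx"] = df["x"].diff()
df["dy_lag1"] = df["dy"].shift(1)
df["dx_lag1"] = df["dx"].shift(1)
df = df.dropna()

# Dynamic model in first differences; all regressors are I(0)
X = sm.add_constant(df[["dy_lag1", "dx", "dx_lag1"]])
res = sm.OLS(df["dy"], X).fit()
print(res.params)
```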

Engle, Robert F., and C. W. J. Granger. 1987. “Co-Integration and Error Correction: Representation, Estimation, and Testing.” Econometrica 55: 251–76. http://www.jstor.org/stable/1913236.