20 Dynamic Models

20.1 Exogeneity and Dynamic Models

20.1.1 Introduction

Consider the model

\[ y_{it} = \bx_{it}' \bbeta + u_{it} . \] Throughout the previous section, we assumed that the idiosyncratic term \(u_{it}\) satisfied strict exogeneity with respect to the data: \[ \E[u_{it}|\bX_i] = 0. \]

It implies that for all pairs of indices \(s\) and \(t,\) it holds that \[ \E[u_{it}\bx_{is}] =0. \tag{20.1}\]

However, strict exogeneity is not possible in the presence of lagged dependent variables.

A weaker assumption, sequential exogeneity, requires only that (20.1) hold for \(t\ge s.\) That is,

\[ \E[u_{it}\bx_{is}] = 0 \text{ for } t\ge s. \tag{20.2}\]

In plain language, the current and past values of \(\bx\) are uncorrelated with the current error term, although its future values may not be.

In this case, the regressor \(\bx\) is called “predetermined” as its value is determined prior to the current period.

Examples of predetermined variables:
- Lagged dependent variables: Variables that represent past values of the dependent variable.
- Exogenous variables: Variables that are determined outside the model and are not influenced by the endogenous variables within the current period.

For \(t<s\), Equation 20.1 means that past shocks are uncorrelated with future values of \(\bx\). In other words, one cannot predict future \(\bx\)’s from past shocks.

This requirement might fail if \(\bx\) is dynamic, e.g., the covariates include lagged dependent variables, and its evolution is affected by \(u_{it}\). That is, \[ \E[u_{it}\bx_{is}] \ne 0 \text{ for } t<s. \] This means that \(\bx_{is}\) can respond dynamically to past values of \(y_{it}.\)
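The contrast between the two ranges of \(s\) can be checked numerically. The following is a minimal sketch, assuming a pure AR(1) panel without intercept or individual effects, coefficient 0.5, and standard normal shocks (all of these are illustrative choices, not part of the text). The regressor in period \(s\) is \(x_{is} = y_{i,s-1}\), and the function name `simulate_ar1_panel` is hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n_units, n_periods, beta1, rng):
    """Simulate y_it = beta1 * y_{i,t-1} + u_it with i.i.d. N(0, 1) shocks."""
    u = rng.standard_normal((n_units, n_periods))
    y = np.zeros((n_units, n_periods))
    y[:, 0] = u[:, 0]
    for t in range(1, n_periods):
        y[:, t] = beta1 * y[:, t - 1] + u[:, t]
    return y, u

rng = np.random.default_rng(0)
y, u = simulate_ar1_panel(n_units=50_000, n_periods=6, beta1=0.5, rng=rng)

t = 3  # period of the shock (0-based column index)
# Sample analogues of E[u_it x_is], where x_is = y_{i,s-1}.
moment_past = np.mean(u[:, t] * y[:, t - 2])     # s = t - 1
moment_current = np.mean(u[:, t] * y[:, t - 1])  # s = t
moment_future = np.mean(u[:, t] * y[:, t])       # s = t + 1, i.e. x = y_it

print(moment_past, moment_current, moment_future)
```

Under these assumptions the first two sample moments should be close to zero, while the third should be close to \(\operatorname{Var}(u_{it}) = 1\): the shock at \(t\) feeds into the regressor at \(t+1\).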

In this section, we will discuss this challenge and some traditional approaches to dealing with it with short panel data.

20.1.2 No Strict Exogeneity In the Presence of Lagged Dependent Variables

Consider the AR(1) model, \[ y_t = \beta_0 + \beta_1 y_{t-1} + u_t, \tag{20.3}\] where the error \(u_t\) has a zero expected value, given all past values of \(y\): \[ \mathbb{E}(u_t \mid y_{t-1}, y_{t-2}, \ldots) = 0. \tag{20.4}\]

Conventions for \(t\):

- The first observation is \(y_1\), and \(t=2, \ldots, T\), so that the first available equation is

  \[ y_2 = \beta_0 + \beta_1 y_{1} + u_2 , \]

  and there are \(T-1\) equations in levels. We will mostly use this convention.
- The first observation is \(y_0\), and \(t=1, \ldots, T\), so that the first available equation is

  \[ y_1 = \beta_0 + \beta_1 y_{0} + u_1 , \]

  and there are \(T\) equations in levels.

Combined, Equations 20.3 and 20.4 imply that \[ \mathbb{E}(y_t \mid y_{t-1}, y_{t-2}, \ldots) = \mathbb{E}(y_t \mid y_{t-1}) = \beta_0 + \beta_1 y_{t-1}. \]

This result is very important. First, it means that, once \(y\) lagged one period has been controlled for, no further lags of \(y\) affect the expected value of \(y_t\). (This is where the name “first order” originates.) Second, the relationship is assumed to be linear.

Because \(x_t\) contains only \(y_{t-1}\), Equation 20.4 implies that the contemporaneous exogeneity assumption holds. By contrast, the strict exogeneity assumption, which is needed for unbiasedness, does not hold.

In fact, because \(u_t\) is uncorrelated with \(y_{t-1}\) under Equation 20.4, \(u_t\) and \(y_t\) must be correlated.

\[ \begin{aligned} \text{Cov}(y_t, u_t) &= \text{Cov}(\beta_0 + \beta_1 y_{t-1} + u_t, u_t) \\ &= \text{Var}(u_t) > 0. \end{aligned} \]

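Since \(y_t\) is the regressor in the equation for period \(t+1\), the error \(u_t\) is correlated with a future value of the regressor. Therefore, a model with a lagged dependent variable cannot satisfy the strict exogeneity assumption.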

For the weak dependence condition to hold, we must assume that \(|\beta_1| < 1\). If this condition holds, then the OLS estimator from the regression of \(y_t\) on \(y_{t-1}\) produces consistent estimators of \(\beta_0\) and \(\beta_1\).

Unfortunately, \(\hat{\beta}_1^{OLS}\) is biased, and this bias can be large if the sample size is small or if \(\beta_1\) is near 1. (For \(\beta_1\) near 1, \(\hat{\beta}_1^{OLS}\) can have a severe downward bias.) In moderate to large samples, \(\hat{\beta}_1^{OLS}\) should be a good estimator of \(\beta_1\).
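The small-sample bias can be illustrated with a short Monte Carlo experiment. This is a sketch under assumed settings (\(\beta_0 = 0\), \(\beta_1 = 0.9\), standard normal errors, a burn-in of 200 periods, and sample sizes 25 and 500); the helper names `ols_ar1_slope` and `mean_ols_slope` are hypothetical.

```python
import numpy as np

def ols_ar1_slope(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    x, z = y[:-1], y[1:]
    xc = x - x.mean()
    return float(xc @ (z - z.mean()) / (xc @ xc))

def mean_ols_slope(beta1, n_obs, n_reps, rng, burn_in=200):
    """Average OLS slope over n_reps simulated AR(1) series of length n_obs."""
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        u = rng.standard_normal(n_obs + burn_in)
        y = np.zeros(n_obs + burn_in)
        for t in range(1, n_obs + burn_in):
            y[t] = beta1 * y[t - 1] + u[t]
        # Drop the burn-in so the retained series is close to stationary.
        slopes[r] = ols_ar1_slope(y[burn_in:])
    return float(slopes.mean())

rng = np.random.default_rng(1)
mean_small = mean_ols_slope(beta1=0.9, n_obs=25, n_reps=2000, rng=rng)
mean_large = mean_ols_slope(beta1=0.9, n_obs=500, n_reps=500, rng=rng)
print(mean_small, mean_large)
```

With these settings the average estimate from the short series should fall clearly below 0.9, while the average from the long series should be much closer to it.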

Equation 20.4 is called the sequential exogeneity assumption. Or more generally, using \(\bx\) to represent the vector of covariates, the sequential exogeneity is:

\[ \E(u_t \mid \bx_t, \bx_{t-1}, \ldots) = \E(u_t) = 0, \quad t=1,2,\ldots. \tag{20.5}\]

We can see that sequential exogeneity implies contemporaneous exogeneity.

20.2 Dynamic Panel Representation

Dynamic models account for temporal dependencies by including the lagged dependent variable, often providing more accurate results than static panel models.

A Nickell bias arises from including the lagged dependent variable as an explanatory variable, making standard panel data estimators (FE, RE, FD) inconsistent, especially in analyses with short T and large N. System-GMM estimation addresses this by instrumenting the endogenous variables with their lagged values.

More generally, consider a panel data linear model with homogeneous coefficients, a random intercept, and a dynamic process of order \(K\) for the outcome: \[ y_{it} = \alpha_i + \sum_{k=1}^K \lambda_k y_{it-k} + \bbeta'\bx_{it} + u_{it}. \tag{20.6}\]

Notice that the RHS now includes the lagged dependent variables \(y_{i,t-k},\) \(k=1,\ldots,K.\) \(\lambda_k\) are the autoregressive coefficients, \(\bx_{it}\) is a vector of regressors. \(\alpha_i\) is an individual-effect, and \(u_{it}\) is an idiosyncratic error.

It is conventional to assume that \(\alpha_i\) and the error \(u_{it}\) are uncorrelated,

\[ \cov(\alpha_i, u_{it}) = 0, \]

and \(u_{it}\) are serially uncorrelated:

\[ \E[u_{it}u_{is}] = 0 \quad \text{for } t \ne s. \]

Adding dynamics to a model makes major changes in the interpretation of the equation. The entire history of the LHS variable enters into the equation. Any measured influence is conditioned on this history. Any impact of \(\bx_{it}\) represents the effects of new information.

In Equation 20.6, a sequential exogeneity assumption takes the form \[ \E[u_{it}\mid \curl{y_{is-1}, \bx_{is}}_{s\leq t}] =0. \]

For the sake of simplicity, we will discuss a simple version of model (20.6) without extra covariates \(\bx_{it}\) and with only one lag of \(y_{it}\): \[ \color{#008B45} {y_{it} = \alpha_i + \lambda y_{it-1} + u_{it}, } \tag{20.7}\] where \(i=1,\ldots,N,\) and \(t=2,\ldots,T.\) And \(u_{it}\) satisfies sequential exogeneity in the form \(\E[u_{it}|y_{is}, s<t] = 0.\)

A popular class of estimators that are consistent as \(N\to\infty\) with \(T\) fixed first transform the model to eliminate the individual effects \(\alpha_i\), and then apply instrumental variables.

Let us difference Equation 20.7 across \(t\) to eliminate the random intercept \(\alpha_i\). The differenced equation takes the form

\[ \color{#008B45} {\Delta y_{it} = \lambda \Delta y_{it-1} + \Delta u_{it}} , \tag{20.8}\]

for \(t=3,\ldots,T,\) where \(\Delta y_{it} = y_{it} - y_{it-1}.\) Equation 20.8 seems like a simple regression equation. It is tempting to just apply OLS and regress \(\Delta y_{it}\) on \(\Delta y_{it-1}\). But it turns out that Equation 20.8 has an endogeneity problem. To prove this, we can show that \(\E[\Delta y_{it-1} \Delta u_{it}] \ne 0.\)

Plug in \(\Delta y_{it-1} = y_{it-1}-y_{it-2}\) and \(\Delta u_{it} = u_{it}-u_{it-1}.\)

\[ \begin{aligned} \E[\Delta y_{it-1} \Delta u_{it}] &= \E[(y_{it-1}-y_{it-2})(u_{it}-u_{it-1})] \\ &= \underbrace{\E[y_{it-1}u_{it}]}_{0} - \underbrace{\E[y_{it-2}u_{it}]}_0 - \E[y_{it-1}u_{it-1}] + \underbrace{\E[y_{it-2}u_{it-1}]}_0 \\ &= -\E[y_{it-1}u_{it-1}] \\ &= -\E[(\alpha_i + \lambda y_{it-2} + u_{it-1}) u_{it-1}] \\ &= -\E[u_{it-1}^2] \\ &= -\sigma_u^2 \ne 0 \end{aligned} \]

Hence we conclude that

\[ \E[\Delta y_{it-1} \Delta u_{it}] \ne 0, \] and that \(\Delta y_{it-1}\) is an endogenous regressor in the differenced Equation 20.8. Hence, OLS applied to the first-differenced equation is biased and inconsistent.

Note that the Nickell bias of the within (FE) estimator is of order \(1/T\): it is large when \(T\) is relatively small, which is common for micro data, and relatively small when \(T\) is large.
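The endogeneity of \(\Delta y_{it-1}\) in Equation 20.8 can be seen in a simulation. The sketch below assumes \(\lambda = 0.5\), standard normal \(\alpha_i\) and \(u_{it}\), and initial values drawn from the stationary distribution; under these extra assumptions one can show that OLS on the differenced equation converges to \((\lambda - 1)/2\) rather than \(\lambda\). The function names are hypothetical.

```python
import numpy as np

def simulate_dynamic_panel(n_units, n_periods, lam, rng):
    """Simulate y_it = alpha_i + lam * y_{i,t-1} + u_it from a stationary start."""
    alpha = rng.standard_normal(n_units)
    y = np.empty((n_units, n_periods))
    # Stationary initial condition: mean alpha_i / (1 - lam), variance 1 / (1 - lam^2).
    y[:, 0] = alpha / (1 - lam) + rng.standard_normal(n_units) / np.sqrt(1 - lam**2)
    for t in range(1, n_periods):
        y[:, t] = alpha + lam * y[:, t - 1] + rng.standard_normal(n_units)
    return y

def first_difference_ols(y):
    """Pooled OLS (no constant) of Δy_it on Δy_{i,t-1} for t = 3, ..., T."""
    dy = np.diff(y, axis=1)        # columns hold Δy_i2, ..., Δy_iT
    lhs = dy[:, 1:].ravel()        # Δy_it
    rhs = dy[:, :-1].ravel()       # Δy_{i,t-1}
    return float(rhs @ lhs / (rhs @ rhs))

rng = np.random.default_rng(2)
lam = 0.5
y = simulate_dynamic_panel(n_units=20_000, n_periods=6, lam=lam, rng=rng)
lam_fd_ols = first_difference_ols(y)
print(lam_fd_ols, (lam - 1) / 2)
```

Even with a large cross-section, the estimate should stay far from the true value of 0.5, which is what inconsistency as \(N\to\infty\) means in practice.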

The dynamic panel data (DPD) models are designed to account for this endogeneity.

20.2.1 IV Estimator of Dynamic Panel Models

Given the endogeneity issue in first-differencing, instrumental variable methods offer a potential solution. We only need to find suitable instruments – variables \(z_{it}\) which satisfy relevance and exogeneity conditions: \[ \begin{aligned} \E[z_{it}\Delta y_{it-1}] \neq 0,\\ \E[z_{it}\Delta u_{it}] = 0. \end{aligned} \]

In most contexts, one has to look for external variables that can serve as instruments.

However, model (20.8) is a very special case where one can use instruments internal to the model!

20.2.1.1 The Anderson-Hsiao Estimator

In particular, consider using twice lagged level of dependent variable \(y_{it-2}\) as an instrument for \(\Delta y_{it-1}.\)

Relevance seems to be relatively straightforward, as the target endogenous variable \(\Delta y_{it-1} = y_{it-1}-y_{it-2}\) actually has the instrument inside.

\[ \E[y_{it-2}\Delta y_{it-1}] = \E[y_{it-2}(y_{it-1}-y_{it-2})] \ne 0. \]

\(y_{it-2}\) is correlated with the endogenous variable \(\Delta y_{it-1}.\)
Exogeneity can be justified by appealing to sequential exogeneity:

\[ \E[y_{it-2}\Delta u_{it}] = \E[y_{it-2}u_{it} - y_{it-2}u_{it-1}]=0. \tag{20.9}\]

\(y_{it-2}\) is not correlated with the error term \(\Delta u_{it}.\) The orthogonality condition (20.9) ensures the validity of the instrument.

Hence \(y_{it-2}\) is a valid instrument for \(\Delta y_{it-1}.\)

The resulting IV estimator (\(\hat{\bbeta}_{\mathrm{iv}} = (\bZ'\bX)^{-1}(\bZ'\by)\)) is known as the Anderson and Hsiao (1982) estimator and is given by \[ \begin{split} \hat{\lambda}_{AH}^{\mathrm{iv}} &= (\bZ'\Delta\by_{-1})^{-1} (\bZ'\Delta\by) \\ &= \dfrac{ \sum_{i=1}^N \sum_{t=3}^T y_{it-2}\Delta y_{it} }{\sum_{i=1}^N \sum_{t=3}^T y_{it-2}\Delta y_{it-1} }, \end{split} \] where

\(\Delta \by\) is the stacked \(N(T-2)\times 1\) vector of observations on \(\Delta y_{it},\)
\(\Delta \by_{-1}\) is the stacked \(N(T-2)\times 1\) vector of observations on \(\Delta y_{i,t-1},\) and
\(\bZ\) is the stacked \(N(T-2)\times 1\) vector of observations on \(y_{i,t-2}.\)

The 2SLS estimator (\(\widehat{\bbeta}_{2\text{sls}} = \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\bX \right]^{-1} \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\by \right]\)) is given by

\[ \begin{split} \hat{\lambda}_{AH}^{\text{2sls}} &= \left[\Delta\by_{-1}'\bZ (\bZ'\bZ)^{-1} \bZ'\Delta\by_{-1}\right]^{-1} \left[\Delta\by_{-1}'\bZ (\bZ'\bZ)^{-1} \bZ'\Delta\by \right] . \end{split} \]
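The ratio form of the Anderson-Hsiao estimator translates directly into code. The following is a minimal sketch on simulated data, assuming \(\lambda = 0.5\), standard normal \(\alpha_i\) and \(u_{it}\), and a stationary initial condition (all illustrative assumptions); `simulate_dynamic_panel` and `anderson_hsiao` are hypothetical names. With a single instrument the IV and 2SLS formulas coincide, so only the ratio is computed.

```python
import numpy as np

def simulate_dynamic_panel(n_units, n_periods, lam, rng):
    """Simulate y_it = alpha_i + lam * y_{i,t-1} + u_it from a stationary start."""
    alpha = rng.standard_normal(n_units)
    y = np.empty((n_units, n_periods))
    y[:, 0] = alpha / (1 - lam) + rng.standard_normal(n_units) / np.sqrt(1 - lam**2)
    for t in range(1, n_periods):
        y[:, t] = alpha + lam * y[:, t - 1] + rng.standard_normal(n_units)
    return y

def anderson_hsiao(y):
    """IV estimate of lambda using y_{i,t-2} as the instrument for Δy_{i,t-1}."""
    dy = np.diff(y, axis=1)        # columns hold Δy_i2, ..., Δy_iT
    d_y = dy[:, 1:].ravel()        # Δy_it,      t = 3, ..., T
    d_y_lag = dy[:, :-1].ravel()   # Δy_{i,t-1}, t = 3, ..., T
    z = y[:, :-2].ravel()          # y_{i,t-2},  t = 3, ..., T
    return float(z @ d_y / (z @ d_y_lag))

rng = np.random.default_rng(3)
y = simulate_dynamic_panel(n_units=20_000, n_periods=6, lam=0.5, rng=rng)
lam_ah = anderson_hsiao(y)
print(lam_ah)
```

Each of the three stacked vectors has \(N(T-2)\) elements, matching the dimensions listed above.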

One might use the twice lagged differences \(\Delta y_{it-2} = y_{it-2}-y_{it-3}\) as an instrument for \(\Delta y_{it-1}.\) But this has two drawbacks:

- One further time series observation is lost if \(\Delta y_{i,t-2}\) rather than \(y_{it-2}\) is used as the instrument.
- The resulting estimator has a larger asymptotic variance.

The Anderson-Hsiao (AH) estimator delivers consistent but not efficient estimates of the parameters in the model. This is because the IV estimator does not exploit all the available moment conditions.

With \(T=3,\) we have one instrument \(y_{i1}\) for \(\Delta y_{i2}\) in the equation

\[ \Delta y_{i3} = \lambda \Delta y_{i2} + \Delta u_{i3} , \quad \text{for } i=1,\ldots, N. \] With \(T>3,\) further valid instruments become available for the first differenced equations in the later time periods. Efficiency can be improved by exploiting these additional instruments.

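The IV estimator also ignores the structure of the error component in the transformed model.

- The first-differenced errors \(\Delta u_{it}\) are autocorrelated by construction, which the IV estimator does not take into account. If \(u_{it}\) itself is serially correlated, \(y_{it-2}\) is no longer a valid instrument and the IV estimates are inconsistent.
- The IV estimates would also be inconsistent when other regressors are correlated with the error term.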

20.2.1.2 The Arellano-Bond Estimator

The orthogonality condition (20.9) is one of many implied by the dynamic panel model. Indeed, all lags \(y_{it-2},\) \(y_{it-3}, \ldots\) are valid instruments. If \(T > p + 2\) (where \(p\) is the order of autocorrelation), these can be used to potentially improve estimation efficiency. This was first pointed out by Holtz-Eakin, Newey, and Rosen (1988) and further developed by Arellano and Bond (1991).

Arellano and Bond (1991) expanded the idea by using additional lags of the dependent variable as instruments.

Starting point: the first-differenced equation

\[ \Delta y_{it} = \lambda \Delta y_{it-1} + \Delta u_{it}. \]

Valid instruments:

\(t = 2 \text{ or } t = 1\): no instruments,
\(t = 3\): the valid instrument for \(\Delta y_{i2} = (y_{i2} - y_{i1})\) is \(y_{i1}\), the moment condition is

\[ \E[y_{i1}\Delta u_{i3}] = 0 \text{ for }i=1,2,\ldots,N \]

\(t = 4\): the valid instruments for \(\Delta y_{i3} = (y_{i3} - y_{i2})\) are \(y_{i2}\) as well as \(y_{i1}\), the moment conditions are

\[ \begin{aligned} \E[y_{i1}\Delta u_{i4}] &= 0 \\ \E[y_{i2}\Delta u_{i4}] &= 0 \end{aligned} \]

\(t = 5\): the valid instruments for \(\Delta y_{i4} = (y_{i4} - y_{i3})\) are \(y_{i3}\) as well as \(y_{i2}\) and \(y_{i1}\), the moment conditions are

\[ \begin{aligned} \E[y_{i1}\Delta u_{i5}] &= 0 \\ \E[y_{i2}\Delta u_{i5}] &= 0 \\ \E[y_{i3}\Delta u_{i5}] &= 0 \\ \end{aligned} \]

\(t = 6\): the valid instruments for \(\Delta y_{i5} = (y_{i5} - y_{i4})\) are \(y_{i4}\) as well as \(y_{i3}\), \(y_{i2}\), and \(y_{i1}\), the moment conditions are

\[ \begin{aligned} \E[y_{i1}\Delta u_{i6}] &= 0 \\ \E[y_{i2}\Delta u_{i6}] &= 0 \\ \E[y_{i3}\Delta u_{i6}] &= 0 \\ \E[y_{i4}\Delta u_{i6}] &= 0 \\ \end{aligned} \]

\(t = T\): the valid instruments for \(\Delta y_{iT-1} = (y_{iT-1} - y_{iT-2})\) are \(y_{iT-2}\) as well as \(y_{iT-3}, \dots, y_{i1}\), the moment conditions are

\[ \begin{aligned} \E[y_{i1}\Delta u_{iT}] &= 0 \\ \E[y_{i2}\Delta u_{iT}] &= 0 \\ \E[y_{i3}\Delta u_{iT}] &= 0 \\ \vdots \\ \E[y_{i,T-2}\Delta u_{iT}] &= 0 \\ \end{aligned} \]

As \(t\) increases, the number of instruments available also increases. The total number of instruments is quadratic in \(T.\) One disadvantage of this strategy should be apparent. If \(T <10\), that may be a manageable number, but for a longer time series, it may be necessary to restrict the number of past lags used.

We’ll have an instrument matrix with one row for each time period that we are instrumenting. \(Z_i\) is the corresponding matrix of instruments for the lagged difference.

\[ Z_i = \begin{bmatrix} y_{i,1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & y_{i,1} & y_{i,2} & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & y_{i,1} & y_{i,2} & y_{i,3} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & y_{i,1} & \cdots & y_{i,T-2} \end{bmatrix} \]

The number of rows of \(Z_i\) is \(T-2.\) The number of columns of \(Z_i\) depends on \(T\) and can grow very quickly:

\[ 1+ 2 + \cdots + (T-2) = \sum_{k=1}^{T-2} k = \frac{(T-1)(T-2)}{2} \] Rewrite the model in the vector form

\[ \Delta \by_{i\cdot} = \lambda \Delta y_{i\cdot-1} + \Delta u_{i\cdot}, \] where

\[ \begin{split} \Delta y_{i\cdot} = \begin{bmatrix} \Delta y_{i3} \\ \Delta y_{i4} \\ \vdots \\ \Delta y_{iT} \end{bmatrix}, \quad \Delta y_{i\cdot-1} = \begin{bmatrix} \Delta y_{i2} \\ \Delta y_{i3} \\ \vdots \\ \Delta y_{i,T-1} \end{bmatrix}, \quad \Delta u_{i\cdot} & = \begin{bmatrix} \Delta u_{i3} \\ \Delta u_{i4} \\ \vdots \\ \Delta u_{iT} \end{bmatrix}. \\ & \phantom{=} (T-2) \times 1 \end{split} \]
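The block structure of \(Z_i\) is easy to get wrong by hand, so a small construction routine can help. The sketch below builds \(Z_i\) for one unit from its observations \(y_{i1},\ldots,y_{iT}\); the function name `build_instrument_matrix` and the example values are hypothetical.

```python
import numpy as np

def build_instrument_matrix(y_i):
    """Arellano-Bond instrument matrix for one unit with observations y_i1, ..., y_iT."""
    n_periods = len(y_i)
    n_rows = n_periods - 2                            # equations for t = 3, ..., T
    n_cols = (n_periods - 1) * (n_periods - 2) // 2   # 1 + 2 + ... + (T - 2)
    z = np.zeros((n_rows, n_cols))
    col = 0
    for row in range(n_rows):        # row 0 is the equation for t = 3
        n_lags = row + 1             # instruments y_i1, ..., y_{i,t-2}
        z[row, col:col + n_lags] = y_i[:n_lags]
        col += n_lags
    return z

# Example with T = 5 and made-up values y_i1, ..., y_i5 = 1, ..., 5.
z_example = build_instrument_matrix(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
print(z_example)
```

For \(T=5\) this gives a \(3\times 6\) matrix whose rows contain \((y_{i1})\), \((y_{i1}, y_{i2})\), and \((y_{i1}, y_{i2}, y_{i3})\) in separate column blocks, in line with the column count \((T-1)(T-2)/2\).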

Arellano and Bond (1991) suggested using a GMM approach based on all available moment conditions:

\[ \E[Z_i'\Delta u_{i\cdot}] = \boldsymbol{0} \]

Sample analogue

\[ J_N(\lambda) = \frac{1}{N} \sum_{i=1}^N Z_i'\Delta u_{i\cdot}(\lambda), \quad \Delta u_{i\cdot}(\lambda) = \Delta y_{i\cdot} - \lambda \Delta y_{i\cdot-1} \]

Alternative notation: Sometimes you see the moment conditions as \(\E[Z_i' u_{i\cdot}^*] = \boldsymbol{0}\), where \(u_{i\cdot}^*\) refers to the first difference transformed errors.

There are altogether \(\frac{(T-1)(T-2)}{2}\) moment conditions.

The one-step Arellano-Bond estimator is a GMM estimator:

\[ \hat{\lambda}^{GMM} = Q_1^{-1} \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_1 \left(\sum_{i=1}^N Z_i' \Delta y_i \right) \] where

\[ Q_1 = \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_1 \left(\sum_{i=1}^N Z_i' \Delta y_{i\cdot -1} \right) \] and

\[ \begin{aligned} A_1 &= \left( \sum_{i=1}^N Z_i^\prime H_i Z_i \right)^{-1} , \\ H_i &= \E[\Delta u_i \Delta u_i^{\prime}] = \sigma_u^2 \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \\ \end{bmatrix} . \end{aligned} \]

\(A_1\) is the optimal weighting matrix when the errors \(u_{it}\) are homoskedastic and serially uncorrelated: \(H_i\) is the covariance matrix of the differenced idiosyncratic errors. Note that \(H_i\) imposes homoskedasticity while taking into account the MA(1) structure that differencing induces in the error term.
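The one-step formula can be sketched in a few lines for the AR(1) panel model without covariates. The example below assumes simulated data with \(\lambda = 0.5\), standard normal \(\alpha_i\) and \(u_{it}\), and a stationary start; all function names are hypothetical. The factor \(\sigma_u^2\) in \(H_i\) is dropped because rescaling \(A_1\) leaves the estimate unchanged.

```python
import numpy as np

def simulate_dynamic_panel(n_units, n_periods, lam, rng):
    """Simulate y_it = alpha_i + lam * y_{i,t-1} + u_it from a stationary start."""
    alpha = rng.standard_normal(n_units)
    y = np.empty((n_units, n_periods))
    y[:, 0] = alpha / (1 - lam) + rng.standard_normal(n_units) / np.sqrt(1 - lam**2)
    for t in range(1, n_periods):
        y[:, t] = alpha + lam * y[:, t - 1] + rng.standard_normal(n_units)
    return y

def build_instrument_matrix(y_i):
    """Row for period t holds y_i1, ..., y_{i,t-2} in its own block of columns."""
    n_rows = len(y_i) - 2
    z = np.zeros((n_rows, n_rows * (n_rows + 1) // 2))
    col = 0
    for row in range(n_rows):
        z[row, col:col + row + 1] = y_i[:row + 1]
        col += row + 1
    return z

def one_step_arellano_bond(y):
    """One-step difference GMM estimate of lambda."""
    m = y.shape[1] - 2
    # H_i up to the factor sigma_u^2: 2 on the diagonal, -1 next to it.
    h = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    n_moments = m * (m + 1) // 2
    zhz = np.zeros((n_moments, n_moments))
    zx = np.zeros(n_moments)
    zy = np.zeros(n_moments)
    for y_i in y:
        z_i = build_instrument_matrix(y_i)
        dy_i = np.diff(y_i)           # Δy_i2, ..., Δy_iT
        zhz += z_i.T @ h @ z_i
        zx += z_i.T @ dy_i[:-1]       # regressor Δy_{i,t-1}, t = 3, ..., T
        zy += z_i.T @ dy_i[1:]        # outcome Δy_it,        t = 3, ..., T
    a1 = np.linalg.inv(zhz)
    return float(zx @ a1 @ zy / (zx @ a1 @ zx))

rng = np.random.default_rng(4)
y = simulate_dynamic_panel(n_units=10_000, n_periods=5, lam=0.5, rng=rng)
lam_one_step = one_step_arellano_bond(y)
print(lam_one_step)
```

The explicit loop over units mirrors the sums over \(i\) in the formula; it is written for readability rather than speed.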

The two-step estimator builds on the one-step estimator:

\[ \hat{\lambda}^{GMM} = Q_2^{-1} \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_2 \left(\sum_{i=1}^N Z_i' \Delta y_i \right) \] where

\[ Q_2 = \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_2 \left(\sum_{i=1}^N Z_i' \Delta y_{i\cdot -1} \right) \] and

\[ \begin{aligned} A_2 &= \left( \sum_{i=1}^N Z_i^\prime G_i Z_i \right)^{-1} , \\ G_i &= \widehat{\Delta u_i}\; \widehat{\Delta u_i}^{\prime} . \end{aligned} \] That is, the only difference between the one-step and two-step estimators is the weighting matrix, \(A_1\) versus \(A_2\). \(A_2\) is built from the residuals \(\widehat{\Delta u_i}\) of the one-step estimation.

Simulation studies have suggested very modest efficiency gains from using the two-step version, even in the presence of considerable heteroskedasticity. More importantly, the dependence of the two-step weight matrix on estimated parameters makes the usual asymptotic distribution approximations less reliable for the two-step estimator. Simulation studies have shown that the asymptotic standard errors tend to be much too small for the two-step estimator. These are the reasons that much applied work has focused on the results of the one-step estimator.
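The relation between the two steps can be sketched with a single routine that accepts per-unit weight matrices: \(H_i\) in the first step and \(G_i\) in the second. This is an illustration on simulated data (assumed \(\lambda = 0.5\), standard normal \(\alpha_i\) and \(u_{it}\), stationary start), vectorized with `numpy.einsum`; the function names are hypothetical.

```python
import numpy as np

def simulate_dynamic_panel(n_units, n_periods, lam, rng):
    """Simulate y_it = alpha_i + lam * y_{i,t-1} + u_it from a stationary start."""
    alpha = rng.standard_normal(n_units)
    y = np.empty((n_units, n_periods))
    y[:, 0] = alpha / (1 - lam) + rng.standard_normal(n_units) / np.sqrt(1 - lam**2)
    for t in range(1, n_periods):
        y[:, t] = alpha + lam * y[:, t - 1] + rng.standard_normal(n_units)
    return y

def difference_gmm_inputs(y):
    """Return stacked Z_i (N x (T-2) x q), Δy_it and Δy_{i,t-1} for t = 3, ..., T."""
    n_units, n_periods = y.shape
    m = n_periods - 2
    zs = np.zeros((n_units, m, m * (m + 1) // 2))
    col = 0
    for row in range(m):
        zs[:, row, col:col + row + 1] = y[:, :row + 1]
        col += row + 1
    dy = np.diff(y, axis=1)
    return zs, dy[:, 1:], dy[:, :-1]

def gmm_estimate(zs, d_y, d_y_lag, weights):
    """GMM estimate of lambda with A = (sum_i Z_i' W_i Z_i)^{-1}."""
    a = np.linalg.inv(np.einsum("imq,imn,inr->qr", zs, weights, zs))
    zx = np.einsum("imq,im->q", zs, d_y_lag)
    zy = np.einsum("imq,im->q", zs, d_y)
    return float(zx @ a @ zy / (zx @ a @ zx))

rng = np.random.default_rng(5)
y = simulate_dynamic_panel(n_units=10_000, n_periods=5, lam=0.5, rng=rng)
zs, d_y, d_y_lag = difference_gmm_inputs(y)
n_units, m = d_y.shape

# Step 1: W_i = H_i (up to sigma_u^2), identical for every unit.
h = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
lam_one_step = gmm_estimate(zs, d_y, d_y_lag, np.broadcast_to(h, (n_units, m, m)))

# Step 2: W_i = G_i, the outer product of the one-step residuals.
resid = d_y - lam_one_step * d_y_lag
g = resid[:, :, None] * resid[:, None, :]
lam_two_step = gmm_estimate(zs, d_y, d_y_lag, g)
print(lam_one_step, lam_two_step)
```

Because the simulated errors are homoskedastic, the two estimates should be close to each other here; the sketch does not compute standard errors.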

The Arellano-Bond (AB) estimator is suitable for situations with:

- “small \(T\), large \(N\)” panels: few time periods and many individual units
- a linear functional relationship
- one left-hand variable that is dynamic, depending on its own past realizations
- right-hand variables that are not strictly exogenous: correlated with past and possibly current realizations of the error
- fixed individual effects, implying unobserved heterogeneity
- heteroskedasticity and autocorrelation within individual units’ errors, but not across them

The AB estimator is usually called difference GMM.

A potential weakness in the Arellano-Bond DPD estimator was revealed in later work by Arellano and Bover (1995) and Blundell and Bond (1998).

The lagged levels are often rather poor instruments for first differenced variables, especially if the variables are close to a random walk, i.e., \(\lambda\) is close to unity.
For long panels (large \(T\)), the number of instruments, \(\frac{(T-1)(T-2)}{2},\) increases dramatically. This leads to a potential loss of efficiency.

By using the lag limits options, you may specify, for instance, that only lags 2–5 are to be used in constructing the GMM instruments.
Consistency of the GMM estimator rests on the assumption that the transformed error term is not serially correlated at lags greater than or equal to two, i.e., \(\E[\Delta u_{it} \Delta u_{it-k}] = 0\) for \(k\ge 2.\) It is crucial to test whether the second-order autocorrelation is zero for all periods in the sample.

If a significant AR(2) statistic is encountered, the second lags of endogenous variables will not be appropriate instruments for their current values.
When \(T > 3\) and the model is over-identified, a Sargan–Hansen test can be used to test the overidentifying restrictions.
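In the notation of this section, the test statistic for the overidentifying restrictions can be sketched as follows. This assumes the simple model (20.7) with the single parameter \(\lambda\), two-step residuals \(\widehat{\Delta u}_{i\cdot} = \Delta y_{i\cdot} - \hat{\lambda}\Delta y_{i\cdot-1}\), and the two-step weighting matrix \(A_2\); the symbols \(S\) and \(q\) are introduced here only for illustration.

\[ S = \left(\sum_{i=1}^N Z_i'\widehat{\Delta u}_{i\cdot}\right)' A_2 \left(\sum_{i=1}^N Z_i'\widehat{\Delta u}_{i\cdot}\right), \qquad q = \frac{(T-1)(T-2)}{2}. \]

Under the null hypothesis that all \(q\) moment conditions are valid, \(S\) is asymptotically \(\chi^2\) distributed with \(q-1\) degrees of freedom. With \(T=3\) there is a single moment condition, the model is exactly identified, and there is nothing to test.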

The modification proposed by Arellano and Bover (1995) and Blundell and Bond (1998) uses lagged differences as instruments for the equations in levels, in addition to lagged levels as instruments for the equations in first differences. The expanded estimator is commonly termed System GMM. The cost of the System GMM estimator is a set of additional restrictions on the initial conditions of the process generating \(y.\)

References

Chapter 12, “Serial Correlation and Heteroskedasticity in Time Series Regressions.” Introductory Econometrics: A Modern Approach, 7e by Jeffrey M. Wooldridge
Vladislav Morozov, “Interlude: Standard Dynamic Panel Estimators.” Econometrics with Unobserved Heterogeneity, Course material, https://vladislav-morozov.github.io/econometrics-heterogeneity/linear/linear-dynamic-panel-iv.html#extra-endogeneity-and-the-within-estimator
Dynamic panel in Stata: https://janditzen.github.io/xtdcce2/

Anderson, T. W., and Cheng Hsiao. 1982. “Formulation and estimation of dynamic models using panel data.” Journal of Econometrics 18 (1): 47–82. https://doi.org/10.1016/0304-4076(82)90095-1.

Arellano, Manuel, and Stephen Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” Review of Economic Studies 58: 277–97.

Arellano, Manuel, and Olympia Bover. 1995. “Another Look at the Instrumental Variable Estimation of Error-Components Models.” Journal of Econometrics 68 (1): 29–51.

Blundell, Richard, and Stephen Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87: 115–43.

Holtz-Eakin, Douglas, Whitney K. Newey, and Harvey S. Rosen. 1988. “Estimating Vector Autoregressions with Panel Data.” Econometrica 56 (6): 1371–95.