24 Dynamic Models
24.1 Exogeneity and Dynamic Models
24.1.1 Introduction
Consider the model
\[ y_{it} = \bx_{it}' \bbeta + u_{it} . \] Throughout the previous section, we assumed that the idiosyncratic term \(u_{it}\) satisfied strict exogeneity with respect to the data: \[ \E[u_{it}\mid \bX_i] = 0. \]
It implies that for all pairs of indices \(s\) and \(t,\) it holds that \[ \E[u_{it}\bx_{is}] =0. \tag{24.1}\]
However, strict exogeneity is not possible in the presence of lagged dependent variables.
Therefore, sequential exogeneity requires only (24.1) holds for \(t\ge s.\) That is,
\[ \E[u_{it}\bx_{i, t-1}] =0, \]
or more generally,
\[ \E[u_{it}\bx_{is}] = 0 \text{ for } t\ge s. \tag{24.2}\]
A stronger version of sequential exogeneity can be expressed as
\[ \E[u_{it} \mid \bx_{i, t}, \bx_{i, t-1}, \ldots, \bx_{i,1}] = 0. \]
In plain language, the current and past values of \(\bx\) are uncorrelated with the current error term (past and current values are safe), although its future values may not be.
In this case, the regressor \(\bx\) is called “predetermined” as its value is determined prior to the current period.
\[ \E[u_{i,t-1}\bx_{it}] \neq 0 , \]
or more generally,
\[ \E[u_{it}\bx_{is}] \neq 0 \text{ for } t<s. \]
This means past shocks can predict future values of independent variables \(\bx\).
Examples of predetermined variables:
- Lagged dependent variables \(y_{t-1}\) as covariate: Variables that represent past values of the dependent variable. Since \(y_{t-1} = \bx_{i,t-1}' \bbeta + u_{i,t-1}\), then \(\E[u_{i,t-1} \bx_{i,t}] \neq 0\)
By contrast, exogenous variables are determined outside the model and are not influenced by the endogenous variables within the current period.
For \(t<s\), Equation 24.1 means that past shocks are uncorrelated with future values of \(\bx\). In other words, one cannot predict future \(\bx\)’s from past shocks.
This requirement might fail if \(\bx\) is dynamic, e.g., covariates including lagged dependent variables, and its evolution is affected by \(u_{it}\). That is, \[ \E[u_{it}\bx_{is}] \ne 0 \text{ for } t<s. \] This means, \(\bx_{is}\) can respond dynamically to past values of \(y_{it}.\)
In this section, we will discuss this challenge and some traditional approaches to dealing with it with short panel data.
24.1.2 Real World Example
Example 1
Consider a model where you want to investigate the effect of foreign aid on a country’s economic growth.
\[ \text{Growth}_{it} = \beta_1 \text{Aid}_{it} + \alpha_i + u_{it} \]
\(\alpha_i\) captures unobserved country-specific factors that affect growth, such as geography, culture, or institutions.
In our example, usually the poor countries which experience economic distress receive more aid. This means that \(\text{Aid}_{it}\) is likely to be correlated with \(\alpha_i\).
When omitting \(\alpha_i\) (which pooled OLS does),
\[ \begin{aligned} \varepsilon_{it} &= \alpha_i + u_{it} \quad \text{and} \\ \E[\text{Aid}_{it}\, \varepsilon_{it}] &\neq 0. \end{aligned} \]
Hence endogeneity arises (due to omitted variables in this case) and the pooled OLS estimator is biased and inconsistent.
Another challenge is \(\text{Aid}_{it}\) is likely predetermined, i.e., correlated with past shocks to growth.
For example, if a country experiences a negative shock to growth in period \(t-1\) due to natural disaters, it is likely to receive more aid in period \(t.\) This indicates a feedback loop.
Mathematically,
\[ \E[u_{i,t-1} \text{Aid}_{it}] \neq 0. \]
Causes: current foreign aid is tangled with past shocks, say an earthquake, to growth.
Consequences: because the earthquake causes both the economy to crash and the foreign aid to increase, we cannot isolate the effect of aid on growth.
Remedy: Arellano-Bond GMM estimator uses lagged levels (\(t-2\), \(t-3\), etc) of aid as instruments.
It is a standard practice to include lagged dependent variable as a regressor to as economic growth is persistent (dependent on its own past).
Example 2
We use data on U.S. industrial production growth (ip_growth), inflation rate (inflation), and the interest rate (fedfunds) to estimate the effects of an interest rate increase on economic activity and prices.
At horizon \(h\), we model the response of industrial production growth to change in interst rate as
\[ \text{ip\_growth}_{t+h} = \beta_h \Delta \text{fedfunds}_t + \mathbf{w}'\bgamma + e_{t+h} \]
where \(\mathbf{w}\) are controls, such as lags of ip_growth and changes in fedfunds.
We have reason to be concerned about the endogeneity in fedfunds. Changes in interest rates affect other economic variables, but the interest rate itself responds to those same economic variables. We need to account for this potential endogeneity to reliably estimate the effects of the shock to interest rates.
Fix: Find an instrument that relates to fedfunds but uncorrelated with any nonmonetary shocks. We can use this variable as an instrument for change in fedfunds.
24.1.3 No Strict Exogeneity In the Presence of Lagged Dependent Variables
Consider the AR(1) model, \[ y_t = \beta_0 + \beta_1 y_{t-1} + u_t, \tag{24.3}\] where the error \(u_t\) has a zero expected value, given all past values of \(y\): \[ \mathbb{E}(u_t \mid y_{t-1}, y_{t-2}, \ldots) = 0. \tag{24.4}\]
Conventions for \(t\):
The first observation is \(y_1\), and \(t=2, \ldots, T\), so that the first available equation is
\[ y_2 = \beta_0 + \beta_1 y_{1} + u_2 . \]
and there are (\(T-1\)) equations in levels. We will mostly use this convention.
The first observation is \(y_0\), and \(t=1, \ldots, T\), so that the first available equation is
\[ y_1 = \beta_0 + \beta_1 y_{0} + u_1 . \]
and there are (\(T\)) equations in levels.
Combined, these two equations imply that \[ \mathbb{E}(y_t \mid y_{t-1}, y_{t-2}, \ldots) = \mathbb{E}(y_t \mid y_{t-1}) = \beta_0 + \beta_1 y_{t-1}. \]
This result is very important. First, it means that, once \(y\) lagged one period has been controlled for, no further lags of \(y\) affect the expected value of \(y_t\). (This is where the name “first order” originates.) Second, the relationship is assumed to be linear.
Because \(x_t\) contains only \(y_{t-1}\), Equation 24.4 implies that the contemporaneous exogeneity Assumption holds. By contrast, the strict exogeneity assumption needed for unbiasedness, which does not hold.
In fact, because \(u_t\) is uncorrelated with \(y_{t-1}\) under Equation 24.4, \(u_t\) and \(y_t\) must be correlated.
\[ \begin{aligned} \text{Cov}(y_t, u_t) &= \text{Cov}(\beta_0 + \beta_1 y_{t-1} + u_t, u_t) \\ &= \text{Var}(u_t) > 0. \end{aligned} \]
Therefore, a model with a lagged dependent variable cannot satisfy the strict exogeneity assumption.
For the weak dependence condition to hold, we must assume that \(|\beta_1| < 1\). If this condition holds, then the OLS estimator from the regression of \(y_t\) on \(y_{t-1}\) produces consistent estimators of \(\beta_0\) and \(\beta_1\).
Unfortunately, \(\hat{\beta}_1^{OLS}\) is biased, and this bias can be large if the sample size is small or if \(\beta_1\) is near 1. (For \(\beta_1\) near 1, \(\hat{\beta}_1^{OLS}\) can have a severe downward bias.) In moderate to large samples, \(\hat{\beta}_1^{OLS}\) should be a good estimator of \(\beta_1\).
Equation 24.4 is called the sequential exogeneity assumption. Or more generally, using \(\bx\) to represent the vector of covariates, the sequential exogeneity is:
\[ \E(u_t \mid \bx_t, \bx_{t-1}, \ldots,) = \E(u_t) = 0, t=1,2,\ldots. \tag{24.5}\]
We can see that sequential exogeneity implies contemporaneous exogeneity.
24.2 Dynamic Panel Representation
Dynamic models account for temporal dependencies by including the lagged dependent variable, often providing more accurate results than static panel models.
A Nickell bias arises from including the lagged dependent variable as an explanatory variable, making standard panel data estimators (FE, RE, FD) inconsistent, especially in analyses with short T and large N. System-GMM estimation addresses this by instrumenting the endogenous variables with their lagged values.
More generally, consider a panel data linear model with homogeneous coefficients, a random intercept, and a dynamic process in the \(K\)-th order for the outcome: \[ y_{it} = \alpha_i + \sum_{k=1}^K \lambda_k y_{it-k} + \bbeta'\bx_{it} + u_{it}. \tag{24.6}\]
Notice that the RHS now includes the lagged dependent variables \(y_{i,t-k},\) \(k=1,\ldots,K.\) \(\lambda_k\) are the autoregressive coefficients, \(\bx_{it}\) is a vector of regressors. \(\alpha_i\) is an individual-effect, and \(u_{it}\) is an idiosyncratic error.
It is conventional to assume that \(\alpha_i\) and the error \(u_{it}\) are mutually independent
\[ \cov(\alpha_i, u_{it}) = 0, \]
and \(u_{it}\) are serially uncorrelated:
\[ \E[u_{it}u_{is}] = 0 \quad \text{for } t \ne s. \]
Adding dynamics to a model makes major changes in the interpretation of the equation. The entire history of the LHS variable enters into the equation. Any measured influence is conditioned on this history. Any impact of \(\bx_{it}\) represents the effects of new information.
In Equation 24.6, a sequential exogeneity assumption takes form \[ \E[u_{it}\mid \curl{y_{is-1}, \bx_{is}}_{s\leq t}] =0. \]
For the sake of simplicity, we will discuss a simple version of model (24.6) without extra covariates \(\bx_{it}\) and with only one lag of \(y_{it}\): \[ \color{#008B45} {y_{it} = \alpha_i + \lambda y_{it-1} + u_{it}, } \tag{24.7}\] where \(i=1,\ldots,N,\) and \(t=2,\ldots,T.\) And \(u_{it}\) satisfies sequential exogeneity in the form \(\E[u_{it}|y_{is}, s<t] = 0.\)
A popular class of estimators that are consistent as \(N\to\infty\) with \(T\) fixed first transform the model to eliminate the individual effects \(\alpha_i\), and then apply instrumental variables.
Let us difference Equation 24.7 across \(t\) to eliminate the random intercept \(\alpha_i\). The differenced equation takes form
\[ \color{#008B45} {\Delta y_{it} = \lambda \Delta y_{it-1} + \Delta u_{it}} , \tag{24.8}\]
for \(t=3,\ldots,T,\) where \(\Delta y_{it} = y_{it} - y_{it-1}.\) Equation 24.8 seems like a simple regression equation. It is tempting to just apply OLS and regress \(\Delta y_{it}\) on \(\Delta y_{it-1}\). But it turns out that Equation 24.8 has an endogeneity problem. To prove this, we can show that \(\E[\Delta y_{it-1} \Delta u_{it}] \ne 0.\)
Plug in \(\Delta y_{it-1} = y_{it-1}-y_{it-2}\) and \(\Delta u_{it} = u_{it}-u_{it-1}.\)
\[ \begin{aligned} \E[\Delta y_{it-1} \Delta u_{it}] &= \E[(y_{it-1}-y_{it-2})(u_{it}-u_{it-1})] \\ &= \underbrace{\E[y_{it-1}u_{it}]}_{0} - \underbrace{\E[y_{it-2}u_{it}]}_0 - \E[y_{it-1}u_{it-1}] + \underbrace{\E[y_{it-2}u_{it-1}]}_0 \\ &= -\E[y_{it-1}u_{it-1}] \\ &= -\E[(\alpha_i + \lambda y_{it-2} + u_{it-1}) u_{it-1}] \\ &= -\E[u_{it-1}^2] \\ &= -\sigma_u^2 \ne 0 \end{aligned} \]
Hence we conclude that
\[ \E[\Delta y_{it-1} \Delta u_{it}] \ne 0, \] and that \(\Delta y_{it-1}\) is an endogeneous regressor in the differenced Equation 24.8. Hence, we see that the first difference estimator is biased.
Note that the bias is large when \(T\) is relatively small, which is common for micro data. When \(T\) is large the Nickell’s bias is relatively small.
The dynamic panel data (DPD) models are designed to account for this endogeneity.
24.2.1 IV Estimator of Dynamic Panel Models
Given the endogeneity issue in first-differencing, instrumental variable methods offer a potential solution. We only need to find suitable instruments – variables \(z_{it}\) which satisfy relevance and exogeneity conditions: \[ \begin{aligned} \E[z_{it}\Delta y_{it-1}] \neq 0,\\ \E[z_{it}\Delta u_{it}] = 0. \end{aligned} \]
In most contexts, one has to look for external variables that can serve as instruments.
However, model (24.8) is a very special case where one can use instruments internal to the model!
24.2.1.1 The Anderson-Hsiao Estimator
In particular, consider using twice lagged level of dependent variable \(y_{it-2}\) as an instrument for \(\Delta y_{it-1}.\)
Relevance seems to be relatively straightforward, as the target endogenous variable \(\Delta y_{it-1} = y_{it-1}-y_{it-2}\) actually has the instrument inside.
\[ \E[y_{it-2}\Delta y_{it-1}] = \E[y_{it-2}(y_{it-1}-y_{it-2})] \ne 0. \]
\(y_{it-2}\) is correlated with the endogenous variable \(\Delta y_{it-1}.\)
Exogeneity can be justified by appealing to sequential exogeneity:
\[ \E[y_{it-2}\Delta u_{it}] = \E[y_{it-2}u_{it} - y_{it-2}u_{it-1}]=0. \tag{24.9}\]
\(y_{it-2}\) is not correlated with the error term \(\Delta u_{it}.\) The orthogonality condition (24.9) ensures the validity of the instrument.
Hence \(y_{it-2}\) is a valid instrument for \(\Delta y_{it-1}.\)
The resulting IV estimator (\(\hat{\bbeta}_{\mathrm{iv}} = (\bZ'\bX)^{-1}(\bZ'\by)\)) is known as the Anderson and Hsiao (1982) estimator and given by \[ \begin{split} \hat{\lambda}_{AH}^{\mathrm{iv}} &= (\bZ'\Delta\by_{-1})^{-1} (\bZ'\Delta\by) \\ &= \dfrac{ \sum_{i=1}^N \sum_{t=3}^T y_{it-2}\Delta y_{it} }{\sum_{i=1}^N \sum_{t=3}^T y_{it-2}\Delta y_{it-1} }. \end{split} \] where
- \(\Delta \by\) is the stacked \(N(T-2)\times 1\) vector of observations on \(\Delta y_{it},\)
- \(\Delta \by_{-1}\) is the stacked \(N(T-2)\times 1\) vector of observations on \(\Delta y_{i,t-1},\) and
- \(\bZ\) is the stacked \(N(T-2)\times 1\) vector of observations on \(y_{i,t-2}.\)
The 2SLS estimator (\(\widehat{\bbeta}_{2\text{sls}} = \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\bX \right]^{-1} \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\by \right]\)) is given by
\[ \begin{split} \hat{\lambda}_{AH}^{\text{2sls}} &= \left[\Delta\by_{-1}'\bZ (\bZ'\bZ)^{-1} \bZ'\Delta\by_{-1}\right]^{-1} \left[\Delta\by_{-1}'\bZ (\bZ'\bZ)^{-1} \bZ'\Delta\by \right] . \end{split} \]
One might use the twice lagged differences \(\Delta y_{it-2} = y_{it-2}-y_{it-3}\) as an instrument for \(\Delta y_{it-1}.\) But
One further time series observation is lost if \(\Delta y_{i,t-2}\) rather than \(y_{it-2}\) is used as the instrument.
Larger asymptotic variance
The Anderson-Hsiao (AH) estimator delivers consistent but not efficient estimates of the parameters in the model. This is due to the fact that the IV doesn’t exploit all the available moments conditions.
With \(T=3,\) we have one instrument \(y_{i1}\) for \(\Delta y_{i2}\) in the equation
\[ \Delta y_{i3} = \lambda \Delta y_{i2} + \Delta u_{i3} , \quad \text{for } i=1,\ldots, N. \] With \(T>3,\) further valid instruments become available for the first differenced equations in the later time periods. Efficiency can be improved by exploiting these additional instruments.
The IV estimator also ignores the structure of the error component in the transformed model.
- The autocorrelation in the first differenced errors leads to inconsistency of the IV estimates.
- The IV estimates would be inconsistent when other regressors are correlated with the error term.
24.2.1.2 The Arellano-Bond Estimator
The orthogonality condition (24.9) is one of many implied by the dynamic panel model. Indeed, all lags \(y_{it-2},\) \(y_{it-3},\) are valid instruments. If \(T > p + 2\) (\(p\) is the order of autocorrelation) these can be used to potentially improve estimation efficiency. This was first pointed out by Holtz-Eakin, Newey, and Rosen (1988) and further developed by Arellano and Bond (1991).
Arellano and Bond (1991) expanded the idea by using additional lags of the dependent variable as instruments.
Starting point: the FD estimator
\[ \Delta y_{it} = \lambda \Delta y_{it-1} + \Delta u_{it}, \]
Valid instruments:
\(t = 2 \text{ or } t = 1\): no instruments,
\(t = 3\): the valid instrument for \(\Delta y_{i2} = (y_{i2} - y_{i1})\) is \(y_{i1}\), the moment condition is
\[ \E[y_{i1}\Delta u_{i3}] = 0 \text{ for }i=1,2,\ldots,N \]
- \(t = 4\): the valid instruments for \(\Delta y_{i3} = (y_{i3} - y_{i2})\) are \(y_{i2}\) as well as \(y_{i1}\), the moment conditions are
\[ \begin{aligned} \E[y_{i1}\Delta u_{i4}] &= 0 \\ \E[y_{i2}\Delta u_{i4}] &= 0 \end{aligned} \]
- \(t = 5\): the valid instruments for \(\Delta y_{i4} = (y_{i4} - y_{i3})\) are \(y_{i3}\) as well as \(y_{i2}\) and \(y_{i1}\), the moment conditions are
\[ \begin{aligned} \E[y_{i1}\Delta u_{i5}] &= 0 \\ \E[y_{i2}\Delta u_{i5}] &= 0 \\ \E[y_{i3}\Delta u_{i5}] &= 0 \\ \end{aligned} \]
- \(t = 6\): the valid instruments for \(\Delta y_{i5} = (y_{i5} - y_{i4})\) are \(y_{i4}\) as well as \(y_{i3}\), \(y_{i2}\), and \(y_{i1}\), the moment conditions are
\[ \begin{aligned} \E[y_{i1}\Delta u_{i6}] &= 0 \\ \E[y_{i2}\Delta u_{i6}] &= 0 \\ \E[y_{i3}\Delta u_{i6}] &= 0 \\ \E[y_{i4}\Delta u_{i6}] &= 0 \\ \end{aligned} \]
- \(t = T\): the valid instruments for \(\Delta y_{iT-1} = (y_{iT-1} - y_{iT-2})\) are \(y_{iT-2}\) as well as \(y_{iT-3}, \dots, y_{i1}\), the moment conditions are
\[ \begin{aligned} \E[y_{i1}\Delta u_{iT}] &= 0 \\ \E[y_{i2}\Delta u_{iT}] &= 0 \\ \E[y_{i3}\Delta u_{iT}] &= 0 \\ \vdots \\ \E[y_{i,T-2}\Delta u_{iT}] &= 0 \\ \end{aligned} \]
As \(t\) increases, the number of instruments available also increases. The total number of instruments will be quadratic in \(T.\) One disadvantage of this strategy should be apparent. If \(T <10\), that may be a manageable number, but for a longer time series, it may be necessary to restrict the number of past lags used.
We’ll have an instrument matrix with one row for each time period that we are instrumenting. \(Z_i\) is the corresponding matrix of instruments for the lagged difference.
\[ Z_i = \begin{bmatrix} y_{i,1} & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & y_{i,1} & y_{i,2} & 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & y_{i,1} & y_{i,2} & y_{i,3} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \cdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & y_{i,1} & \cdots & y_{i,T-2} \end{bmatrix} \]
The number of rows of \(Z_i\) is \(T-2.\) The number of columns of \(Z_i\) depends on the length of \(T,\) it can grow very quickly with \(T:\)
\[ 1+ 2 + \cdots + (T-2) = \sum_{k=1}^{T-2} k = \frac{(T-1)(T-2)}{2} \] Rewrite the model in the vector form
\[ \Delta \by_{i\cdot} = \lambda \Delta y_{i\cdot-1} + \Delta u_{i\cdot}, \] where
\[ \begin{split} \Delta y_{i\cdot} = \begin{bmatrix} \Delta y_{i3} \\ \Delta y_{i4} \\ \vdots \\ \Delta y_{iT} \end{bmatrix}, \quad \Delta y_{i\cdot-1} = \begin{bmatrix} \Delta y_{i2} \\ \Delta y_{i3} \\ \vdots \\ \Delta y_{i,T-1} \end{bmatrix}, \quad \Delta u_{i\cdot} & = \begin{bmatrix} \Delta u_{i3} \\ \Delta u_{i4} \\ \vdots \\ \Delta u_{iT} \end{bmatrix}. \\ & \phantom{=} (T-2) \times 1 \end{split} \]
Arellano and Bond (1991) suggested using a GMM approach based on all available moment conditions:
\[ \E[Z_i'\Delta u_{i\cdot}] = \boldsymbol{0} \]
Sample analogue
\[ J_N(\lambda) = \frac{1}{N} \sum_{i=1}^N \bZ'\Delta u_{i\cdot} \]
Alternative notation: Sometimes you see the moment conditions as \(\E[Z_i' u_{i\cdot}^*] = \boldsymbol{0}\), where \(u_{i\cdot}^*\) refers to the first difference transformed errors.
There are all together \(\frac{(T-1)(T-2)}{2}\) conditions.
The one-step Arellano-Bond estimator is a GMM estimator:
\[ \hat{\lambda}^{GMM} = Q_1^{-1} \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_1 \left(\sum_{i=1}^N Z_i' \Delta y_i \right) \] where
\[ Q_1 = \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_1 \left(\sum_{i=1}^N Z_i' \Delta y_{i\cdot -1} \right) \] and
\[ \begin{aligned} A_1 &= \left( \sum_{i=1}^N Z_i^\prime H_i Z_i \right)^{-1} , \\ H_i &= \E[\Delta u_i \Delta u_i^{\prime}] = \sigma_u^2 \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \\ \end{bmatrix} . \end{aligned} \]
\(A_1\) is the optimal weighting matrix: \(H_i\) is the covariance matrix of the differenced idiosyncratic errors. Note that \(H_i\) implies homoskedesticity while taking into account the dynamic structure of the error term.
The two-step estimator is based on one-step estimator:
\[ \hat{\lambda}^{GMM} = Q_2^{-1} \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_2 \left(\sum_{i=1}^N Z_i' \Delta y_i \right) \] where
\[ Q_2 = \left(\sum_{i=1}^N \Delta y_{i\cdot -1}^{\prime} Z_i\right) A_2 \left(\sum_{i=1}^N Z_i' \Delta y_{i\cdot -1} \right) \] and
\[ \begin{aligned} A_1 &= \left( \sum_{i=1}^N Z_i^\prime G_i Z_i \right)^{-1} , \\ G_i &= \widehat{\Delta u_i}\; \widehat{\Delta u_i}^{\prime} . \end{aligned} \] That is, the only difference between one-step and two-step estimator is the weighting matrix \(A_1\) and \(A_2\). \(A_2\) uses the one-step residual from the one-step estimation.
Simulation studies have suggested very modest efficiency gain from using the two-step version, even in the presence of considerable heteroskedasticity. More importantly the dependence of the two-step weight matrix on estimated parameters makes the usual asymptotic distribution approximations less reliable for the two step estimator. Simulation studies have shown that the asymptotic standard errors tend to be much too small for the two-step estimator. These are the reasons that much applied work has focused on the results of one-step estimator.
The Arellano-Bond (AB) estimator is suitable for situations with:
- “small \(T\), large \(N\)” panels: few time periods and many individual units
- a linear functional relationship
- one left-hand variable that is dynamic, depending on its own past realizations
- right-hand variables that are not strictly exogenous: correlated with past and possibly current realizations of the error
- fixed individual effects, implying unobserved heterogeneity
- heteroskedasticity and autocorrelation within individual units’ errors, but not across them
The AB estimator is usually called difference GMM.
A potential weakness in the Arellano-Bond DPD estimator was revealed in later work by Arellano and Bover (1995) and Blundell and Bond (1998).
The lagged levels are often rather poor instruments for first differenced variables, especially if the variables are close to a random walk, i.e., \(\lambda\) is close to unity.
For long panel (large \(T\)) the number of instruments increases dramatically, i.e., \(\frac{(T-1)(T-2)}{2}.\) This leads to potential loss of efficiency.
By using the lag limits options, you may specify, for instance, that only lags 2–5 are to be used in constructing the GMM instruments.
Diagnostic Tests for Dynamic Panel Models
AR test for the error term
Consistency of the GMM estimator bases on the assumption that the transformed error term is not serially correlated for lags greater or equal to two (AR1 is autocorrelated by construction), i.e., \(\E[\Delta u_{it} \Delta u_{it-k}] = 0\) for \(k\ge 2.\) It’s crucial to test whether the second-order autocorrelation is zero for all periods in the sample.
If a significant AR(2) statistic is encountered, the second lags of endogenous variables will not be appropriate instruments for their current values.
\(H_0\): NO autocorrelation.
Expected to reject AR(1) (\(P<.05\)) but NOT AR(2) and higher orders (\(P>.05\)).
Overidentification test (Over-restriction test, Sargan-Hansen test)
When \(T > 3\) and the model is over-identified, a Sargan–Hansen test can be used to test the overidentifying restrictions.
\(H_0\): The instruments used are valid or the moment conditions are valid.
\(H_0\) is true if the error terms in the moment conditions are uncorrelated with the instruments:
\[ \E[Z_i'\Delta u_{i\cdot}] = \boldsymbol{0} \]
Their modification of the estimator includes lagged levels as well as lagged differences. The expanded estimator is commonly termed System GMM. The cost of the System GMM estimator involves a set of additional restrictions on the initial conditions of the process generating \(y.\)
24.2.2 Foreign Aid Example Revisited
Here we present the Arellano-Bond estimator GMM for the foreign aid example.
The Baseline “Level” Equation:
\[Growth_{it} = \gamma Growth_{i,t-1} + \beta Aid_{it} + \alpha_i + u_{it}\]
- \(Growth_{i,t-1}\): Last year’s economic growth.
- \(\alpha_i\): The fixed effect (time-invariant country traits like geography).
- \(u_{it}\): The idiosyncratic error (yearly shocks, like an earthquake).
Step 1: The First-Differenced Equation
To mathematically delete the unobserved baggage (\(\alpha_i\)), we subtract year \(t-1\) from year \(t\): \[\Delta Growth_{it} = \gamma \Delta Growth_{i,t-1} + \beta \Delta Aid_{it} + \Delta u_{it}\]
- Note: \(\Delta Growth_{it}\) is just shorthand for \((Growth_{it} - Growth_{i,t-1})\). Because \(\alpha_i - \alpha_i = 0\), the country’s permanent traits are wiped out.
Step 2: The Moment Conditions (The Instruments)
Because \(\Delta Aid_{it}\) is still correlated with \(\Delta u_{it}\) (the feedback loop), the model uses deep historical lags as instruments. The specification relies on this mathematical assumption (sequential exogeneity): \[\mathbb{E}[Aid_{i,t-s} \Delta u_{it}] = 0 \quad \text{for } s \geq 2\]
- Aid from 2 or more years ago (\(t-2, t-3,\) etc.) is completely uncorrelated with the change in economic shocks today. The software uses these past, clean values to calculate the true \(\beta\).
Fixed Effects Instrumental Variables (FE-IV / 2SLS)
Alternatively, if you have a strong external instrument, you can use a fixed effects instrumental variables (FE-IV) approach. This is often called Two-Stage Least Squares (2SLS) with fixed effects.
Let’s use an external variable, the GDP of Donor Countries (\(Z_{it}\)), as our instrument for Foreign Aid (\(Aid_{it}\)). The idea is that the donor’s economic health influences how much aid they send, but it has no direct effect on the recipient country’s growth (satisfying the exclusion restriction).
Step 1: The Filter (Predicting Clean Aid)
First, we regress our endogenous variable (\(Aid\)) on our clean instrument (\(Z\)) and the country fixed effects: \[ Aid_{it} = \pi Z_{it} + \alpha_i + v_{it} \]
- \(Z_{it}\): The Instrument (e.g., Donor GDP).
- \(\pi\): How strongly the Donor GDP predicts the Aid sent to Country \(i\).
- \(v_{it}\): The “muddy” surprise aid that we are going to throw away.
From this first stage, the software saves the predicted, clean values of aid, denoted as \(\widehat{Aid}_{it}\).
Step 2: The True Effect
Next, we swap out the real, tainted aid for our predicted, clean aid: \[Growth_{it} = \beta \widehat{Aid}_{it} + \alpha_i + u_{it}\]
- \(\beta\): This is now the true, causal effect of Foreign Aid on Growth, because \(\widehat{Aid}_{it}\) has been entirely cleansed of the local economic shocks (\(u_{it}\)).
The Two Mandatory Conditions for IV:
For this specification to be valid, your instrument (\(Z_{it}\)) must pass two strict tests:
- Relevance Condition: \(Cov(Z_{it}, Aid_{it} \mid \alpha_i) \neq 0\) (Donor GDP must genuinely affect how much aid is sent).
- Exclusion Restriction (Exogeneity): \(\mathbb{E}[Z_{it} u_{it}] = 0\) (Donor GDP must have absolutely no correlation with the local earthquake in Country \(i\)).
References
- Chapter 12, “Serial Correlation and Heteroskedasticity in Time Series Regressions.” Introductory Econometrics: A Modern Approach, 7e by Jeffrey M. Wooldridge
- Vladislav Morozov, “Interlude: Standard Dynamic Panel Estimators.” Econometrics with Unobserved Heterogeneity, Course material, https://vladislav-morozov.github.io/econometrics-heterogeneity/linear/linear-dynamic-panel-iv.html#extra-endogeneity-and-the-within-estimator
- Dynamic panel in Stata: https://janditzen.github.io/xtdcce2/