Model Setup

GMM formulates a set of orthogonality restrictions (moment conditions) related to an econometric model, and finds parameter estimates that come as close as possible to achieving these orthogonality properties in the sample.

\[\begin{equation}\label{eq-model-gmm} \begin{split} y_i &= \bx_i'\bbeta + \varepsilon_i, \quad i=1,\ldots,n \\ g(\bx_i)&=\bz_i \quad \text{s.t.}\; \E[\varepsilon_i \mid \bz_i] = 0 \end{split} \end{equation}\]

for $K$ variables in $\bx_i$ and for some set of $L$ instrumental variables in $\bz_i,$ where $L\ge K.$

The generalized regression model is a special case where $\bz_i=\bx_i.$

The assumption $\E[\varepsilon_i \mid \bz_i] = 0$ implies the following orthogonality condition:

\[\cov(\bz_i, \varepsilon_i) = \boldsymbol{0},\]

or

\[\E[\bz_i \varepsilon_i] = \E[\bz_i(y_i - \bx_i'\bbeta)] = \boldsymbol{0}.\]

Averaging over the observations, we find that this further implies the population moment equation,

\[\begin{equation}\label{eq-pop-ortho-cond} \E\left[\frac{1}{n} \sum_{i=1}^n \bz_i(y_i - \bx_i'\bbeta) \right] = \E[\bar{\bm}(\bbeta)] = \boldsymbol{0} \end{equation}\]

where

\[\begin{equation} \label{eq-momenti} \bm_i(\bbeta) = \bz_i(y_i - \bx_i'\bbeta) , \end{equation}\]

and

\[\begin{equation} \bar{\bm}(\bbeta) = \frac{1}{n} \sum_{i=1}^n \bm_i(\bbeta) = \frac{1}{n} \sum_{i=1}^n \bz_i(y_i - \bx_i'\bbeta) . \end{equation}\]

Eq. $\eqref{eq-momenti}$ is an $L\times 1$ function of the $i$-th observation and the $K\times 1$ parameter vector $\bbeta.$

If the population relationship $\eqref{eq-pop-ortho-cond}$ holds for the true parameter vector $\bbeta,$ we can attempt to mimic it with its sample counterpart, the empirical moment equation,

\[\begin{equation}\label{eq-gmm-empirical-moment} \left[\frac{1}{n} \sum_{i=1}^n \bz_i(y_i - \bx_i'\hat{\bbeta}) \right] = \left[\frac{1}{n} \sum_{i=1}^n \bm_i(\hat{\bbeta}) \right] = \bar{\bm}(\hat{\bbeta}) = \boldsymbol{0}. \end{equation}\]

The method of moments estimator (MME) for $\bbeta$ is defined as the parameter value that sets $\bar{\bm}(\hat{\bbeta})=\boldsymbol{0}.$
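As a concrete illustration, here is a minimal numpy sketch of the empirical moment function; the simulated design and the name `m_bar` are our own, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, L = 500, 2, 3

# Simulated instruments, regressors, and outcome (illustrative only)
Z = rng.normal(size=(n, L))                   # n x L matrix of instruments
X = Z[:, :K] + 0.5 * rng.normal(size=(n, K))  # n x K regressors correlated with Z
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(size=n)

def m_bar(beta, y, X, Z):
    """Empirical moment: (1/n) Z'(y - X beta), an L-vector."""
    return Z.T @ (y - X @ beta) / len(y)

print(m_bar(beta_true, y, X, Z))  # approximately zero at the true beta
```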

For the identified cases, it is convenient to write $\eqref{eq-gmm-empirical-moment}$ as

\[\bar{\bm}(\hat{\bbeta}) = \left(\frac{1}{n}\bZ'\by\right) - \left(\frac{1}{n} \bZ'\bX \right)\hat{\bbeta} = \boldsymbol{0},\]

which is a system of $L$ equations (the number of instruments in $\bZ$) in $K$ unknowns (the number of parameters we seek to estimate).

There are three possibilities to consider:

  • Underidentified if $L<K$

    There is no solution as there are fewer moment equations than parameters.

  • Exactly identified if $L=K$

    A single solution exists:

    \[\hat{\bbeta} = (\bZ'\bX)^{-1} (\bZ'\by) ,\]

    which is the same as the IV estimator (see the sketch after this list).

  • Overidentified if $L>K$

    In general there is no $\hat{\bbeta}$ that solves $\bar{\bm}(\hat{\bbeta}) = \boldsymbol{0}$ exactly, since the system has more equations than unknowns.

    In this instance, we choose the estimator based on the “least squares” criterion:

    \[\hat{\bbeta}_{\text{gmm}} = \underset{\bbeta}{\arg\min}\; \bar{\bm}(\bbeta)'\bar{\bm}(\bbeta) .\]
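In the exactly identified case, the estimator can be computed directly by solving the $K$ linear equations $\bZ'\bX\hat{\bbeta} = \bZ'\by.$ A minimal sketch with simulated data (the design is our own assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 500, 2
Z = rng.normal(size=(n, K))            # L = K instruments: exactly identified
X = Z + 0.5 * rng.normal(size=(n, K))  # regressors correlated with Z
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(size=n)

# Method of moments / IV solution of Z'X beta = Z'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)  # approximately beta_true
```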

Generalized Method of Moments (GMM) includes as special cases OLS, IV, multivariate regression, and 2SLS.

GMM Estimator

Define

\[q(\bbeta) = \bar{\bm}(\bbeta)'\bW_n\bar{\bm}(\bbeta) .\]

The Generalized Method of Moments (GMM) estimator is defined as the minimizer of the GMM criterion $q(\bbeta).$

$\bW_n$ is an $L\times L$ weight matrix with $\bW_n > 0$ (positive definite); below we write $\bW$ for short.


When $\bW=\bI_L,$ then

\[q(\bbeta) = \bar{\bm}(\bbeta)'\bar{\bm}(\bbeta) .\]

The first order conditions are

\[\begin{split} \frac{\partial }{\partial \bbeta} q(\hat{\bbeta}) &= 2 \left(\frac{\partial \bar{\bm}'(\hat{\bbeta})}{\partial \bbeta}\right) \bar{\bm}(\hat{\bbeta}) \\ &= 2 \bar{\bG}(\hat{\bbeta})' \bar{\bm}(\hat{\bbeta}) \\ &= \boldsymbol{0} . \end{split}\]

where

\[\begin{split} \color{#008B45}\bar{\bG}(\hat{\bbeta})' &\color{#008B45}= \frac{\partial \bar{\bm}'(\hat{\bbeta})}{\partial \bbeta} \\ &= \frac{\partial}{\partial \bbeta} \left( \frac{1}{n} \sum_{i=1}^n y_i\bz_i' - \bbeta'\bx_i\bz_i' \right) \\ &= -\frac{1}{n} \sum_{i=1}^n \bx_i\bz_i' \\ &= \color{#008B45}-\frac{1}{n} \bX'\bZ \end{split}\]

Hence

\[\begin{split} \frac{\partial }{\partial \bbeta} q(\hat{\bbeta}) &= 2 \bar{\bG}(\hat{\bbeta})' \bar{\bm}(\hat{\bbeta}) \\ &= 2 \left(-\frac{1}{n}\bX'\bZ\right) \left(\frac{1}{n}\bZ'\by - \frac{1}{n}\bZ'\bX\hat{\bbeta} \right) \\ &= \boldsymbol{0} . \end{split}\]

Then we have

\[\bX'\bZ\bZ'\by = \bX'\bZ\bZ'\bX\hat{\bbeta}\]

The GMM estimator is given by:

\[\hat{\bbeta}_{\text{gmm}} = [(\bX'\bZ)(\bZ'\bX)]^{-1} (\bX'\bZ)(\bZ'\by) .\]

When $\bW$ is not the identity matrix, the first order conditions are

\[\begin{split} \frac{\partial }{\partial \bbeta} q(\hat{\bbeta}) &= 2 \left(\frac{\partial \bar{\bm}'(\hat{\bbeta})}{\partial \bbeta}\right) \bW \bar{\bm}(\hat{\bbeta}) \\ &= 2 \bar{\bG}(\hat{\bbeta})' \bW \bar{\bm}(\hat{\bbeta}) \\ &= \boldsymbol{0} . \end{split}\]

The GMM estimator is given by

\[\begin{equation} \begin{split} \hat{\bbeta}_{\text{gmm}} &= \left[(\bX'\bZ)\bW (\bZ'\bX) \right]^{-1} \left[(\bX'\bZ) \bW (\bZ'\by) \right] \\ &= \left[ \left(\frac{1}{n} \bX'\bZ \right)\bW \left(\frac{1}{n} \bZ'\bX \right) \right]^{-1} \left[ \left(\frac{1}{n} \bX'\bZ \right) \bW \left(\frac{1}{n}\bZ'\by \right) \right] . \end{split} \end{equation}\]
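This closed form is straightforward to compute. A minimal sketch (the function name `gmm_estimator` is ours; the data are simulated as before):

```python
import numpy as np

def gmm_estimator(y, X, Z, W):
    """One-step GMM: [(X'Z) W (Z'X)]^{-1} (X'Z) W (Z'y)."""
    XZ = X.T @ Z
    A = XZ @ W @ Z.T @ X
    b = XZ @ W @ Z.T @ y
    return np.linalg.solve(A, b)

# Example: W = identity reproduces the unweighted estimator above
rng = np.random.default_rng(2)
n, K, L = 500, 2, 4
Z = rng.normal(size=(n, L))
X = Z[:, :K] + 0.5 * rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)
print(gmm_estimator(y, X, Z, np.eye(L)))
```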

Note that

  • When $\bW=(\bZ'\bZ)^{-1}$ then
\[\hat{\bbeta}_{\text{gmm}} = \hat{\bbeta}_{\text{2sls}} ,\]
as verified numerically in the sketch after this list.
  • Furthermore, if $L=K$ then
\[\hat{\bbeta}_{\text{gmm}} = \hat{\bbeta}_{\text{iv}}.\]
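The first equivalence is easy to check numerically. A quick sketch under our simulated design, computing 2SLS directly via the fitted values $\widehat{\bX} = \bZ(\bZ'\bZ)^{-1}\bZ'\bX$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, L = 500, 2, 4
Z = rng.normal(size=(n, L))
X = Z[:, :K] + 0.5 * rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

# GMM with W = (Z'Z)^{-1}
W = np.linalg.inv(Z.T @ Z)
XZ = X.T @ Z
beta_gmm = np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)

# Direct 2SLS: regress y on the first-stage fitted values X_hat = P_Z X
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

print(np.allclose(beta_gmm, beta_2sls))  # True
```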

Distribution of GMM Estimator

Let

\[\bQ_{\bZ\bX} = \bQ_{\bX\bZ}' = \E(\bz_i \bx_i') = \text{plim} \left(\frac{1}{n}\bZ'\bX \right)\]

For simplicity, we write $\bQ$ for $\bQ_{\bZ\bX}$ here, so that $\bQ' = \bQ_{\bX\bZ}.$

Note that $\bQ$ equals the probability limit of $-\bar{\bG}(\hat{\bbeta}).$

Let

\[\color{#008B45} \bOmega = \E(\bz_i\bz_i'\varepsilon_i^2) = \E(\bm_i\bm_i')\]

where $\bm_i = \bz_i\varepsilon_i.$ $\bOmega$ is the covariance matrix of the moments $\bm_i$ (note that $\E[\bm_i]=\boldsymbol{0}$ at the true $\bbeta$).

Then

\[\left(\frac{1}{n} \bX'\bZ \right)\bW \left(\frac{1}{n} \bZ'\bX \right) \xrightarrow{p} \bQ'\bW\bQ\]

and

\[\left(\frac{1}{n} \bX'\bZ \right) \bW \left(\frac{1}{\sqrt{n}}\bZ'\boldsymbol{\varepsilon} \right) \xrightarrow{d} \bQ'\bW \, N(\boldsymbol{0}, \bOmega) ,\]

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \ldots, \varepsilon_n)'$ and $\frac{1}{\sqrt{n}}\bZ'\boldsymbol{\varepsilon} = \sqrt{n}\,\bar{\bm}(\bbeta).$

The asymptotic distribution of the GMM estimator is

\[\sqrt{n}(\hat{\bbeta}-\bbeta) \xrightarrow{d} N(\boldsymbol{0}, \bV_\bbeta)\]

where

\[\bV_\bbeta = (\bQ'\bW\bQ)^{-1} \bQ'\bW \bOmega \bW\bQ (\bQ'\bW\bQ)^{-1} ,\]

with

\[\begin{split} \bQ &= -\underset{n\to\infty}{\text{plim}} \left(\frac{\partial \bar{\bm}(\hat{\bbeta})}{\partial \bbeta'}\right) \\ \bW &= \underset{n\to\infty}{\text{plim}}\; \bW_n \\ \bOmega &= \E(\bz_i\bz_i'\varepsilon_i^2) . \end{split}\]

We find that the GMM estimator is asymptotically normal with a “sandwich form” asymptotic variance.
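A plug-in estimate of this sandwich variance replaces $\bQ$, $\bOmega$, and $\bW$ by their sample counterparts. A minimal sketch (the function name is ours; `beta_hat` is any one-step GMM estimate):

```python
import numpy as np

def gmm_sandwich_vcov(y, X, Z, W, beta_hat):
    """Plug-in sandwich variance of sqrt(n) * (beta_hat - beta)."""
    n = len(y)
    Q = Z.T @ X / n                          # Q_hat = (1/n) Z'X
    e = y - X @ beta_hat                     # residuals
    Omega = (Z * (e**2)[:, None]).T @ Z / n  # (1/n) sum z_i z_i' e_i^2
    A = np.linalg.inv(Q.T @ W @ Q)           # "bread" of the sandwich
    return A @ Q.T @ W @ Omega @ W @ Q @ A

# Standard errors for beta_hat: np.sqrt(np.diag(V) / n)
```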


Efficient GMM

The asymptotic distribution of the GMM estimator \(\widehat{\boldsymbol{\beta}}_{\text{gmm}}\) depends on the weight matrix $\boldsymbol{W}$ through the asymptotic variance $\bV_\bbeta$.

The asymptotically optimal weight matrix $\bW_0$ is the one that minimizes $\bV_\bbeta$ (in the positive semidefinite ordering). This turns out to be

\[\bW_0 = \bOmega^{-1}.\]

When the GMM estimator is constructed with $\bW=\bW_0 = \bOmega^{-1},$ we call it the Efficient GMM Estimator:

\[\hat{\bbeta}_{\text{gmm}} = \left[\bX'\bZ \bOmega^{-1} \bZ'\bX \right]^{-1} \left[\bX'\bZ \bOmega^{-1} \bZ'\by \right] .\]

The asymptotic variance of the efficient GMM estimator simplifies to

\[\begin{split} \bV_\bbeta &= (\bQ' \bOmega^{-1} \bQ)^{-1} \bQ' \bOmega^{-1} \bOmega \bOmega^{-1} \bQ (\bQ' \bOmega^{-1} \bQ)^{-1} \\ &= (\bQ' \bOmega^{-1} \bQ)^{-1} . \end{split}\]

Note that the 2SLS estimator is GMM with the weight matrix

\[\widehat{\bW} = (\bZ'\bZ)^{-1}\]

or equivalently

\[\widehat{\bW} = \left(\frac{1}{n}\bZ'\bZ\right)^{-1} ,\]

since scaling does not matter.

Since

\[\widehat{\bW} \xrightarrow{p} \left(\E(\bz_i\bz_i')\right)^{-1} ,\]

this is equivalent to using the weight matrix

\[\begin{equation} \label{eq-W2sls} \bW_{\text{2sls}} = \left(\E(\bz_i\bz_i')\right)^{-1} . \end{equation}\]

In contrast, the efficient weight matrix takes the form

\[\bW_{\text{e}} = \left(\E(\bz_i\bz_i'\varepsilon_i^2)\right)^{-1} .\]

Suppose the structural equation error $\varepsilon_i$ is conditionally homoskedastic in the sense that

\[\E(\varepsilon_i^2 \mid \bz_i) = \sigma^2.\]

Then

\[\bW_{\text{e}} = \left(\E(\bz_i\bz_i')\right)^{-1} \sigma^{-2},\]

or equivalently

\[\begin{equation} \label{eq-Wefficient} \bW_{\text{e}} = \left(\E(\bz_i\bz_i')\right)^{-1} \end{equation}\]

since scaling does not matter.

Now we see that the efficient weight matrix $\eqref{eq-Wefficient}$ is the same as the 2SLS asymptotic weight matrix $\eqref{eq-W2sls}.$ This shows that the 2SLS weight matrix is the efficient weight matrix under conditional homoskedasticity.

However, this result also shows that in the general case, where the error is conditionally heteroskedastic, 2SLS is generically inefficient relative to efficient GMM.


Covariance Matrix Estimation

An estimator of the asymptotic variance of $\hat{\bbeta}_{\text{gmm}}$ can be obtained by replacing the matrices in the asymptotic variance formula by consistent estimates.

For the efficient GMM estimator the covariance matrix estimator is

\[\widehat{\bV}_\bbeta = (\widehat{\bQ}'\widehat{\bOmega}^{-1}\widehat{\bQ})^{-1} ,\]

where

\[\widehat{\bQ} = \frac{1}{n} \sum_{i=1}^n \bz_i\bx_i' = \frac{1}{n} \bZ'\bX ,\]

and $\widehat{\bOmega}$ is a consistent estimator of $\bOmega$ such as those given in the next subsection.
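A minimal sketch of this plug-in estimator, using a residual-based $\widehat{\bOmega}$ of the form defined in the next subsection (the function name is ours):

```python
import numpy as np

def efficient_gmm_vcov(y, X, Z, beta_hat):
    """V_hat = (Q_hat' Omega_hat^{-1} Q_hat)^{-1} with plug-in estimates."""
    n = len(y)
    Q = Z.T @ X / n                          # (1/n) Z'X
    e = y - X @ beta_hat                     # residuals
    Omega = (Z * (e**2)[:, None]).T @ Z / n  # (1/n) sum z_i z_i' e_i^2
    return np.linalg.inv(Q.T @ np.linalg.solve(Omega, Q))
```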

Two-Step GMM Estimator

The efficient GMM estimator is obtained by setting $\bW = \bOmega^{-1}.$ If we have a consistent estimate $\widehat{\bOmega}$ of $\bOmega$, then $\widehat{\bW} = \widehat{\bOmega}^{-1}$ is a consistent estimator of $\bW.$

When $\bW$ is fixed by the user, we call $\hat{\bbeta}_{\text{gmm}}$ a one-step GMM estimator.

The two-step efficient GMM estimator uses the residuals from a one-step estimation to construct the weight matrix. For instance, we can use the 2SLS estimator $\hat{\bbeta}_{\text{2sls}}$ as the one-step estimator. Then we construct the weight matrix estimator

\[\begin{equation} \label{eq-Omega-uncentered} \widehat{\bOmega} = \frac{1}{n}\sum_{i=1}^n \widetilde{\bm}_i \widetilde{\bm}_i' , \end{equation}\]

where

\[\widetilde{\bm}_i = \bz_i (y_i - \bx_i'\hat{\bbeta}_{\text{2sls}}) .\]

An alternative moment estimator of $\bOmega$ is

\[\begin{equation} \label{eq-Omega-centered} \widehat{\bOmega}^{\ast} = \frac{1}{n}\sum_{i=1}^n \left(\widetilde{\bm}_i - \bar{\bm}\right) \left(\widetilde{\bm}_i - \bar{\bm}\right)' , \end{equation}\]

where

\[\bar{\bm} = \frac{1}{n} \sum_{i=1}^n \widetilde{\bm}_i .\]

The estimator $\widehat{\bOmega}$ in $\eqref{eq-Omega-uncentered}$ is an uncentered covariance matrix estimator while the estimator $\widehat{\bOmega}^{\ast}$ in $\eqref{eq-Omega-centered}$ is a centered version.

Either estimator is consistent when

\[\E(\bz_i\varepsilon_i) = \boldsymbol{0}\]

which holds under correct specification.

However, under misspecification we may have $\E(\bz_i\varepsilon_i) \ne \boldsymbol{0}.$ In the latter context, $\widehat{\bOmega}^{\ast}$ may be viewed as a robust estimator, since it estimates the variance of the moments even when their mean is nonzero. For some testing problems it turns out to be preferable to use a covariance matrix estimator which is robust to the alternative hypothesis. For these reasons the estimator $\widehat{\bOmega}^{\ast}$ in $\eqref{eq-Omega-centered}$ is generally preferred.
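A minimal sketch of the two-step estimator with either weight-matrix variant (the function and flag names are our own):

```python
import numpy as np

def two_step_gmm(y, X, Z, center=True):
    """Two-step efficient GMM: 2SLS first step, then W = Omega_hat^{-1}."""
    n = len(y)
    # Step 1: 2SLS as the one-step estimator
    Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b1 = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
    # Step 2: weight matrix built from first-step moments m_i = z_i * e_i
    m = Z * (y - X @ b1)[:, None]
    if center:
        m = m - m.mean(axis=0)  # centered estimator Omega*
    W = np.linalg.inv(m.T @ m / n)
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)
```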


Iterated GMM

The asymptotic distribution of the two-step GMM estimator does not depend on the choice of the preliminary one-step estimator. However, the actual value of the estimator depends on this choice, and so will the finite sample distribution. This is undesirable and likely inefficient.

To remove this dependence we can iterate the estimation sequence. Specifically, given \(\hat{\bbeta}_{\text{gmm}}\) we can construct an updated weight matrix estimate $\widehat{\bW}$ and then re-estimate $\hat{\bbeta}_{\text{gmm}}.$ This updating can be iterated until convergence. The result is called the iterated GMM estimator and is a common implementation of efficient GMM.
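A minimal sketch of the iteration (the convergence tolerance and iteration cap are our own choices):

```python
import numpy as np

def iterated_gmm(y, X, Z, tol=1e-10, max_iter=100):
    """Iterate beta -> Omega_hat -> W until beta converges."""
    n = len(y)
    XZ = X.T @ Z
    W = np.linalg.inv(Z.T @ Z / n)  # start from the 2SLS weight
    beta = np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)
    for _ in range(max_iter):
        e = y - X @ beta
        Omega = (Z * (e**2)[:, None]).T @ Z / n
        W = np.linalg.inv(Omega)
        beta_new = np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```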


Overidentification Test

In the exactly identified cases we examined earlier (least squares, instrumental variables, maximum likelihood), the criterion for GMM estimation

\[q(\bbeta) = \bar{\bm}(\bbeta)'\bW \bar{\bm}(\bbeta)\]

would be exactly zero because we can find a set of estimates for which $\bar{\bm}(\bbeta)$ is exactly zero. Thus in the exactly identified case, when there are the same number of moment equations as there are parameters to estimate, the weighting matrix $\bW$ is irrelevant to the solution. But if the parameters are overidentified by the moment equations, then these equations imply substantive restrictions. As such, if the hypothesis of the model that led to the moment equations in the first place is incorrect, at least some of the sample moment restrictions will be systematically violated. This conclusion provides the basis for a test of the overidentifying restrictions. By construction, when the optimal weighting matrix is used,

\[\begin{split} nq(\hat{\bbeta}) &= n\bar{\bm} (\hat{\bbeta})' \widehat{\bOmega}^{-1} \bar{\bm} (\hat{\bbeta}) \\ &= \left[\sqrt{n}\,\bar{\bm} (\hat{\bbeta})' \right] \left\{\text{Est.Asy. Var}\left(\sqrt{n}\,\bar{\bm} (\hat{\bbeta})\right)\right\}^{-1} \left[\sqrt{n}\,\bar{\bm} (\hat{\bbeta}) \right] \end{split}\]

so $nq(\hat{\bbeta})$ is a Wald statistic. Therefore, under the null hypothesis that all instruments are uncorrelated with $\varepsilon_i:$

\[H_0: \E(\bz_i\varepsilon_i) = \boldsymbol{0} ,\]

we have

\[J \equiv nq(\hat{\bbeta}) \xrightarrow{d} \chi^2 [L-K] .\]

The statistic has a large-sample $\chi^2[L-K]$ distribution, with degrees of freedom equal to the number of overidentifying restrictions, $L-K.$

For the exactly identified case, there are zero degrees of freedom and $q= 0.$

The test in this context is known as the Sargan–Hansen test or Hansen’s $J$-test.

The Sargan–Hansen test of overidentifying restrictions should be performed routinely in any overidentified model estimated with instrumental variables techniques. Instrumental variables techniques are powerful, but if a strong rejection of the null hypothesis of the Sargan–Hansen test is encountered, you should strongly doubt the validity of the estimates.

The $J$-test is a test of the model formulation: if

\[J(\hat{\bbeta}) > \text{critical value},\]

then we may want to think again about the model formulation. In practice this test is not very powerful; it can be hard to reject a misspecified model using this particular test.
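For completeness, a minimal sketch of the $J$ statistic and its asymptotic $p$-value, where `beta_hat` should be the efficient (two-step) GMM estimate and the uncentered $\widehat{\bOmega}$ is used (the function name is ours):

```python
import numpy as np
from scipy.stats import chi2

def j_test(y, X, Z, beta_hat):
    """Hansen's J statistic: n * m_bar' Omega_hat^{-1} m_bar, df = L - K."""
    n, K = X.shape
    L = Z.shape[1]
    e = y - X @ beta_hat
    m = Z * e[:, None]              # n x L matrix with rows m_i'
    m_bar = m.mean(axis=0)
    Omega = m.T @ m / n             # uncentered estimate of Omega
    J = n * m_bar @ np.linalg.solve(Omega, m_bar)
    return J, chi2.sf(J, df=L - K)  # statistic and asymptotic p-value
```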


References

  • Chapters 10 and 18 in Econometric Analysis, 5th Edition, by William H. Greene. Prentice Hall, 2003.

  • Chapter 13, Generalized Method of Moments, in Econometrics, by Bruce Hansen, 2022.