Instrumental Variables Estimation
Structural Equation
Model setup
\[\begin{equation} \begin{split} y_i &= \bx_i'\bbeta + e_i \\ \by &= \bX\bbeta + \be \quad \text{(or in matrix form)} \end{split}\label{eq:model-struct} \end{equation}\]where
\[\E(\bx_ie_i) \neq 0.\]Because of this correlation, $\bx_i$ is said to be endogenous for $\bbeta.$ We call $\eqref{eq:model-struct}$ a structural equation, $\bbeta$ a structural parameter, and $\be$ the structural errors. The name comes from the fact that the structural equation is derived from economic theory and has a causal interpretation.
We introduce an $\ell\times 1$ random vector $\bz_i$ as an instrumental variable for $\bx_i$. $\bz_i$ needs to satisfy the following conditions:
\[\begin{align} \E(\bz_ie_i) &= 0 \label{eq:con-exo} \\ \E(\bz_i\bz_i') &> 0 \label{eq:con-collinearity} \\ \rank{\left(\E(\bz_i\bx_i')\right)} &= K. \label{eq:con-relevance} \end{align}\]- Condition $\eqref{eq:con-exo}$ requires that the instruments are uncorrelated with the regression error.
- Condition $\eqref{eq:con-collinearity}$ is a normalization which excludes linearly redundant instruments.
- Condition $\eqref{eq:con-relevance}$ is called the relevance condition. A necessary condition is that $\ell\geq K.$
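To make these conditions concrete, here is a minimal simulation sketch in Python, assuming a hypothetical data-generating process with scalar $x_i$ and $z_i$ and true $\beta = 1$ (all coefficient values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)               # instrument
u = rng.normal(size=n)               # first-stage error
e = 0.8 * u + rng.normal(size=n)     # structural error, correlated with u
x = 0.5 * z + u                      # x depends on z (relevance) and on u (endogeneity)
y = 1.0 * x + e                      # structural equation with true beta = 1

print(np.mean(x * e))   # ~ 0.8: E(x e) != 0, so x is endogenous
print(np.mean(z * e))   # ~ 0:   E(z e) = 0, the exogeneity condition
print(np.mean(z * x))   # ~ 0.5: E(z x) != 0, the relevance condition holds
```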
Reduced Form
The reduced form is the relationship between the regressors $\bx_i$ and the instruments $\bz_i:$
\[\begin{equation} \begin{split} \bx_i &= \Gamma'\bz_i + \bu_i \\ \bX &= \bZ\Gamma + \bU \quad \text{(or in matrix form)} \end{split}\label{eq:model-reduced} \end{equation}\]where $\bu_i$ is a $K\times 1$ vector and $\bU$ is an $n\times K$ matrix.
$\bz_i$ is exogenous:
\[\E(\bz_i\bu_i') = \boldsymbol{0}.\]$\Gamma$ can be obtained by
\[\underbrace{\Gamma}_{\ell\times K} = \underbrace{\E(\bz_i\bz_i')^{-1}}_{\ell\times \ell} \;\underbrace{\E(\bz_i\bx_i')}_{\ell\times K}.\]We can also construct a reduced form equation for $y_i$. Substituting $\eqref{eq:model-reduced}$ into $\eqref{eq:model-struct},$ we get
\[\begin{equation} \begin{split} y_i &= (\Gamma'\bz_i + \bu_i)'\bbeta + e_i \\ &= \bz_i'\blambda + v_i \end{split}\label{eq:model-reduced-y} \end{equation}\]where
\[\begin{aligned} \underbrace{\blambda}_{\ell \times 1} &= \underbrace{\Gamma}_{\ell\times K}\; \underbrace{\bbeta}_{K\times 1} \\ \text{and}\quad v_i &= \bu_i'\bbeta + e_i \end{aligned}\]Observe that
\[\E(\bz_i v_i) = \E(\bz_i\bu_i')\bbeta + \E(\bz_ie_i) = \boldsymbol{0}.\]Eq. $\eqref{eq:model-reduced-y}$ is the reduced form for $y_i,$ as it expresses $y_i$ as a function of exogenous variables only.
The reduced form coefficient $\blambda$ can be obtained by
\[\blambda = \E(\bz_i\bz_i')^{-1}\E(\bz_iy_i).\]So far, the reduced form coefficient matrices $\Gamma$ and $\blambda$ are identified based on the moments of the observables $(y_i, \bx_i, \bz_i).$ That is,
\[\begin{equation} \begin{split} \Gamma = \E(\bz_i\bz_i')^{-1} \E(\bz_i\bx_i') \\ \blambda = \E(\bz_i\bz_i')^{-1} \E(\bz_iy_i) . \end{split}\label{eq:reduced-coef} \end{equation}\]These are uniquely determined by the probability distribution of $(y_i, \bx_i, \bz_i).$
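Because $\Gamma$ and $\blambda$ are defined through population moments, they can be estimated by replacing expectations with sample averages. A sketch under an assumed data-generating process (the coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, ell, K = 50_000, 3, 2

Z = rng.normal(size=(n, ell))                 # instruments
Gamma = np.array([[1.0, 0.0],
                  [0.5, 1.0],
                  [0.0, 0.5]])                # assumed ell x K reduced-form coefficients
U = rng.normal(size=(n, K))
X = Z @ Gamma + U                             # reduced form for X
beta = np.array([1.0, -2.0])
e = U @ np.array([0.5, 0.5]) + rng.normal(size=n)  # e correlated with U: X is endogenous
y = X @ beta + e

# Sample analogs of eq. (reduced-coef): expectations replaced by averages.
Gamma_hat  = np.linalg.solve(Z.T @ Z, Z.T @ X)     # ~ Gamma
lambda_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)     # ~ Gamma @ beta
print(Gamma_hat)
print(lambda_hat, Gamma @ beta)                    # the two should be close
```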
We are interested in the structural parameter $\bbeta.$ It relates to $(\blambda, \Gamma)$ through
\[\begin{align} \blambda = \Gamma\bbeta . \label{eq:lambda} \end{align}\]$\bbeta$ is identified if it is uniquely determined by this relation. This is a system of $\ell$ equations in $K$ unknowns, where $\ell\geq K.$ There is a unique solution if and only if $\Gamma$ has full column rank $K:$
\[\begin{align} \rank{(\Gamma)} = K. \label{eq:full-rank1} \end{align}\]Under $\eqref{eq:full-rank1},$ $\bbeta$ can be uniquely solved from the linear system $\blambda = \Gamma\bbeta.$
On the other hand, if $\rank{(\Gamma)} < K$ then $\blambda = \Gamma\bbeta$ has fewer linearly independent equations than unknowns, so there is no unique solution.
Plugging $\eqref{eq:reduced-coef}$ into $\eqref{eq:lambda}$ we have
\[\E(\bz_iy_i) = \E(\bz_i\bx_i') \bbeta\]which is again a set of $\ell$ equations with $K$ unknowns. This has a unique solution if and only if
\[\begin{align} \rank{\left( \E(\bz_i\bx_i') \right)} = K . \label{eq:full-rank2} \end{align}\]$\eqref{eq:full-rank1}$ and $\eqref{eq:full-rank2}$ are equivalent ways of expressing the same requirement.
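The rank condition can be inspected on the sample analog of $\E(\bz_i\bx_i')$. A small sketch (the singular-value printout is an informal diagnostic, not a formal weak-instrument test):

```python
import numpy as np

rng = np.random.default_rng(2)
n, ell, K = 50_000, 3, 2
Z = rng.normal(size=(n, ell))
Gamma = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]])   # rank 2 = K
X = Z @ Gamma + rng.normal(size=(n, K))

Szx = Z.T @ X / n                            # sample analog of E(z_i x_i')
print(np.linalg.matrix_rank(Szx))            # 2: full column rank, beta is identified
print(np.linalg.svd(Szx, compute_uv=False))  # singular values near zero would signal weak identification
```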
Two scenarios for the solution $\bbeta:$
- Just identification: $\ell = K$
When $\ell = K,$ the full-rank condition $\eqref{eq:full-rank1}$ means that $\Gamma$ is square and invertible, so the structural parameter equals
\[\begin{split} \bbeta &= \Gamma^{-1}\blambda \\ &= \E(\bz_i\bx_i')^{-1} \E(\bz_iy_i). \end{split}\]This solution requires that the matrix $\E(\bz_i\bx_i')$ is invertible, which holds under the relevance condition $\eqref{eq:con-relevance}.$
The instrumental variables (IV) estimator $\hat{\bbeta}_{\mathrm{iv}}$ replaces the population moments by their sample versions. That is
\[\begin{equation} \begin{split} \hat{\bbeta}_{\mathrm{iv}} &= \underbrace{\left[(\bZ'\bZ)^{-1}(\bZ'\bX)\right]^{-1}}_{\widehat{\Gamma}^{-1}} \; \underbrace{(\bZ'\bZ)^{-1}(\bZ'\by)}_{\widehat{\blambda}} \\ &= (\bZ'\bX)^{-1}(\bZ'\bZ)(\bZ'\bZ)^{-1}(\bZ'\by) \\ &= (\bZ'\bX)^{-1}(\bZ'\by). \end{split} \label{eq:IV-beta} \end{equation}\]More generally, it is common to refer to any estimator of the form
\[\color{#008B45} \hat{\bbeta}_{\mathrm{iv}} = (\bW'\bX)^{-1} (\bW'\by)\]given an $n\times K$ matrix $\bW$ as an IV estimator for $\bbeta$ using the instrument $\bW.$
- Over-identification: $\ell > K$
We can solve for $\bbeta$ by applying least-squares to the system of equations $\blambda = \Gamma\bbeta.$ The least-squares solution is
\[\bbeta = (\Gamma'\Gamma)^{-1}\Gamma'\blambda .\]Under $\eqref{eq:full-rank1}$, the matrix $\Gamma'\Gamma$ is invertible so the solution is unique. A numerical sketch of both scenarios follows this list.
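Here is the promised sketch of both scenarios (the data-generating process and all coefficient values are assumptions for illustration): the just-identified case computes $(\bZ'\bX)^{-1}(\bZ'\by)$ directly, while the over-identified case solves $\blambda = \Gamma\bbeta$ by least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# --- Just identification (one regressor, one instrument, plus intercept) ---
z = rng.normal(size=n)
u = rng.normal(size=n)
e = 0.8 * u + rng.normal(size=n)          # structural error, correlated with u
x = 0.5 * z + u                           # first stage: x depends on z
y = 1.0 * x + e                           # true slope = 1

X = np.column_stack([np.ones(n), x])      # regressors with intercept
Z = np.column_stack([np.ones(n), z])      # the intercept instruments itself
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # slope biased upward (~1.64)
beta_iv  = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^{-1} Z'y: slope close to 1
print(beta_ols, beta_iv)

# --- Over-identification (ell = 3 > K = 2), at the population level ---
Gamma = np.array([[1.0, 0.0],
                  [0.5, 1.0],
                  [0.0, 0.5]])            # hypothetical reduced-form coefficients
beta_true = np.array([1.0, -2.0])
lam = Gamma @ beta_true                   # population lambda
beta = np.linalg.solve(Gamma.T @ Gamma, Gamma.T @ lam)  # (Gamma'Gamma)^{-1} Gamma' lambda
print(beta)                               # recovers [1, -2]
```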
Two-Stage Least Squares
Model setup
\[\begin{equation} \begin{split} & \text{vector form} & & \text{matrix form} \\ y_i &= \bx_i'\bbeta + e_i \qquad\quad & \by &= \bX\bbeta + \be \\ \bx_i &= \Gamma'\bz_i + \bu_i \qquad\quad & \bX &= \bZ\Gamma + \bU \end{split} \end{equation}\]Reformulate the model as
\[\begin{equation} \begin{split} y_i &= \bz_i'\Gamma \bbeta + v_i \\ \E(\bz_iv_i) &= \boldsymbol{0}. \end{split} \label{eq:2sls1} \end{equation}\]Define
\[\begin{aligned} \underbrace{\bw_i}_{K\times 1} &= \underbrace{\Gamma'}_{K\times \ell} \; \underbrace{\bz_i}_{\ell\times 1} \\ \underbrace{\bW}_{n\times K} &= \underbrace{\bZ}_{n\times \ell} \;\underbrace{\Gamma}_{\ell \times K} \quad \text{(or in matrix form)} \end{aligned}\]We can rewrite $\eqref{eq:2sls1}$ as
\[\begin{equation} \begin{split} y_i &= \bw_i' \bbeta + v_i \\ \E(\bw_iv_i) &= \boldsymbol{0}. \end{split} \label{eq:2sls} \end{equation}\]Suppose that $\Gamma$ were known. Then we would estimate $\bbeta$ by least-squares of $y_i$ on $\bw_i$
\[\begin{equation} \begin{split} \hat{\bbeta} &= (\bW'\bW)^{-1}(\bW'\by) \\ &= (\Gamma'\bZ'\bZ\Gamma)^{-1} (\Gamma'\bZ'\by) . \end{split} \label{eq:2sls-beta} \end{equation}\]While this is infeasible, we can estimate $\Gamma$ from the reduced form regression
\[\bX = \bZ\Gamma + \bU .\]The OLS estimate of $\Gamma$ is
\[\widehat{\Gamma} = (\bZ'\bZ)^{-1} (\bZ'\bX) .\]Replacing $\Gamma$ with $\widehat{\Gamma}$ in $\eqref{eq:2sls-beta}$, we obtain
\[\begin{equation} \begin{split} \color{#008B45} \widehat{\bbeta}_{2\text{sls}} &= (\widehat{\Gamma}'\bZ'\bZ\widehat{\Gamma})^{-1} (\widehat{\Gamma}'\bZ'\by) \\ &= \left[\bX'\bZ (\bZ'\bZ)^{-1} (\bZ'\bZ) (\bZ'\bZ)^{-1} \bZ'\bX \right]^{-1} \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\by \right] \\ &= \color{#008B45} \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\bX \right]^{-1} \left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\by \right] . \end{split} \end{equation}\]This is called the two-stage least-squares (2SLS) estimator.
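A sketch of the 2SLS formula applied to simulated data (the design and true $\bbeta = (1, -2)'$ are hypothetical assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, ell, K = 100_000, 3, 2
Z = rng.normal(size=(n, ell))
U = rng.normal(size=(n, K))
X = Z @ np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]) + U
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + U @ np.array([0.5, 0.5]) + rng.normal(size=n)  # X endogenous through U

# 2SLS: [X'Z (Z'Z)^{-1} Z'X]^{-1} [X'Z (Z'Z)^{-1} Z'y]
A = np.linalg.solve(Z.T @ Z, Z.T @ X)          # (Z'Z)^{-1} Z'X
b = np.linalg.solve(Z.T @ Z, Z.T @ y)          # (Z'Z)^{-1} Z'y
beta_2sls = np.linalg.solve(X.T @ Z @ A, X.T @ Z @ b)
print(beta_2sls)    # close to [1, -2]; plain OLS would be biased here
```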
If the model is just-identified, so that $K=\ell,$ then 2SLS simplifies to the IV estimator in $\eqref{eq:IV-beta}.$
Since in the just-identified case the matrices $\bX'\bZ$ and $\bZ'\bX$ are square and invertible, we can factor
\[\left[\bX'\bZ (\bZ'\bZ)^{-1} \bZ'\bX \right]^{-1} = (\bZ'\bX)^{-1}(\bZ'\bZ)(\bX'\bZ)^{-1}.\]Then
\[\begin{equation} \begin{split} \widehat{\bbeta}_{2\text{sls}} &= (\bZ'\bX)^{-1}(\bZ'\bZ)(\bX'\bZ)^{-1} \bX'\bZ (\bZ'\bZ)^{-1} \bZ'\by \\ &= (\bZ'\bX)^{-1} \bZ'\by \\ &= \widehat{\bbeta}_{\text{iv}} \end{split} \end{equation}\]as claimed. This shows that the 2SLS estimator is a generalization of the IV estimator.
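This equivalence is easy to confirm numerically; a sketch with a single regressor and a single instrument ($\ell = K = 1$, all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + u
y = 1.0 * x + 0.8 * u + rng.normal(size=n)   # x endogenous through u

X, Z = x.reshape(-1, 1), z.reshape(-1, 1)

beta_iv   = np.linalg.solve(Z.T @ X, Z.T @ y)                    # (Z'X)^{-1} Z'y
PZX       = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)                # P_Z X
beta_2sls = np.linalg.solve(X.T @ PZX,
                            X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ y))
print(np.allclose(beta_iv, beta_2sls))   # True: 2SLS equals IV when ell = K
```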
Alternative representation of the 2SLS estimator
Define the projection matrix
\[\bP_\bZ = \bZ(\bZ'\bZ)^{-1}\bZ'.\]We can write the 2SLS estimator more compactly as
\[\widehat{\bbeta}_{2\text{sls}} = (\bX'\bP_\bZ\bX)^{-1} \bX'\bP_\bZ \by .\]Define the fitted values for $\bX$ from the reduced form
\[\widehat{\bX} = \bP_\bZ\bX = \bZ \widehat{\Gamma} .\]Then the 2SLS estimator can be written as
\[\begin{equation} \begin{split} \widehat{\bbeta}_{2\text{sls}} &= (\bX'\bP_\bZ\bX)^{-1} \bX'\bP_\bZ \by \\ &= (\bX'\bP_\bZ'\bP_\bZ\bX)^{-1} \bX'\bP_\bZ' \by \quad \text{($\bP_\bZ$ is symmetric and idempotent)} \\ &= (\widehat{\bX}'\widehat{\bX})^{-1}\widehat{\bX}'\by \end{split} \end{equation}\]which is the least-squares estimator obtained by regressing $\by$ on the fitted values $\widehat{\bX}.$
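A sketch verifying that the projection form and the fitted-value form give the same estimate (simulated data with hypothetical coefficients; forming the $n\times n$ matrix $\bP_\bZ$ explicitly is for illustration only and should be avoided for large $n$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, ell, K = 1_000, 3, 2
Z = rng.normal(size=(n, ell))
X = Z @ np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]) + rng.normal(size=(n, K))
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

PZ   = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection onto the column space of Z
Xhat = PZ @ X                                 # fitted values Z Gamma_hat

b_proj   = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
b_fitted = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)
print(np.allclose(b_proj, b_fitted))          # True: the two representations coincide
```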
This is the source of the “two-stage” name since it can be computed as follows.
- First stage: regress $\bX$ on $\bZ$ to obtain
\[\widehat{\Gamma} = (\bZ'\bZ)^{-1} (\bZ'\bX)\]and the fitted values
\[\widehat{\bX} = \bZ\widehat{\Gamma} = \bP_\bZ\bX .\]
- Second stage: regress $\by$ on $\widehat{\bX}$ to obtain
\[\widehat{\bbeta}_{2\text{sls}} = (\widehat{\bX}'\widehat{\bX})^{-1}\widehat{\bX}'\by .\]
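The two stages written out as two explicit OLS regressions, on simulated data with hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(7)
n, ell, K = 100_000, 3, 2
Z = rng.normal(size=(n, ell))
U = rng.normal(size=(n, K))
X = Z @ np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.5]]) + U
y = X @ np.array([1.0, -2.0]) + U @ np.array([0.5, 0.5]) + rng.normal(size=n)

# First stage: regress X on Z.
Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Gamma_hat

# Second stage: regress y on the fitted values X_hat.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
print(beta_2sls)   # close to [1, -2]
# Note: standard errors from a literal second-stage OLS are not the correct
# 2SLS standard errors, since they use y - X_hat @ beta rather than y - X @ beta.
```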