Model Setup

The generalized linear regression model is

\[\begin{equation} \begin{split} \by &= \bX\bbeta + \bvarepsilon, \\ \E[\bvarepsilon \mid \bX] &= 0, \\ \E[\bvarepsilon\bvarepsilon' \mid \bX] &= \sigma^2\bOmega = \bSigma, \\ \end{split} \label{eq-model-general} \end{equation}\]

where $\bOmega$ is a positive definite matrix.


Spherical Disturbances

Homoskedasticity and Non-Autocorrelation

[Figure: Homoskedastic and non-autocorrelated errors.]

The left panel shows the joint probability density $p(\varepsilon_i, \varepsilon_j)$; the right panel shows the contours of that density. If we considered the joint distribution of three error terms instead of two, the circular contours would become spheres, hence the terminology “spherical disturbances.”


Nonspherical Disturbances

Heteroskedasticity and Non-Autocorrelation

Disturbances have different variances but are still assumed to be uncorrelated across observations, so $\sigma^2\bOmega$ would be

\[\sigma^2\bOmega = \sigma^2\begin{bmatrix} \omega_{11} & 0 & \cdots & 0 \\ 0 & \omega_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \omega_{nn} \end{bmatrix} = \begin{bmatrix} \sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_n \end{bmatrix}\]
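In code, this covariance matrix is simply diagonal. A minimal numpy sketch, with hypothetical per-observation variances chosen only for illustration:

```python
import numpy as np

# Hypothetical per-observation variances, for illustration only.
sigma2_i = np.array([0.5, 1.0, 2.0, 4.0])

# Heteroskedastic, non-autocorrelated covariance: a diagonal matrix of variances.
Sigma = np.diag(sigma2_i)   # Sigma = sigma^2 * Omega
print(Sigma)
```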
[Figure: Heteroskedastic and non-autocorrelated errors.]

Autocorrelation and Homoskedasticity

\[\sigma^2\bOmega = \sigma^2\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{bmatrix}\]

In most cases, the values that appear off the diagonal decline as we move away from the diagonal.

For example, under first-order autocorrelation, the correlation between $\varepsilon_i$ and $\varepsilon_j$ is $\rho^{|i-j|}$:

\[\bOmega = \begin{bmatrix} 1 & \rho & \cdots & \rho^{n-1} \\ \rho & 1 & \cdots & \rho^{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & 1 \\ \end{bmatrix} = \bOmega (\rho)\]

Hence, $\bOmega$ depends on only one parameter, $\rho.$ As long as we have a consistent estimator of $\rho,$ it gives us a consistent $\hat{\bOmega}$ by Slutsky’s Theorem.
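A minimal numpy sketch of this one-parameter structure, building $\bOmega(\rho)$ with $(i,j)$ entry $\rho^{|i-j|}$ (the values $\rho = 0.6$ and $n = 5$ are arbitrary illustrative choices):

```python
import numpy as np

def ar1_omega(rho: float, n: int) -> np.ndarray:
    """AR(1) correlation matrix Omega(rho), with (i, j) entry rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# rho = 0.6 and n = 5 are arbitrary illustrative values.
print(np.round(ar1_omega(rho=0.6, n=5), 3))
```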

[Figure: Autocorrelated and homoskedastic errors.]

Heteroskedasticity and Autocorrelation

[Figure: Heteroskedastic and autocorrelated errors.]

How do nonspherical disturbances affect the OLS estimator?

  • $\hat{\bbeta}^{\text{OLS}} = (\bX'\bX)^{-1} (\bX'\by)$ is still unbiased.

  • But the covariance matrix of $\hat{\bbeta}^{\text{OLS}}$ is no longer $\sigma^2(\bX'\bX)^{-1}.$ $\rightarrow$ The usual computer output will be numerically misleading, as it is based on the wrong formula, namely $s^2(\bX'\bX)^{-1}.$ $\rightarrow$ The standard errors, $t$-statistics, etc. will all be incorrect. (A numerical sketch follows this list.)

    With nonstochastic regressors, or conditional on $\bX,$ the sampling variance of the least squares estimator is

    \[\begin{aligned} \var{(\hat{\bbeta}\mid \bX)} &= \var{\left(\bbeta + (\bX'\bX)^{-1} (\bX'\bvarepsilon)\mid \bX\right)} \\ &= \E \left[(\bX'\bX)^{-1} (\bX'\bvarepsilon \bvarepsilon'\bX) (\bX'\bX)^{-1} \mid \bX \right] \\ &= (\bX'\bX)^{-1} \bX' (\sigma^2\bOmega) \bX (\bX'\bX)^{-1} . \end{aligned}\]

    $\hat{\bbeta}^{\text{OLS}}$ is a linear function of $\bvarepsilon.$ If $\bvarepsilon$ is normally distributed, then $\hat{\bbeta}^{\text{OLS}}$ is normally distributed with

    \[\hat{\bbeta}^{\text{OLS}} \mid \bX \sim N\left[ \bbeta, \sigma^2(\bX'\bX)^{-1} \bX' \bOmega \bX (\bX'\bX)^{-1} \right].\]
  • We need generalized estimators to deal with these issues.


  • Generalized Method of Moments (GMM) estimation makes minimal assumptions about $\bOmega$; it is robust and semiparametric.

  • Generalized least squares (GLS) and maximum likelihood estimation formulate parametric models that make specific assumptions about $\bOmega.$
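To make the contrast concrete, here is a minimal numpy sketch (with a hypothetical heteroskedastic $\bOmega$ chosen purely for illustration) that computes the correct sandwich covariance of OLS alongside the naive $\sigma^2(\bX'\bX)^{-1}$ formula the usual output would report:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
sigma2 = 1.0

# Hypothetical heteroskedastic Omega: variances grow with |x_1| (illustration only).
Omega = np.diag(1.0 + np.abs(X[:, 0]))

XtX_inv = np.linalg.inv(X.T @ X)

# Correct sampling variance: (X'X)^{-1} X'(sigma^2 Omega)X (X'X)^{-1}.
V_sandwich = XtX_inv @ X.T @ (sigma2 * Omega) @ X @ XtX_inv

# Naive formula sigma^2 (X'X)^{-1} that the usual output would (wrongly) use.
V_naive = sigma2 * XtX_inv

print(np.diag(V_sandwich))  # actual OLS sampling variances under sigma^2 * Omega
print(np.diag(V_naive))     # what the standard formula reports
```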


GLS Estimator

Generalized least squares (GLS) applies a transformation to the original model $\eqref{eq-model-general}$

\[\by = \bX\bbeta + \bvarepsilon\]

Since $\bOmega$ is a positive-definite symmetric matrix, it can be factored into

\[\bOmega = \bC\bLambda\bC'\]

where the columns of $\bC$ are the characteristic vectors of $\bOmega$ and $\bLambda$ is a diagonal matrix whose diagonal elements are the characteristic roots of $\bOmega.$

$\bOmega^{-1}$ is also positive-definite.

\[\begin{split} \bOmega^{-1} &= (\bC\bLambda\bC')^{-1} \quad (\bC' = \bC^{-1} \text{ since } \bC \text{ is orthogonal}) \\ &= \bC \bLambda^{-1} \bC' \end{split}\]

Let $\bLambda^{1/2}$ be the diagonal matrix with $i$-th diagonal element $\sqrt{\lambda_i},$ and let $\bLambda^{-1/2}$ be its inverse.

Define

\[\bP' = \bC\bLambda^{-1/2},\]

then $\bP'\bP$ has the convenient property that $\bP'\bP = \bOmega^{-1},$ which will prove useful in the estimation below.

\[\begin{split} \bP'\bP &= \bC\bLambda^{-1/2} \bLambda^{-1/2}\bC' \\ &= \bC\bLambda^{-1}\bC'\\ &= \bOmega^{-1} . \end{split}\]
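A small numpy check of this construction (using an arbitrary positive definite $\bOmega$ for illustration): eigendecompose $\bOmega$, form $\bP = \bLambda^{-1/2}\bC',$ and verify both $\bP'\bP = \bOmega^{-1}$ and $\bP\bOmega\bP' = \bI$:

```python
import numpy as np

n = 5
idx = np.arange(n)
# An arbitrary positive definite Omega for illustration (AR(1) with rho = 0.5).
Omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])

# Spectral decomposition Omega = C Lambda C', with orthogonal C.
lam, C = np.linalg.eigh(Omega)

# P' = C Lambda^{-1/2}, i.e. P = Lambda^{-1/2} C'.
P = np.diag(lam ** -0.5) @ C.T

print(np.allclose(P.T @ P, np.linalg.inv(Omega)))  # True: P'P = Omega^{-1}
print(np.allclose(P @ Omega @ P.T, np.eye(n)))     # True: P Omega P' = I
```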

Premultiplying model $\eqref{eq-model-general}$ by $\bP$, we obtain

\[\bP\by = \bP\bX\bbeta + \bP\bvarepsilon\]

or

\[\begin{equation}\label{eq-model-gls} \by^* = \bX^*\bbeta + \bvarepsilon^* . \end{equation}\]

The covariance matrix of $\bvarepsilon^*$ is

\[\begin{equation}\label{eq-covariance-gls} \begin{split} \E[\bvarepsilon^* \bvarepsilon^{*'}] &= \E[\bP\bvarepsilon\bvarepsilon'\bP'] \\ &= \bP \sigma^2\bOmega \bP' \\ &= \sigma^2 \bP\bOmega \bP' \end{split} \end{equation}\]

Since $\bOmega^{-1} = \bP'\bP$, we have

\[\bOmega = (\bP'\bP)^{-1}.\]

Since $\bP$ is both square and non-singular,

\[\bOmega = \bP^{-1}(\bP')^{-1} .\]

Plugging this into $\eqref{eq-covariance-gls},$ we have

\[\E[\bvarepsilon^* \bvarepsilon^{*'}] = \sigma^2 \bP \bP^{-1}(\bP')^{-1} \bP' = \sigma^2\bI\]

Hence the transformed model $\eqref{eq-model-gls}$ satisfies the classical regression assumptions.

In the classical model, OLS is BLUE, hence,

\[\begin{aligned} \hat{\bbeta} &= \underset{\bbeta}{\arg\min}\; \bvarepsilon^{*\prime} \bvarepsilon^*\\ &= \underset{\bbeta}{\arg\min}(\by-\bX\bbeta)'\bOmega^{-1}(\by-\bX\bbeta) \end{aligned}\]

This estimator is the generalized least squares (GLS) estimator; it minimizes the generalized sum of squares $\bvarepsilon^{\ast\prime} \bvarepsilon^{\ast}$, not $\bvarepsilon'\bvarepsilon$.

The solution is given by

\[\begin{equation}\label{eq-gls-estimator} \begin{split} \hat{\bbeta}_{\text{gls}} &= (\bX^{*'}\bX^*)^{-1} (\bX^{*'}\by^*) \\ &= (\bX'\bP'\bP\bX)^{-1} (\bX'\bP'\bP\by) \\ &= \left(\bX'\bOmega^{-1}\bX\right)^{-1} (\bX'\bOmega^{-1}\by) . \end{split} \end{equation}\]

We can also write the GLS estimator as

\[\begin{split} \hat{\bbeta}_{\text{gls}} &= \left(\bX'(\sigma^2\bOmega)^{-1}\bX \right)^{-1} \left(\bX'(\sigma^2\bOmega)^{-1}\by \right) \\ &= \left(\bX'\bSigma^{-1}\bX \right)^{-1} \left(\bX'\bSigma^{-1}\by \right) . \qquad (\bSigma = \sigma^2\bOmega) \end{split}\]

The GLS estimator $\eqref{eq-gls-estimator}$ contrasts with the OLS estimator, which uses the “weighting matrix” $\bI$ instead of $\bOmega^{-1}.$

The GLS estimator is just the OLS estimator applied to the transformed model, and the latter model satisfies all of the usual conditions. The Gauss-Markov Theorem therefore applies to the GLS estimator.
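This equivalence can be checked numerically. The sketch below (with an assumed AR(1) $\bOmega$ and assumed true coefficients, chosen only for illustration) fits OLS to the transformed data $(\bP\by, \bP\bX)$ and compares it with the direct formula $\eqref{eq-gls-estimator}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 2
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0])          # assumed true coefficients
idx = np.arange(n)
Omega = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # assumed known AR(1) Omega

# Draw errors with covariance Omega (sigma^2 = 1) via a Cholesky factor.
y = X @ beta + np.linalg.cholesky(Omega) @ rng.normal(size=n)

# Route 1: OLS on the transformed model (P y, P X), with P'P = Omega^{-1}.
lam, C = np.linalg.eigh(Omega)
P = np.diag(lam ** -0.5) @ C.T
beta_star = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]

# Route 2: the direct formula (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

print(np.allclose(beta_star, beta_gls))  # True: the two routes agree
```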


Properties of the GLS Estimator

  • $\hat{\bbeta}_{\text{gls}}$ is unbiased
\[\E[\hat{\bbeta}_{\text{gls}} \mid \bX^*] = \bbeta\]
  • The sampling variance of $\hat{\bbeta}_{\text{gls}}$ is

    \[\var(\hat{\bbeta}_{\text{gls}} \mid \bX^*) = \sigma^2 \left(\bX^{*'}\bX^* \right)^{-1} = \sigma^2 \left(\bX'\bOmega^{-1}\bX\right)^{-1}\]

    The GLS estimator is asymptotically normally distributed, with mean $\bbeta$ and sampling variance $\sigma^2 (\bX'\bOmega^{-1}\bX)^{-1}.$

  • If the errors are normally distributed, then the full sampling distribution of the GLS estimator of $\bbeta$ is:

    \[\hat{\bbeta}_{\text{gls}} \sim N \left(\bbeta, \sigma^2 \left(\bX'\bOmega^{-1}\bX \right)^{-1} \right)\]
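A short Monte Carlo sketch (with an assumed design, assumed AR(1) $\bOmega,$ and $\sigma^2 = 1$, all chosen for illustration) can exhibit both properties: the simulated mean of $\hat{\bbeta}_{\text{gls}}$ is close to $\bbeta,$ and its simulated covariance is close to $(\bX'\bOmega^{-1}\bX)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, reps = 50, 2, 5000
X = rng.normal(size=(n, k))            # fixed design across replications
beta = np.array([1.0, -2.0])           # assumed true coefficients
idx = np.arange(n)
Omega = 0.7 ** np.abs(idx[:, None] - idx[None, :])   # assumed AR(1) Omega
L = np.linalg.cholesky(Omega)          # errors ~ N(0, Omega), so sigma^2 = 1

Oinv = np.linalg.inv(Omega)
A = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv)      # beta_gls = A @ y

draws = np.array([A @ (X @ beta + L @ rng.normal(size=n)) for _ in range(reps)])

print(draws.mean(axis=0))               # approx. beta: unbiasedness
print(np.cov(draws.T))                  # approx. (X' Omega^{-1} X)^{-1}
print(np.linalg.inv(X.T @ Oinv @ X))    # theoretical sampling variance
```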