23 Random Effects Model

Main Takeaway

For random effects panel data models:

Model: $y_{it} = \alpha_i + \bbeta'\bx_{it} + u_{it}$ with composite error $v_{it} = \alpha_i + u_{it}$
Key assumption: $\text{Cov}(\bx_{itj}, \alpha_i) = 0$ (strict exogeneity of unobserved effects)
Serial correlation: $\text{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2} > 0$ for $t \neq s$
GLS estimation: Quasi-demeaning transformation $y_{it}^* = y_{it} - \theta\bar{y}_i$ where $\theta \in [0,1]$
Efficiency trade-off: More efficient than FE when assumptions hold, but inconsistent if $\alpha_i$ correlated with $\bx_{it}$
Flexibility: Allows time-invariant regressors unlike fixed effects

We begin with the unobserved effects model where we explicitly include an intercept so that we can make the assumption that the unobserved random heterogeneity specific to the $i$th observation, $\alpha_i$, has zero mean. $\alpha_i$ is constant through time. $\beta_0$ is the mean of the unobserved heterogeneity.

\[ \begin{aligned} y_{it} &= a_i + \bbeta'\bx_{it} + u_{it}, \\ \text{or}\quad y_{it} &= a_i + \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \cdots + \beta_K x_{itK} + u_{it} , \end{aligned} \] for $t = 1,2,,T, $ and $i = 1,2,\ldots,N.$

Here the random effects model assumes $\alpha_i$ is uncorrelated with each explanatory variable in all time periods.

\[ \cov(x_{itj}, a_i) = 0, \quad t=1,2,\ldots,T;\, j=1,2,\ldots,K. \]

In this case, the fixed effects or first differencing model which eliminates $\alpha_i$ results in inefficient estimators.

The Random Effects (RE) approach estimate $\bbeta$ effectively by putting $a_i$ into the error term under the assumption that $a_i$ is orthogonal to $\bx_{it},$ and then accounts for the implied serial correlation in the composite error $v_{it} = a_i + u_{it}$ using a GLS analysis.

If we define the composite error term $v_{it}$ as

\[ \color{#008B45} v_{it} = a_i + u_{it}, \tag{23.1}\]

Equation 23.1 is often called an “error components model.”

Then we have

\[ y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + v_{it}. \tag{23.2}\] Because $a_i$ is in the composite error in each time period, the $v_{it}$ are serially correlated across time.

Assumptions of random effects model:

$\E[a_i\mid \bX] = \E[u_{it}\mid \bX] = 0$
$\E[a_i\mid \bX] = \sigma_a^2$ and $\E[u_{it}\mid \bX] = \sigma_u^2$
$\E[u_{it}a_j\mid \bX] = 0$ for all $i,t$, and $j$
$\E[u_{it}u_{js}\mid \bX] = 0$ if $i\ne j$ or $t\ne s$
$\E[a_ia_j\mid \bX] = 0$ for $i\ne j$

We view the data structure as blocks of $T$ observations for group $i.$

Let

\[ \bv_i = \begin{bmatrix} v_{i1}, v_{i2}, \cdots , v_{iT} \end{bmatrix}'. \] Let’s have a look at the variance-covariance matrix of $\bv_i.$ We have

\[ \begin{aligned} \E[v_{it}^2\mid \bX] &= \E[a_i^2 + u_{it}^2 + 2a_iu_{it} \mid \bx] \\ &= \sigma_a^2 + \sigma_u^2 . \end{aligned} \] For $t\ne s$, \[ \begin{aligned} \E[v_{it}v_{is}\mid \bX] &= \E[(a_i+u_{it})(a_i+u_{is})] \\ &= \E[a_i^2 + a_iu_{it} + a_iu_{is} + u_{it}u_{is} \mid \bx] \\ &= \sigma_a^2 , \end{aligned} \]

For $i\ne j$ and all $t$ and $s$, since observations $i$ and $j$ are independent, we have: \[ \E[v_{it}v_{js}\mid \bX] = 0. \]

Let $\Sigma = \E[\bv_i\bv_i'\mid \bX]$ be the $T\times T$ covariance matrix for observation $i$, then

\[ \begin{aligned} \underset{(T\times T)}{\bSigma} &= \E[\bv_i\bv_i'\mid \bX] \\ &= \begin{bmatrix} \sigma_a^2 + \sigma_u^2 & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2 + \sigma_u^2 & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & \sigma_a^2 + \sigma_u^2\\ \end{bmatrix} \\ &= \sigma_a^2 \bI_T + \sigma_u^2\bi_T\bi_T' \end{aligned} \] where $\bI_T$ is an $T\times T$ identity matrix and $\bi_T$ is a $T\times 1$ column vector of 1’s.

The disturbance covariance matrix for the full $NT$ observations is

\[ \underset{(NT\times NT)}{\Omega} = \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \\ \end{bmatrix} = \bI_N \otimes \bSigma \]

Note that the errors are serially correlated under the random effects assumptions: \[ \text{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s, \]

where $\sigma_a^2 = \text{Var}(a_i)$ and $\sigma_u^2 = \text{Var}(u_{it})$.

This (necessarily) positive serial correlation in the error term can be substantial, and, because the usual pooled OLS standard errors ignore this correlation, they will be incorrect, as will the usual test statistics.

Generalized least squares (GLS) can be used to estimate models with autoregressive serial correlation. For the procedure to have good properties, we should have large $N$ and relatively small $T$. We assume that we have a balanced panel, although the method can be extended to unbalanced panels.

\[ \begin{split} \hat{\bbeta}_{\text{GLS}} &= \left(\bX'\bOmega^{-1}\bX \right)^{-1} \bX'\bOmega^{-1}\by \\ &= \left(\sum_{i=1}^N \bX_i'\bSigma^{-1}\bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i'\bSigma^{-1}\by_i \right) \end{split} \]

Deriving the GLS transformation that eliminates serial correlation in the errors requires sophisticated matrix algebra (see, for example, Wooldridge (2010, Chapter 10)). We will require a transformation

\[ \bP = \bOmega^{-1/2} = (\mathbf{I}_N \otimes \bSigma)^{-1/2} . \] We only need to find $\bSigma^{-1/2},$ which is

\[ \bSigma^{-1/2} = \mathbf{I} - \frac{\theta}{T} \bi_T\bi_T' , \] where

\[ \theta = 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2} , \]

which is between zero and one. As $\frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2}$ approaches zero, $\theta\to1.$

The transformation of $\by_i$ for GLS is therefore

\[ \bSigma^{-1/2} \by_i = \begin{bmatrix} y_{i1} - \theta \overline{y}_i \\ y_{i2} - \theta \overline{y}_i \\ \vdots \\ y_{iT} - \theta \overline{y}_i \\ \end{bmatrix} \] and likewise for the rows of $\bX_i.$

Then, the transformed equation turns out to be \[ \begin{aligned} y_{it} - \theta \overline{y}_i &= \beta_0 (1 - \theta) + \beta_1 (x_{it1} - \theta \overline{x}_{i1}) + \cdots \\ &\phantom{=}\quad + \beta_K (x_{itK} - \theta \overline{x}_{iK}) + (v_{it} - \theta \overline{v}_i), \end{aligned} \tag{23.3}\]

where the overbar $\overline{y}_i=T^{-1}\sum_{t=1}^Ty_{it}$ is the time averages of $y$ for individual $i$, and similarly for $\overline{x}_{ij},$ $j=1,\ldots,K.$

This is a very interesting equation, as it involves quasi-demeaned data on each variable. The fixed effects estimator subtracts the time averages from the corresponding variable. The random effects transformation subtracts a fraction of that time average, where the fraction depends on $\sigma_u^2,$ $\sigma_a^2$ and the number of time periods, $T$. This transformation is known as “theta-differencing.”

The GLS estimator is simply the pooled OLS estimator of the transformed model (23.3). The errors in (23.3) are serially uncorrelated.

Equation 23.3 can be written more compactly as follows:

\[ y_{it}^\ast = \bbeta' \bx_{it}^\ast + u_{it}^\ast, \tag{23.4}\] where \[ y_{it}^\ast = y_{it} - \theta \overline{y}_i , \] and

\[ \begin{split} \theta &= 1 - \left[ \frac{\sigma_u^2}{\sigma_u^2 + T \sigma_a^2} \right]^{1/2} , \\ \overline{y}_i &= T^{-1}\sum_{t=1}^Ty_{it} . \end{split} \]

$\hat{\bbeta}_{\text{GLS}}$ can be obtained by using OLS on the transformed model (23.4).

The transformation in (23.3) allows for explanatory variables that are constant over time, and this is one advantage of random effects (RE) over either fixed effects or first differencing. This is possible because RE assumes that the unobserved effect is uncorrelated with all explanatory variables, whether the explanatory variables are fixed over time or not. Thus, in a wage equation, we can include a variable such as education even if it does not change over time. But we are assuming that education is uncorrelated with $a_i$, which contains ability and family background. In many applications, the whole reason for using panel data is to allow the unobserved effect to be correlated with the explanatory variables.

The parameter $\theta$ is never known in practice, but it can always be estimated. There are different ways to do this, which may be based on pooled OLS or fixed effects, for example. Generally, $\hat{\theta}$ takes the form \[ \hat{\theta} = 1 - \left[ \frac{\hat{\sigma}_u^2}{\hat{\sigma}_a^2 + T \hat{\sigma}_u^2} \right]^{1/2}, \] where $\hat{\sigma}_a^2$ is a consistent estimator of $\sigma_a^2$ and $\hat{\sigma}_u^2$ is a consistent estimator of $\sigma_u^2$.

These estimators can be based on the pooled OLS or the fixed effects residuals (Within Groups or Between Groups estimators). \[ \hat{\sigma}_a^2 = \left[ \frac{NT(T - 1)}{2} - (K + 1) \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T-1} \sum_{s=t+1}^{T} \hat{v}_{it} \hat{v}_{is}, \] where the $\hat{v}_{it}$ are the residuals from estimating Equation 23.2 by pooled OLS.

Given this, we can estimate $\sigma_u^2$ by using \[ \hat{\sigma}_u^2 = \hat{\sigma}_v^2 - \hat{\sigma}_a^2, \] where $\hat{\sigma}_v^2$ is the square of the usual standard error of the regression from pooled OLS. [See Wooldridge (2010, Chapter 10) for additional discussion of these estimators.]

The feasible GLS estimator that uses $\hat{\theta}$ in place of $\theta$ is called the random effects estimator.

Under the random effects assumptions, the estimator is consistent (not unbiased) and asymptotically normally distributed as $N$ gets large with fixed $T$.

The properties of the random effects (RE) estimator with small $N$ and large $T$ are largely unknown, although it has certainly been used in such situations.

Comparing RE estimates and FE estimates: When $\theta=1$, RE becomes the FE model. Recall that \[ \theta = 1 - \left[ \frac{\sigma_u^2}{\sigma_a^2 + T \sigma_u^2} \right]^{1/2}, \] There are three scenarios where $\theta\to1$:

$T$ is big $\rightarrow$ Lots of variation across time for each individual $\rightarrow$ more like fixed effects
$\sigma_a^2$ is big $\rightarrow$ Lots of variation in the fixed effects $\rightarrow$ more like fixed effects
$\sigma_u^2$ is small relative to $\sigma_a^2$ $\rightarrow$ idiosyncratic variation is small, more variation from fixed effects $\rightarrow$ more like fixed effects

Comparing RE estimates and Pooled OLS: When $\theta=0$, RE becomes the Pooled OLS. This happens when the unobserved effect, $a_i,$ is relatively unimportant (because it has small variance relative to $\sigma^2_u$).

Summary of RE:

Random effects estimators are a weighted average of the between estimator (variation between individuals in a cross section) and the within/fixed estimator (variation within individuals over time)
Random effects estimators will be consistent and unbiased if fixed effects are not correlated with $\bx$’s. Fixed effects estimators will always be consistent and unbiased (under usual GM assumptions)
Random effects estimators will be more efficient (have smaller standard errors) than fixed effects estimators because they use more of the variation in $X$ (specifically, they use the cross sectional/between variation)