20  Pooled Regression

Main Takeaway

For pooled regression with panel data:

  • Model structure: \(y_{it} = \bx_{it}' \bbeta + u_{it}\) treats all observations as independent cross-sections
  • Key assumption: Strict mean independence \(\mathbb{E}[u_{it} \mid \bX_i] = 0\) (errors are mean independent of the regressors in every time period)
  • Estimation: Standard OLS \(\hat{\bbeta}_{\text{pool}} = (\bX' \bX)^{-1} \bX' \by\)
  • Inference: Use cluster-robust standard errors to account for serial correlation within individuals
  • Limitation: Strict mean independence often violated; excludes lagged dependent variables and unobserved heterogeneity

20.1 Notation

This chapter focuses on panel data regression models whose observations are pairs \((y_{it}, \bx_{it})\), where \(y_{it}\) is the dependent variable and \(\bx_{it}\) is a \(K\)-vector of regressors, both observed for individual \(i\) in time period \(t\).

It will be useful to cluster the observations at the level of the individual. We write \(\by_i = (y_{i1}, y_{i2}, \ldots, y_{iT_i})'\) for the \(T_i \times 1\) vector of observations on \(y_{it}\) for \(t \in S_i\), stacked in chronological order, where \(S_i\) denotes the set of time periods in which individual \(i\) is observed. Similarly, \(\bX_i = (\bx_{i1}, \bx_{i2}, \ldots, \bx_{iT_i})'\) is the \(T_i \times K\) matrix of stacked \(\bx_{it}'\).

A panel is balanced when \(T_i = T\) for all \(i=1,\ldots,N\). We assume a balanced panel throughout the chapter to simplify notation.

For the full sample:

  • \(\by = (\by_1', \by_2', \dots, \by_N')'\) is the \(n \times 1\) vector of stacked \(\by_i\), where \(n = NT\) under a balanced panel, and
  • \(\bX = (\bX_1', \bX_2', \dots, \bX_N')'\) is the \(n \times K\) matrix of stacked \(\bX_i\).
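The stacking convention can be illustrated with a small NumPy sketch. The dimensions, the random data, and the helper `row` are hypothetical and serve only to show how the individual blocks map into the full-sample arrays under a balanced panel.

```python
import numpy as np

N, T, K = 3, 4, 2  # hypothetical balanced-panel dimensions
rng = np.random.default_rng(0)

# One (T x K) regressor matrix X_i and one length-T outcome vector y_i per individual.
X_blocks = [rng.normal(size=(T, K)) for _ in range(N)]
y_blocks = [rng.normal(size=T) for _ in range(N)]

# Stack the individual blocks in the order i = 1, ..., N.
X = np.vstack(X_blocks)       # shape (N*T, K)
y = np.concatenate(y_blocks)  # shape (N*T,)


def row(i, t):
    """Row of observation (i, t) in the stacked arrays (0-based i and t)."""
    return i * T + t
```

With this ordering, all \(T\) rows of one individual are contiguous, which is convenient when computing per-individual quantities such as \(\bX_i' \bu_i\).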

20.2 Model Setup

The simplest model in panel regression is the pooled regression. At the level of the observation, the model is:

\[ \begin{split} y_{it} &= \bx_{it}' \bbeta + u_{it}, \\ \mathbb{E}[\bx_{it} u_{it}] &= 0 \end{split} \]

At the individual level:

\[ \begin{split} \by_i &= \bX_i \bbeta + \bu_i, \\ \mathbb{E}[\bX_i' \bu_i] &= 0 \end{split} \]

where

\[ \underset{(T\times 1)}{\by_i} = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \\ \end{bmatrix} , \quad \underset{(T\times K)}{\bX_i} = \begin{bmatrix} \bx_{i1}' \\ \bx_{i2}' \\ \vdots \\ \bx_{iT}' \\ \end{bmatrix} , \quad \underset{(T\times 1)}{\bu_i} = \begin{bmatrix} u_{i1} \\ u_{i2} \\ \vdots \\ u_{iT} \\ \end{bmatrix} \]

For the full sample:

\[ \by = \bX\bbeta + \bu \] where

\[ \underset{(NT\times 1)}{\by} = \begin{bmatrix} \by_{1} \\ \by_{2} \\ \vdots \\ \by_{N} \end{bmatrix} , \underset{(NT\times 1)}{\bu} = \begin{bmatrix} \bu_{1} \\ \bu_{2} \\ \vdots \\ \bu_{N} \end{bmatrix} , \underset{(NT\times K)}{\bX} = \begin{bmatrix} \bX_{1} \\ \bX_{2} \\ \vdots \\ \bX_{N} \end{bmatrix} \]

The pooled regression model is appropriate when the errors \(u_{it}\) satisfy strict mean independence:

\[ \mathbb{E}[u_{it} \mid \bX_i] = 0 \tag{20.1}\]

That is, the error \(u_{it}\) is mean independent of the regressors \(\bx_{ij}\) in every time period \(j=1,\ldots,T\), not only the current one.

This strict mean independence implies that neither lagged nor future values of \(\bx_{it}\) help predict \(u_{it}\). It excludes lagged dependent variables (such as \(y_{i,t-1}\)) from \(\bx_{it}\); otherwise \(u_{it}\) would be predictable given \(\bx_{i,t+1}\), which would contain \(y_{it}\).
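A short worked case makes the exclusion of lagged dependent variables concrete. As an illustrative assumption (not part of the model above), suppose there is a single regressor \(x_{it} = y_{i,t-1}\) with scalar coefficient \(\beta\), and that the contemporaneous condition \(\mathbb{E}[x_{it} u_{it}] = 0\) holds. Then \(x_{i,t+1} = y_{it}\), and

\[ \begin{split} \mathbb{E}[u_{it}\, x_{i,t+1}] &= \mathbb{E}[u_{it}\, y_{it}] \\ &= \mathbb{E}[u_{it} (x_{it} \beta + u_{it})] \\ &= \beta\, \mathbb{E}[x_{it} u_{it}] + \mathbb{E}[u_{it}^2] \\ &= \mathbb{E}[u_{it}^2] > 0 \end{split} \]

whenever \(u_{it}\) is not degenerate. For \(t < T\), \(\bX_i\) contains \(x_{i,t+1}\), so \(u_{it}\) is correlated with an element of \(\bX_i\) and \(\mathbb{E}[u_{it} \mid \bX_i] = 0\) cannot hold.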

Strict mean independence is stronger than pairwise mean independence, which only requires that \(u_{it}\) is mean independent of \(\bx_{it}\) in the same time period \(t\):

\[ \mathbb{E}[u_{it} \mid \bx_{it}] = 0 \]

It is also stronger than the projection assumption used in the model setup:

\[ \mathbb{E}[\bx_{it} u_{it}] = 0 \]

Q: When is strict mean independence, \(\mathbb{E}[u_{it}\mid \bX_i] = 0\), likely to hold?

  • Randomized controlled trials (RCTs) over time

    Consider an experiment in which each of 500 patients is given either a real blood pressure drug or a placebo by a literal coin flip. The researchers track each patient's blood pressure every week for 10 weeks.

    Here our variable of interest is the drug \(x_{it}\), and each patient has unobserved health characteristics \(\alpha_i\) (genetics, diet, exercise) that affect their blood pressure \(y_{it}\).

    Since the drug is randomly assigned, treatment is unrelated to these unobserved characteristics, so we can safely assume that

    \[ \mathbb{E}[\alpha_{i}x_{it}] = 0 \]

  • Repeated cross-sections

    This happens in surveys where a new random sample of individuals is drawn each year.

    Each person only appears once.

  • Highly homogeneous populations

    An engineer is testing how temperature affects the battery life of 100 identical smartphones coming off the exact same assembly line, tested over 5 different cycles.

In plain language, strict mean independence is likely to hold when there is NO unobserved heterogeneity across individuals that affects both the regressors and the outcome, and when there are no lagged dependent variables in the model.

When you have reason to believe that there exist unobserved individual-specific effects, you can use fixed effects or random effects models instead, which allow for heterogeneity across individuals.


The standard estimator of \(\bbeta\) in the pooled regression model is least squares, given by:

\[ \begin{split} \hat{\bbeta}_{\text{pool}} &= \left(\sum_{i=1}^N\sum_{t=1}^T \bx_{it}\bx_{it}' \right)^{-1} \left(\sum_{i=1}^N\sum_{t=1}^T \bx_{it}y_{it} \right) \\ &= \left( \sum_{i=1}^N \bX_i' \bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i' \by_i \right) \\ &= (\bX' \bX)^{-1} \bX' \by \end{split} \]

\(\hat{\bbeta}_{\text{pool}}\) is called the pooled regression estimator.

The vector of least-squares residuals for the \(i\)th individual is:

\[ \widehat{\bu}_i = \by_i - \bX_i \hat{\bbeta}_{\text{pool}} \]
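The equivalence between the individual-level sums and the stacked formula can be checked numerically. The following is a minimal NumPy sketch on simulated data; the function name `pooled_ols`, the data-generating process, and the parameter values are assumptions made for illustration.

```python
import numpy as np


def pooled_ols(X_blocks, y_blocks):
    """Pooled OLS from per-individual blocks X_i (T x K) and y_i (length T)."""
    K = X_blocks[0].shape[1]
    XtX = np.zeros((K, K))
    Xty = np.zeros(K)
    for X_i, y_i in zip(X_blocks, y_blocks):
        XtX += X_i.T @ X_i  # accumulate sum_i X_i' X_i
        Xty += X_i.T @ y_i  # accumulate sum_i X_i' y_i
    beta_hat = np.linalg.solve(XtX, Xty)
    # Residual vector for each individual: u_hat_i = y_i - X_i beta_hat
    resid_blocks = [y_i - X_i @ beta_hat for X_i, y_i in zip(X_blocks, y_blocks)]
    return beta_hat, resid_blocks


# Hypothetical simulated balanced panel with an intercept and one regressor.
rng = np.random.default_rng(1)
N, T = 200, 5
beta = np.array([1.0, -0.5])
X_blocks = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(N)]
y_blocks = [X_i @ beta + rng.normal(size=T) for X_i in X_blocks]

beta_hat, resid_blocks = pooled_ols(X_blocks, y_blocks)

# Same estimator computed from the stacked arrays: (X'X)^{-1} X'y.
X = np.vstack(X_blocks)
y = np.concatenate(y_blocks)
beta_stacked = np.linalg.lstsq(X, y, rcond=None)[0]
```

Both routes solve the same normal equations, so `beta_hat` and `beta_stacked` agree up to floating-point error. The block form is useful later because the cluster-robust covariance matrix is built from the same per-individual pieces.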

20.3 Statistical Properties

The estimator can be rewritten as:

\[ \begin{split} \hat{\bbeta}_{\text{pool}} &= \left( \sum_{i=1}^N \bX_i' \bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i' \left(\bX_i\bbeta + \bu_i\right) \right) \\ &= \bbeta + \left( \sum_{i=1}^N \bX_i' \bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i' \bu_i \right) \end{split} \]

Given strict mean independence (20.1), \(\hat{\bbeta}_{\text{pool}}\) is unbiased for \(\bbeta\).
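The unbiasedness step can be written out explicitly. The sketch below assumes, in addition to (20.1), that individuals are sampled independently, so that conditioning on the full regressor matrix \(\bX\) is the same as conditioning on \(\bX_i\) for the errors of individual \(i\). Taking conditional expectations in the decomposition above:

\[ \begin{split} \mathbb{E}[\hat{\bbeta}_{\text{pool}} \mid \bX] &= \bbeta + \left( \sum_{i=1}^N \bX_i' \bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i' \, \mathbb{E}[\bu_i \mid \bX] \right) \\ &= \bbeta + \left( \sum_{i=1}^N \bX_i' \bX_i \right)^{-1} \left( \sum_{i=1}^N \bX_i' \, \mathbb{E}[\bu_i \mid \bX_i] \right) \\ &= \bbeta \end{split} \]

The last equality uses \(\mathbb{E}[u_{it} \mid \bX_i] = 0\) for every \(t\), which is exactly (20.1). The contemporaneous condition \(\mathbb{E}[\bx_{it} u_{it}] = 0\) alone would not justify this step, because \(\bX_i\) contains regressors from all time periods.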

The appropriate covariance matrix estimator for inference depends on the error structure:

  • If \(u_{it}\) is homoskedastic and serially uncorrelated: use the classical variance estimator.

    This treats all observations as independent cross-sections and ignores the panel structure. It is rarely appropriate, because errors are commonly serially correlated within individuals over time.

    Ignoring such correlation typically understates the standard errors, resulting in overconfident inference.

  • If \(u_{it}\) is heteroskedastic: use heteroskedasticity-robust estimator.

  • If \(u_{it}\) is serially correlated (which is very common): use the cluster-robust covariance matrix estimator, clustering at the level of the individual:

\[ \widehat{\bV}_{\text{pool}} = (\bX' \bX)^{-1} \left( \sum_{i=1}^N \bX_i' \widehat{\bu}_i \widehat{\bu}_i' \bX_i \right) (\bX' \bX)^{-1} \]

With Stata’s degrees-of-freedom adjustment, where \(n = NT\) is the total number of observations and \(K\) is the number of regressors:

\[ \widehat{\bV}_{\text{pool}} = \left( \frac{n - 1}{n - K} \right) \left( \frac{N}{N - 1} \right) (\bX' \bX)^{-1} \left( \sum_{i=1}^N \bX_i' \widehat{\bu}_i \widehat{\bu}_i' \bX_i \right) (\bX' \bX)^{-1} \]
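The cluster-robust formula translates almost line by line into NumPy. The sketch below is one possible implementation under stated assumptions: the function name `cluster_robust_cov`, the `dof_adjust` flag, and the simulated design (errors with an individual component that is independent of the regressors, so strict mean independence holds) are all hypothetical choices made for illustration.

```python
import numpy as np


def cluster_robust_cov(X_blocks, resid_blocks, dof_adjust=True):
    """Cluster-robust covariance of pooled OLS, clustering by individual."""
    K = X_blocks[0].shape[1]
    N = len(X_blocks)                              # number of clusters
    n = sum(X_i.shape[0] for X_i in X_blocks)      # total observations
    XtX = sum(X_i.T @ X_i for X_i in X_blocks)
    meat = np.zeros((K, K))
    for X_i, u_i in zip(X_blocks, resid_blocks):
        s_i = X_i.T @ u_i                          # K-vector X_i' u_hat_i
        meat += np.outer(s_i, s_i)                 # X_i' u_hat_i u_hat_i' X_i
    bread = np.linalg.inv(XtX)
    V = bread @ meat @ bread
    if dof_adjust:
        V = V * ((n - 1) / (n - K)) * (N / (N - 1))
    return V


# Hypothetical design: both x and u have a persistent individual component,
# but the two components are independent of each other.
rng = np.random.default_rng(2)
N, T = 300, 6
X_blocks, y_blocks = [], []
for _ in range(N):
    x = rng.normal() + rng.normal(size=T)
    u = rng.normal() + rng.normal(size=T)
    X_i = np.column_stack([np.ones(T), x])
    X_blocks.append(X_i)
    y_blocks.append(X_i @ np.array([1.0, 2.0]) + u)

X = np.vstack(X_blocks)
y = np.concatenate(y_blocks)
n, K = X.shape
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid_blocks = [y_i - X_i @ beta_hat for X_i, y_i in zip(X_blocks, y_blocks)]

V_plain = cluster_robust_cov(X_blocks, resid_blocks, dof_adjust=False)
V_adj = cluster_robust_cov(X_blocks, resid_blocks, dof_adjust=True)
se_cluster = np.sqrt(np.diag(V_adj))

# Classical standard errors for comparison: s^2 (X'X)^{-1}.
resid = np.concatenate(resid_blocks)
s2 = resid @ resid / (n - K)
se_classical = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
```

In a design like this one, where errors are positively correlated within individuals, one would expect `se_cluster` to exceed `se_classical`, although the size of the gap depends on the simulated sample.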

When strict mean independence (20.1) fails, however, the pooled least-squares estimator \(\hat{\bbeta}_{\text{pool}}\) is not necessarily consistent for \(\bbeta\). Since strict mean independence is a strong and often implausible restriction, it is typically preferable to adopt an alternative estimation approach, such as a Fixed Effects or Random Effects model.
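A small simulation illustrates the inconsistency. It is a sketch under assumed conditions: a scalar regressor with no intercept, true coefficient \(\beta = 1\), and an error \(u_{it} = \alpha_i + e_{it}\) whose individual component \(\alpha_i\) either does or does not enter the regressor. The function name `pooled_slope` and all parameter values are hypothetical.

```python
import numpy as np


def pooled_slope(correlated, N=5000, T=4, beta=1.0, seed=3):
    """Pooled OLS slope (no intercept) in a simulated balanced panel."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(N, 1))   # unobserved individual effect
    e = rng.normal(size=(N, T))       # idiosyncratic error
    z = rng.normal(size=(N, T))       # exogenous part of the regressor
    # When `correlated` is True, the regressor loads on alpha_i, so the
    # composite error is correlated with x_it and (20.1) fails.
    x = z + alpha if correlated else z
    u = alpha + e
    y = beta * x + u
    return (x * y).sum() / (x * x).sum()


slope_ok = pooled_slope(correlated=False)
slope_bad = pooled_slope(correlated=True)
```

In this design the probability limit of the pooled slope is \(\beta + \mathbb{E}[x_{it} u_{it}] / \mathbb{E}[x_{it}^2]\). With unit variances this equals \(1\) when the regressor is unrelated to \(\alpha_i\) and \(1 + 1/2 = 1.5\) when it is not, so `slope_ok` should land near 1 and `slope_bad` near 1.5, no matter how large \(N\) becomes.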