10 OLS Finite Sample Assumptions and Properties

In this section, we give a complete listing of the finite sample, or small sample, properties of OLS under standard assumptions.

10.1 Finite Sample Classical Assumptions of OLS

Assumption 10.1 (Linear in Parameters) The stochastic process \(\{(x_{t1}, x_{t2}, \ldots, x_{tK}, y_t): t=1,2,\ldots, T\}\) follows the linear model

\[ \begin{split} y_t &= \bx_t'\bbeta + u_t \\ &= \beta_0 + \beta_1x_{t1} + \beta_2x_{t2} + \cdots + \beta_Kx_{tK} + u_t, \end{split} \]

where \(\{u_t: t = 1,2,\ldots, T\}\) is the sequence of errors or disturbances. Here \(T\) is the number of observations (time periods).

In the notation \(x_{tj}\), \(t\) denotes the time period, and \(j\) is a label to indicate one of the \(K\) explanatory variables.

For example, a Finite Distributed Lag (FDL) model of order two

\[ y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + \delta_2 z_{t-2} + u_t, \] is obtained by setting \(x_{t1}=z_t,\) \(x_{t2}=z_{t-1},\) and \(x_{t3}=z_{t-2}.\)
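As a minimal sketch of this mapping, the following NumPy code builds the regressor array for an FDL model from a single series \(z_t\). The helper name `fdl_design_matrix` and the toy series are hypothetical, chosen only for illustration. The first `order` observations are dropped because their lags are not available.

```python
import numpy as np


def fdl_design_matrix(z, order=2):
    """Build the regressor array for a finite distributed lag model.

    Column 0 is the intercept; column j + 1 holds z lagged j periods.
    The first `order` observations are dropped because their lags are missing.
    """
    z = np.asarray(z, dtype=float)
    T = z.shape[0]
    if T <= order:
        raise ValueError("need more observations than lags")
    cols = [np.ones(T - order)]
    for j in range(order + 1):
        # z_{t-j} for the usable periods t = order, ..., T-1 (zero-based)
        cols.append(z[order - j : T - j])
    return np.column_stack(cols)


# Toy series, made up for illustration only
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X_fdl = fdl_design_matrix(z, order=2)
print(X_fdl)
```

Each row of `X_fdl` is \((1, z_t, z_{t-1}, z_{t-2})\), so the columns after the intercept play the roles of \(x_{t1},\) \(x_{t2},\) and \(x_{t3}.\)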

For the remaining assumptions, let \(\bx_t=(x_{t1}, x_{t2}, \ldots, x_{tK})\) denote the set of all independent variables at time \(t.\) Further, let \(\bX\) denote the collection of all independent variables for all time periods. It is useful to think of \(\bX\) as being an array, with \(T\) rows and \(K\) columns.

\[ \bX = \begin{bmatrix} \bx_1 \\ \bx_2 \\ \vdots \\ \bx_T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1K} \\ x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T1} & x_{T2} & \cdots & x_{TK} \\ \end{bmatrix} \]

Assumption 10.2 (No Perfect Collinearity) In the sample (and therefore in the underlying time series process), no independent variable is constant or a perfect linear combination of the others.

Assumption 10.2 allows the explanatory variables to be correlated, but it rules out perfect correlation in the sample.
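One way to check this assumption in a given sample is to test whether the regressor array, including the intercept column, has full column rank. The sketch below uses simulated data. The function name `has_full_column_rank` and the variables `x1`, `x2`, `x3` are hypothetical and not part of the text above.

```python
import numpy as np


def has_full_column_rank(X):
    """Return True if no column of X is an exact linear combination of the others."""
    X = np.asarray(X, dtype=float)
    return bool(np.linalg.matrix_rank(X) == X.shape[1])


rng = np.random.default_rng(0)
T = 100
x1 = rng.normal(size=T)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=T)  # correlated with x1, but not perfectly
x3 = 2.0 * x1 - x2                             # exact linear combination of x1 and x2

X_ok = np.column_stack([np.ones(T), x1, x2])
X_bad = np.column_stack([np.ones(T), x1, x2, x3])

print(has_full_column_rank(X_ok))   # correlated regressors are allowed
print(has_full_column_rank(X_bad))  # perfect collinearity is not
```

A constant regressor fails the same check, because it is a multiple of the intercept column.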

Assumption 10.3 (Zero Conditional Mean) For each \(t\), the expected value of the error \(u_t,\) given the explanatory variables for all time periods, is zero. Mathematically,

\[ \E(u_t\mid \bx_1, \ldots, \bx_T) = \E(u_t\mid \bX) = 0,\, t=1,2,\ldots, T. \tag{10.1}\]

Assumption 10.3 (strict mean independence) implies that the error at time \(t\), \(u_t,\) is uncorrelated with each explanatory variable in every time period: \[ \E(\bx_s u_t) = \bold{0}, \quad \text{for } s=1,\ldots,T. \tag{10.2}\] This condition is called strict exogeneity.

Note that (10.1) implies (10.2), but not conversely. While strict exogeneity is sufficient for identification and asymptotic theory, we will use the stronger condition (10.1) for finite sample analysis.

Strict exogeneity is typically inappropriate in dynamic models. A relaxed version is contemporaneous exogeneity, which only requires \(u_t\) to be uncorrelated with the explanatory variables dated at time \(t.\) In conditional mean terms,

\[ \E(u_t\mid x_{t1}, x_{t2}, \ldots, x_{tK} ) = \E(u_t\mid \bx_t) = 0. \] Contemporaneous exogeneity is much weaker than strict exogeneity because it puts no restrictions on how \(u_t\) is related to the explanatory variables in other time periods.

Assumption 10.4 (Homoskedasticity) Conditional on \(\bX,\) the variance of \(u_t\) is the same for all \(t:\) \(\var(u_t\mid \bX)=\var(u_t)=\sigma^2,\) \(t=1,2,\ldots,T.\)

Assumption 10.5 (No Serial Correlation) Conditional on \(\bX,\) the errors in two different time periods are uncorrelated: \(\cor(u_t, u_s\mid \bX)=0,\) for all \(t\ne s.\)

10.2 Finite Sample Properties of OLS

Under Assumptions 10.1 through 10.3, the OLS estimators are unbiased:

\[ \E(\hat{\beta}_j\mid \bX) = \E(\hat{\beta}_j) = \beta_j, \, j=0,1,\ldots,K. \]

Under Assumptions 10.1 through 10.5, the OLS estimators are the best linear unbiased estimators conditional on \(\bX.\)
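The unbiasedness result can be illustrated with a small Monte Carlo experiment. The sketch below is not a proof. It assumes a made-up design (the parameter vector `beta`, the sample size, and normal errors are all illustrative choices) in which the regressors are drawn once and held fixed, so the averaging is conditional on \(\bX,\) and the errors are drawn independently of the regressors.

```python
import numpy as np

rng = np.random.default_rng(42)
T, R = 50, 5000                      # sample size and number of replications
beta = np.array([1.0, 2.0, -0.5])    # illustrative (beta_0, beta_1, beta_2)

# Regressors are drawn once and then held fixed across replications
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])

# Errors: mean zero, independent of X, homoskedastic, serially uncorrelated
U = rng.normal(size=(T, R))
Y = (X @ beta)[:, None] + U          # each column is one simulated sample

# OLS for all replications at once; B_hat has shape (3, R)
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

mean_beta_hat = B_hat.mean(axis=1)
print(mean_beta_hat)                 # should be close to beta
```

Any single column of `B_hat` can be far from `beta`; unbiasedness is a statement about the average over repeated samples, which is what `mean_beta_hat` approximates.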

10.3 Violation of Strict Exogeneity

Strict exogeneity (Assumption 10.3) requires \(u_t\) to be uncorrelated with \(\bx_s\), for \(s=1,2,\ldots,T.\)

If the unobservables at time \(t\) are correlated with any of the explanatory variables in any time period, then Zero Conditional Mean fails. Two leading candidates for failure are

omitted variables and
measurement error in some of the regressors.

Other more subtle causes:

Lagged effects
Either the dependent variable or one of the independent variables is based on expectations.

Example 10.1 Consider a static model to explain a city’s murder rate (\(mrdrte_t\)) in terms of police officers per capita (\(polpc_t\)):

\[ mrdrte_t = \beta_0 + \beta_1\, polpc_t + u_t. \]

Suppose that the city adjusts the size of its police force based on past values of the murder rate, such as

\[ polpc_{t+1} = \delta_0 + \delta_1 mrdrte_t + v_t. \]

This means that, say, \(polpc_{t+1}\) might be correlated with \(u_t\) (since a higher \(u_t\) leads to a higher \(mrdrte_t\)). If this is the case, strict exogeneity is violated.

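Mathematically, if \(u_t\) has mean zero and variance \(\sigma^2_u\) and is uncorrelated with both \(polpc_t\) and \(v_t,\) then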

\[ \begin{aligned} \E [u_t polpc_{t+1}] &= \E [u_t (\delta_0 + \delta_1 mrdrte_t + v_t)] \\ &= \delta_1 \E[u_t\, mrdrte_t] \\ &= \delta_1 \E[u_t (\beta_0 + \beta_1\, polpc_t + u_t)] \\ &= \delta_1 \E[u_t^2] \\ &= \delta_1 \sigma^2_u. \end{aligned} \]
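The feedback mechanism in this example can be checked numerically. The following sketch simulates the two equations with made-up parameter values (`beta1 = -0.5`, `delta1 = 0.4`, and unit error variances are assumptions for illustration, chosen so that the system is stable) and compares the sample analogue of \(\E[u_t\, polpc_{t+1}]\) with \(\delta_1\sigma^2_u.\)

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# Illustrative parameter values; |delta1 * beta1| < 1 keeps the system stable
beta0, beta1 = 2.0, -0.5
delta0, delta1 = 1.0, 0.4
sigma_u = 1.0

u = rng.normal(scale=sigma_u, size=T)   # error in the murder rate equation
v = rng.normal(size=T)                  # error in the police equation, independent of u

polpc = np.empty(T + 1)
mrdrte = np.empty(T)
polpc[0] = 1.5                          # arbitrary starting value
for t in range(T):
    mrdrte[t] = beta0 + beta1 * polpc[t] + u[t]
    polpc[t + 1] = delta0 + delta1 * mrdrte[t] + v[t]

moment_next = np.mean(u * polpc[1:])    # sample analogue of E[u_t polpc_{t+1}]
moment_same = np.mean(u * polpc[:-1])   # sample analogue of E[u_t polpc_t]

print(moment_next, delta1 * sigma_u**2)
print(moment_same)
```

In this design `moment_same` is close to zero, so the error is uncorrelated with the current regressor, while `moment_next` is close to \(\delta_1\sigma^2_u.\) The error is therefore correlated with next period's regressor, which is exactly the failure of strict exogeneity described above.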

Example 10.2 A general representation with two explanatory variables:

\[ y_t = \beta_0 + \beta_1z_{t1} + \beta_2z_{t2} + u_t. \] Under weak dependence, the condition sufficient for consistency of OLS is

\[ \E(u_t\mid z_{t1}, z_{t2}) = 0. \] Importantly, the condition does NOT rule out correlation between, say, \(u_{t-1}\) and \(z_{t1}\). This type of correlation could arise if \(z_{t1}\) is related to past \(y_{t-1}\), such as

\[ z_{t1} = \delta_0 + \delta_1y_{t-1} + v_t. \] For example, \(z_{t1}\) might be a policy variable, such as monthly percentage change in the money supply, and this change might depend on last month’s rate of inflation (\(y_{t-1}\)). Such a mechanism generally causes \(z_{t1}\) and \(u_{t-1}\) to be correlated (as can be seen by plugging in for \(y_{t-1}\)). This kind of feedback is allowed under contemporaneous exogeneity.
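The substitution mentioned in this example can be written out as follows. This is a sketch under added assumptions that are not stated in the text: \(u_{t-1}\) has mean zero and variance \(\sigma^2_u\) and is uncorrelated with \(z_{t-1,1},\) \(z_{t-1,2},\) and \(v_t.\) Lagging the model one period and plugging in for \(y_{t-1}\) gives

\[ \begin{aligned} z_{t1} &= \delta_0 + \delta_1\left(\beta_0 + \beta_1 z_{t-1,1} + \beta_2 z_{t-1,2} + u_{t-1}\right) + v_t, \\ \E(z_{t1}u_{t-1}) &= \delta_1 \E(u_{t-1}^2) = \delta_1\sigma^2_u. \end{aligned} \]

So \(z_{t1}\) and \(u_{t-1}\) are correlated whenever \(\delta_1 \ne 0,\) even though \(\E(u_t\mid z_{t1}, z_{t2}) = 0\) can still hold.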

In the social sciences, many explanatory variables may very well violate the strict exogeneity assumption. For example, in a model of farm yield, the amount of labor input might not be strictly exogenous: it is chosen by the farmer, and the farmer may adjust the amount of labor based on last year’s yield. Policy variables, such as growth in the money supply, expenditures on welfare, and highway speed limits, are often influenced by what has happened to the outcome variable in the past.

There are similar considerations in distributed lag models. Usually, we do not worry that \(u_t\) might be correlated with past \(z\) because we are controlling for past \(z\) in the model. But feedback from \(u\) to future \(z\) is always an issue.

10.4 Violation of No Serial Correlation

When Assumption 10.5 is false, we say that the errors suffer from serial correlation, or autocorrelation, because they are correlated across time.
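To make the idea concrete, the sketch below compares the first-order sample autocorrelation of serially uncorrelated errors with that of errors generated by an AR(1) process, \(u_t = \rho u_{t-1} + e_t.\) The AR(1) form, the value `rho = 0.6`, and the helper `sample_autocorr` are assumptions introduced here for illustration; the text above does not specify a particular form of serial correlation.

```python
import numpy as np


def sample_autocorr(u, lag=1):
    """Sample autocorrelation of the series u at the given positive lag."""
    u = np.asarray(u, dtype=float)
    d = u - u.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))


rng = np.random.default_rng(7)
T, rho = 20_000, 0.6

e = rng.normal(size=T)                 # serially uncorrelated innovations

u = np.empty(T)
u[0] = e[0] / np.sqrt(1 - rho**2)      # start from the stationary distribution
for t in range(1, T):
    u[t] = rho * u[t - 1] + e[t]       # AR(1) errors: Assumption 10.5 fails

r1_iid = sample_autocorr(e)            # close to 0
r1_ar1 = sample_autocorr(u)            # close to rho

print(r1_iid, r1_ar1)
```

In practice the errors are unobserved, so a diagnostic of this kind would be applied to OLS residuals rather than to \(u_t\) itself.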