Least absolute deviations estimation is an alternative to OLS that is less sensitive to outliers and that delivers consistent estimates of conditional median parameters.

The least absolute deviations (LAD) estimator minimizes the sum of the absolute values of the residuals,

\[\begin{align}\label{eq-LAD_obj} \underset{\bbeta}{\min} \sum_{i=1}^n \abs{y_i-\bx_i'\bbeta} . \end{align}\]

Unlike OLS, which minimizes the sum of squared residuals, the LAD estimates are not available in closed form—that is, we cannot write down formulas for them.

Solving the problem in $\eqref{eq-LAD_obj}$ therefore requires numerical methods, and it is computationally more demanding than OLS, especially with a large sample size and many explanatory variables.
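To make this concrete, here is a minimal sketch of computing LAD estimates in Python, assuming the numpy and statsmodels packages are available; statsmodels' QuantReg at $q = 0.5$ is exactly the LAD estimator, and the simulated data and variable names are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)  # heavy-tailed errors

X = sm.add_constant(x)  # design matrix with an intercept column

# OLS minimizes the sum of squared residuals and has a closed form.
ols = sm.OLS(y, X).fit()

# LAD minimizes the sum of absolute residuals; there is no closed form,
# so the fit is computed numerically (quantile regression at the median).
lad = sm.QuantReg(y, X).fit(q=0.5)

print("OLS estimates:", ols.params)
print("LAD estimates:", lad.params)
```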

Fig. 1 The OLS and LAD objective functions.

Fig. 1 shows the OLS and LAD objective functions. The LAD objective function is linear on either side of zero, so that if, say, a positive residual increases by one unit, the LAD objective function increases by one unit. By contrast, the OLS objective function gives increasing importance to large residuals, and this makes OLS more sensitive to outlying observations.
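The contrast is easy to check numerically: a small residual contributes about equally to either objective, while a large residual dominates the OLS objective. A trivial sketch in plain Python:

```python
# Contribution of a single residual r to each objective:
# LAD adds |r|, OLS adds r^2, so large residuals dominate OLS.
for r in [0.5, 1.0, 2.0, 10.0]:
    print(f"r = {r:5.1f}   |r| = {abs(r):6.2f}   r^2 = {r**2:7.2f}")
```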

Because LAD does not give increasing weight to larger residuals, it is much less sensitive than OLS to changes in the extreme values of the data. Indeed, LAD is designed to estimate the parameters of the conditional median of $y$ given the covariates $x_1, x_2, \ldots, x_k,$ rather than the conditional mean.

Because the median is not affected by large changes in the extreme observations, it follows that the LAD parameter estimates are more resilient to outlying observations.
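One quick way to see this resilience is to corrupt a single observation and compare how far each set of estimates moves. The sketch below assumes the same Python setup as above; the size of the contamination is arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = sm.add_constant(x)

def fit_both(yv):
    """Return (OLS, LAD) parameter estimates for outcome yv."""
    return (sm.OLS(yv, X).fit().params,
            sm.QuantReg(yv, X).fit(q=0.5).params)

p_ols, p_lad = fit_both(y)

y_out = y.copy()
y_out[0] += 100.0  # one gross outlier in the dependent variable
p_ols2, p_lad2 = fit_both(y_out)

# OLS estimates move noticeably; LAD estimates barely change.
print("max OLS coefficient change:", np.abs(p_ols2 - p_ols).max())
print("max LAD coefficient change:", np.abs(p_lad2 - p_lad).max())
```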

In addition to LAD being more computationally intensive than OLS, a second drawback of LAD is that all statistical inference involving the LAD estimators is justified only as the sample size grows. Under the classical linear model assumptions, the OLS $t$ statistics have exact $t$ distributions and the $F$ statistics have exact $F$ distributions. While asymptotic versions of these statistics are available for LAD (and are reported routinely by software packages that compute LAD estimates), they are justified only in large samples. Like the additional computational burden, the lack of exact inference for LAD is only a minor concern, because most applications of LAD involve several hundred, if not several thousand, observations.
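For example, statsmodels reports asymptotic standard errors and $t$ statistics alongside LAD estimates; in the sketch below (same assumed setup as before), these are large-sample approximations rather than exact finite-sample distributions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = sm.add_constant(x)

lad = sm.QuantReg(y, X).fit(q=0.5)
# Both of these rest on asymptotic (large-sample) theory:
print(lad.bse)      # asymptotic standard errors
print(lad.tvalues)  # asymptotic t statistics
```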

Least absolute deviations is a special case of what is often called robust regression. In the statistics literature, a robust regression estimator is relatively insensitive to extreme observations. Effectively, observations with large residuals are given less weight than in least squares.

LAD is also a special case of quantile regression, which is used to estimate the effect of the $x_j$ on different parts of the conditional distribution, not just the median (or mean). For example, in a study of how access to a particular pension plan affects wealth, it could be that access affects high-wealth people differently from low-wealth people, and that both of these effects differ from the effect for the median person. Wooldridge (2010, Chapter 12) contains a treatment and examples of quantile regression.
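As an illustration, the sketch below fits the same linear specification at three quantiles with statsmodels' QuantReg. The pension setting and variable names are hypothetical stand-ins, and the simulated data are constructed so that the slope genuinely differs across quantiles.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
access = rng.integers(0, 2, size=n).astype(float)  # hypothetical plan access
# Error scale grows with access, so its effect differs across quantiles.
wealth = 10.0 + access + (1.0 + access) * rng.exponential(size=n)

X = sm.add_constant(access)
for q in (0.25, 0.50, 0.75):
    res = sm.QuantReg(wealth, X).fit(q=q)
    print(f"q = {q:.2f}: access coefficient = {res.params[1]:.3f}")
```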

An advantage that LAD has over OLS is that, because LAD estimates the conditional median, partial effects are easy to obtain after a monotonic transformation of the dependent variable.

For example, consider a model

\[\begin{align} \log(y) &= \bx'\bbeta + u \label{eq-LAD_linear} \\ \text{Med}(u \mid \bx) &= 0 , \nonumber \end{align}\]

which implies

\[\text{Med}(\log(y) \mid \bx) = \bx'\bbeta .\]

Because the conditional median passes through strictly increasing functions and $y = \exp(\log(y)),$ it follows that

\[\begin{align}\label{eq-LAD_nonlinear} \text{Med}(y \mid \bx) = \exp(\bx'\bbeta) . \end{align}\]

Thus, $\beta_j$ can be interpreted as the semi-elasticity of $\text{Med}(y \mid \bx)$ with respect to $x_j.$ In other words, the partial effect of $x_j$ in the linear equation $\eqref{eq-LAD_linear}$ can be used to uncover the partial effect in the nonlinear model $\eqref{eq-LAD_nonlinear}$.
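A short sketch of the retransformation, under the same assumed Python setup (the coefficients in the simulated model are illustrative): fit LAD on $\log(y),$ then exponentiate the fitted linear index to estimate the conditional median of $y$ itself.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
log_y = 0.5 + 0.3 * x + rng.normal(size=n)  # Med(u | x) = 0 holds here

X = sm.add_constant(x)
lad = sm.QuantReg(log_y, X).fit(q=0.5)
b0, b1 = lad.params

x0 = 1.0
med_y = np.exp(b0 + b1 * x0)  # estimated Med(y | x = x0)
dmed_dx = b1 * med_y          # partial effect of x on Med(y | x) at x0
print(f"Med(y | x = {x0}): {med_y:.3f}, partial effect: {dmed_dx:.3f}")
# b1 itself is the semi-elasticity: a one-unit increase in x shifts
# Med(y | x) by roughly 100*b1 percent.
```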

By contrast, if we assume

\[\begin{align}\label{eq-mean} \E[\log(y)\mid \bx] = \bx'\bbeta , \end{align}\]

then, in general, there is no way to uncover $\E(y\mid \bx)$ unless we make a full distributional assumption for $u$ given $\bx.$ Heteroskedasticity in the linear model $\eqref{eq-mean}$ for $\log(y)$ confounds our ability to recover $\E(y\mid\bx).$
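To see what a full distributional assumption buys, consider the standard lognormal special case (a textbook fact, not something derived above): if $u$ is independent of $\bx$ with $u \sim \text{Normal}(0,\sigma^2),$ then

\[\E(y \mid \bx) = \exp(\bx'\bbeta + \sigma^2/2) ,\]

so $\E(y\mid\bx)$ is recoverable, but only because the distribution of $u$ is fully specified. If instead $u$ given $\bx$ is, say, normal with a variance $\sigma^2(\bx)$ that depends on $\bx$ in an unknown way, the retransformation factor $\exp(\sigma^2(\bx)/2)$ cannot be computed without modeling $\sigma^2(\bx).$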