11.3 Predict

Refer to [U] 20.11 Obtaining predicted values.

predict calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what predict can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of predict share certain similarities across estimation commands:

predict [type] newvar [if] [in] [, single_options]

predict newvar1: creates newvar1 containing the “predicted values”, i.e., \(\hat{y}_i = \E(y_i\mid \bx_i).\)
- For linear regression models, \(\hat{y}_i = \bx_i'\hat{\bbeta}.\)
- For logit/probit models, \(\hat{y}_i = F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or standard normal cumulative distribution function, respectively.
predict newvar2, xb: creates newvar2 containing the linear prediction.

Option xb calculates the linear prediction, \(\bx_i'\hat{\bbeta},\) from the fitted model.

Note that for a linear regression model, predict fitted and predict fitted, xb give the same result.

The two differ for logit/probit models: predict fitted gives the predicted probability, while predict fitted, xb gives the logit or probit index.
predict newvar2 if e(sample), xb: same as above, but only for observations used to fit the model in the previous estimation, i.e., in-sample predictions.

e(sample): returns \(1\) if the observation is in the estimation sample and \(0\) otherwise.
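The following is a minimal sketch of the points above. It assumes Stata's bundled auto dataset; the variables price, mpg, weight, and rep78 come from that dataset and are not part of this section. The model is fit only on observations with nonmissing rep78, so the estimation sample is smaller than the data in memory.

```stata
sysuse auto, clear

* Fit on a subsample so that e(sample) differs from the full dataset
regress price mpg weight if !missing(rep78)

predict fitted                        // default after regress: linear prediction
predict fitted_xb, xb                 // explicitly requested linear prediction
predict fitted_in if e(sample), xb    // in-sample predictions only

* fitted and fitted_xb coincide; fitted_in is missing outside the estimation sample
summarize fitted fitted_xb fitted_in
count if e(sample)
```

Comparing the observation counts reported by summarize shows which predictions were restricted to the estimation sample.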

predict can also make out-of-sample predictions, i.e., predictions for observations outside the estimation sample.

You can load a new dataset and type predict to obtain results for that sample.

use dataset1      /* estimation dataset */
(fit a model)
use dataset2      /* forecast dataset */
predict yhat    /* fill in the predictions */
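A concrete version of this pattern might look as follows. This is only a sketch: it assumes the bundled auto dataset, splits it into domestic cars (estimation) and foreign cars (forecast), and uses a temporary file whose macro name, forecast, is made up for the illustration. It should be run as a single do-file so that the local macro stays defined.

```stata
sysuse auto, clear

* Save the foreign cars as a separate "forecast" dataset
preserve
keep if foreign == 1
tempfile forecast
save `forecast'
restore

* Estimation dataset: domestic cars only
keep if foreign == 0
regress price mpg weight

* Load the forecast dataset; the estimation results stay in memory
use `forecast', clear
predict yhat                          // out-of-sample predictions
summarize price yhat
```

Because the estimation results survive the use command, predict applies the coefficients from the domestic cars to the foreign cars now in memory.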

predict e, residuals generates a variable e containing the residuals from the fitted model:

\[ e_i = y_i - \hat{y}_i \]
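As a quick check of this definition, the sketch below compares the residuals from predict with a manual calculation. It assumes the bundled auto dataset and a regression of price on mpg and weight, chosen only for illustration; the variable name e_manual is likewise made up.

```stata
sysuse auto, clear
regress price mpg weight

predict double yhat                   // fitted values
predict double e, residuals           // residuals reported by predict
generate double e_manual = price - yhat

* The two residual variables should agree up to rounding error
summarize e e_manual
```

Storing the new variables as double keeps rounding differences between the two calculations negligible.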

Consider the linear prediction

\[ \begin{split} \hat{y}_i &= \bx_i'\hat{\bbeta} \\ &= \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} . \end{split} \]

\(\hat{y}_i\) is called the
- predicted value for in-sample predictions
- forecast for out-of-sample predictions

For logit or probit, \(\bx_i'\hat{\bbeta}\) is called the logit or probit index. The predicted probability is \(p_i=\hat{y}_i=F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or standard normal cumulative distribution function, respectively. For probit, \(\hat{y}_i=\Phi(\bx_i'\hat{\bbeta});\) for logit, \(\hat{y}_i=\exp(\bx_i'\hat{\bbeta})/\{1+\exp(\bx_i'\hat{\bbeta})\}.\)
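The relationship between the index and the predicted probability can be verified directly. The sketch below assumes the bundled auto dataset and a model of foreign on mpg and weight, both chosen only for illustration; invlogit() and normal() are Stata's built-in logistic and standard normal cumulative distribution functions.

```stata
sysuse auto, clear

* Logit: predicted probability versus logistic CDF of the index
logit foreign mpg weight
predict double p_logit                // default: predicted probability
predict double xb_logit, xb           // logit index
generate double p_logit_manual = invlogit(xb_logit)

* Probit: predicted probability versus standard normal CDF of the index
probit foreign mpg weight
predict double p_probit
predict double xb_probit, xb
generate double p_probit_manual = normal(xb_probit)

summarize p_logit p_logit_manual p_probit p_probit_manual
```

In each pair, the manually computed probability should match the one returned by predict up to rounding error.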

\(x_{1i},\) \(x_{2i},\) \(\ldots,\) \(x_{Ki}\) are obtained from the data currently in memory and do NOT necessarily correspond to the data on the independent variables used to fit the model (that is, the data used to obtain \(\hat{\beta}_1,\) \(\hat{\beta}_2,\) \(\ldots,\) \(\hat{\beta}_K\)).
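One way to see this is to change a regressor after estimation and call predict again. The sketch below assumes the bundled auto dataset; the shift of 5 in mpg and the variable names yhat_shift and gap are arbitrary choices made for the illustration.

```stata
sysuse auto, clear
regress price mpg weight

predict double yhat                   // uses the data as originally loaded

* Change a regressor in memory; the stored coefficients are unchanged
replace mpg = mpg + 5
predict double yhat_shift             // uses the modified mpg

* The gap between the two predictions equals 5 times the mpg coefficient
generate double gap = yhat_shift - yhat
summarize gap
display 5*_b[mpg]
```

The coefficients still come from the original fit, while the values of \(x_{ki}\) come from whatever is in memory when predict is called.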