11.3 Predict

predict calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what predict can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of predict share certain similarities across estimation commands:

  • predict newvar1 creates newvar1 containing “predicted values”, i.e., \(\hat{y}_i = \E(y_i\mid \bx_i)\)

    • For linear regression models, \(\hat{y}_i = \bx_i'\hat{\bbeta}.\)
    • For logit/probit models, \(\hat{y}_i = F(\bx_i'\hat{\bbeta}),\) where \(F(\cdot)\) is the logistic (logit) or standard normal (probit) cumulative distribution function.
  • predict newvar2, xb creates newvar2 containing the linear prediction

    Option xb calculates the linear prediction, \(\bx_i'\hat{\bbeta},\) from the fitted model.

  • predict newvar2 if e(sample), xb Same as above, but only for observations used to fit the model in the previous estimation, i.e., in-sample predictions.

    e(sample) returns \(1\) if the observation is in the estimation sample and \(0\) otherwise.

  • predict can also be used for out-of-sample predictions, which extend beyond the estimation sample.

    You can load a new dataset and type predict to obtain results for that sample.

    use dataset1          /* estimation dataset */
    /* (fit a model) */
    use dataset2, clear   /* forecast dataset */
    predict yhat          /* fill in the predictions */
  • predict e, residuals generates a variable e containing the residuals from the fitted model

    \[ e_i = y_i - \hat{y}_i \]
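
The calls listed above can be sketched as follows. This is only an illustration, meant to be run as a do-file: it assumes the auto dataset that ships with Stata, and the model (mpg on weight and foreign) and the new variable names (yhat, xbhat, xb_in, ehat) are hypothetical choices, not part of the original notes.

```stata
sysuse auto, clear                  // example dataset shipped with Stata (assumption)
regress mpg weight foreign          // fit a linear regression

predict yhat                        // predicted values (after regress, the default is xb)
predict xbhat, xb                   // linear prediction, requested explicitly
predict xb_in if e(sample), xb      // linear prediction, estimation sample only
predict ehat, residuals             // residuals e_i = y_i - yhat_i

* after regress, the default prediction coincides with the linear prediction
assert reldif(yhat, xbhat) < 1e-4
* residuals equal observed minus predicted (up to float rounding)
assert reldif(ehat, mpg - yhat) < 1e-4
```

Because predict stores new variables as float by default, the checks use a tolerance rather than exact equality.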

Consider the linear prediction

\[ \begin{split} \hat{y}_i &= \bx_i'\hat{\bbeta} \\ &= \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} . \end{split} \]

\(\hat{y}_i\) is called the

  • predicted values for in-sample predictions
  • forecasts for out-of-sample predictions
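
To make the formula for the linear prediction concrete, the sketch below rebuilds \(\bx_i'\hat{\bbeta}\) by hand from the stored coefficients and compares it with what predict returns. The dataset (Stata's bundled auto data), the regressors, and the variable names xbhat and xb_manual are assumptions for illustration; the constant enters as the coefficient on a regressor equal to 1.

```stata
sysuse auto, clear
regress mpg weight foreign

predict double xbhat, xb            // linear prediction from predict, stored as double

* the same quantity built term by term from the estimated coefficients
generate double xb_manual = _b[_cons] + _b[weight]*weight + _b[foreign]*foreign

* the two should agree up to rounding error
assert reldif(xbhat, xb_manual) < 1e-8
```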

For logit or probit, \(\bx_i'\hat{\bbeta}\) is called the logit or probit index. The predicted probability is \(p_i=\hat{y}_i=F(\bx_i'\hat{\bbeta}),\) where \(F(\cdot)\) is the logistic cumulative distribution function for logit and the standard normal cumulative distribution function for probit. For probit, for example, \(\hat{y}_i=\Phi(\bx_i'\hat{\bbeta}).\)
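
The relation between the index and the predicted probability can be checked directly. The following is a sketch under assumptions: it uses Stata's bundled auto dataset with foreign as the binary outcome, and all new variable names are hypothetical. invlogit() and normal() are Stata's built-in logistic and standard normal cumulative distribution functions.

```stata
sysuse auto, clear

logit foreign weight mpg
predict p_logit                     // default after logit: predicted probability
predict xb_logit, xb                // logit index x'b
assert reldif(p_logit, invlogit(xb_logit)) < 1e-5

probit foreign weight mpg
predict p_probit                    // default after probit: predicted probability
predict xb_probit, xb               // probit index x'b
assert reldif(p_probit, normal(xb_probit)) < 1e-5
```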

\(x_{1i},\) \(x_{2i},\) \(\ldots,\) \(x_{Ki}\) are obtained from the data currently in memory and do NOT necessarily correspond to the data on the independent variables used to fit the model (obtaining \(\hat{\beta}_1,\) \(\hat{\beta}_2,\) \(\ldots,\) \(\hat{\beta}_K\)).
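
A small sketch of this point, again assuming Stata's bundled auto dataset and hypothetical variable names: the model is fitted on a subsample, yet predict fills in values for every observation in memory, and it picks up any later changes to the regressors.

```stata
sysuse auto, clear
regress mpg weight if foreign == 0          // fit on domestic cars only

predict yhat_all                            // uses every observation in memory
predict yhat_in if e(sample)                // restricted to the estimation sample

* foreign cars get out-of-sample forecasts in yhat_all but are missing in yhat_in
count if missing(yhat_in) & !missing(yhat_all)

* predict uses the data currently in memory, not the data used for fitting
replace weight = weight + 100
predict yhat_new
assert reldif(yhat_new - yhat_all, 100*_b[weight]) < 1e-4
```

The last check confirms that shifting weight by 100 moves every prediction by \(100\hat{\beta}_{weight}\), even though the coefficients themselves were estimated from the original values.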