11.3 Predict
Refer to [U] 20.11 Obtaining predicted values.
predict calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what predict can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of predict share certain similarities across estimation commands:
- `predict newvar1` creates `newvar1` containing “predicted values”, i.e., \(\hat{y}_i = \E(y_i\mid \bx_i).\)
  - For linear regression models, \(\hat{y}_i = \bx_i'\hat{\bbeta}.\)
  - For probit/logit models, \(\hat{y}_i = F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or normal cumulative distribution function.
- `predict newvar2, xb` creates `newvar2` containing the linear prediction.
  - Option `xb` calculates the linear prediction, \(\bx_i'\hat{\bbeta},\) from the fitted model.
  - For a linear regression model, `predict fitted` and `predict fitted, xb` give the same result.
  - For probit/logit models they differ: `predict fitted` gives the predicted probability, while `predict fitted, xb` gives the logit or probit index.
- `predict newvar2 if e(sample), xb` does the same as above, but only for observations used to fit the model in the previous estimation, i.e., in-sample predictions.
  - `e(sample)` returns \(1\) if the observation is in the estimation sample and \(0\) otherwise.
  - `predict` can also produce out-of-sample predictions, i.e., predictions for observations outside the estimation sample.
  - You can load a new dataset and type `predict` to obtain results for that sample.
- `predict e, residuals` generates a variable `e` containing the residuals of the estimation,
  \[ e_i = y_i - \hat{y}_i . \]
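The variants of `predict` listed above can be tried side by side. The following is a minimal sketch, assuming Stata's bundled `auto` dataset and an illustrative regression of `price` on `weight` and `mpg`; the dataset, the model, and the variable names `yhat`, `yhat_xb`, and `e` are choices made here for illustration only.

```stata
* Illustrative data and model (assumptions, not part of the notes)
sysuse auto, clear
regress price weight mpg

* Store results in double precision so the checks below are exact
predict double yhat              // default after regress: the linear prediction
predict double yhat_xb, xb       // explicit linear prediction
predict double e, residuals      // e = price - yhat

* After a linear regression the two predictions coincide
assert yhat == yhat_xb

* The residual equals the outcome minus the predicted value
assert reldif(e, price - yhat) < 1e-10

summarize yhat yhat_xb e
```

If both `assert` statements pass silently, the identities hold for every observation in memory.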
Consider the linear prediction
\[ \begin{split} \hat{y}_i &= \bx_i'\hat{\bbeta} \\ &= \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} . \end{split} \]
\(\hat{y}_i\) is called the
- predicted values for in-sample predictions
- forecasts for out-of-sample predictions
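To see that the linear prediction is nothing more than the fitted coefficients applied to the regressors, it can be rebuilt by hand from the stored coefficients `_b[]`. This is a sketch under the same illustrative assumptions as before (the `auto` dataset, a regression of `price` on `weight` and `mpg`); here the constant term plays the role of a regressor that is always equal to one, and `yhat_manual` is a hypothetical variable name.

```stata
* Illustrative data and model (assumptions, not part of the notes)
sysuse auto, clear
regress price weight mpg

predict double yhat, xb

* Rebuild the linear prediction term by term from the stored coefficients
generate double yhat_manual = _b[_cons] + _b[weight]*weight + _b[mpg]*mpg

* The two versions agree up to floating-point rounding
assert reldif(yhat, yhat_manual) < 1e-10
```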
For logit or probit, \(\bx_i'\hat{\bbeta}\) is called the logit or probit index. The predicted probability is \(p_i=\hat{y}_i=F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or normal cumulative distribution function. For probit, \(\hat{y}_i=\Phi(\bx_i'\hat{\bbeta}),\) where \(\Phi\) is the standard normal cumulative distribution function.
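The relation between the index and the predicted probability can be checked directly with Stata's built-in `invlogit()` and `normal()` functions. The sketch below assumes the `auto` dataset and an illustrative model of `foreign` on `weight` and `mpg`; the variable names are hypothetical.

```stata
* Illustrative data and models (assumptions, not part of the notes)
sysuse auto, clear

logit foreign weight mpg
predict double p_logit            // default after logit: predicted probability
predict double idx_logit, xb      // logit index
assert reldif(p_logit, invlogit(idx_logit)) < 1e-10

probit foreign weight mpg
predict double p_probit           // default after probit: predicted probability
predict double idx_probit, xb     // probit index
assert reldif(p_probit, normal(idx_probit)) < 1e-10
```

Here `invlogit()` is the logistic cumulative distribution function and `normal()` is the standard normal cumulative distribution function, so each `assert` verifies \(p_i = F(\bx_i'\hat{\bbeta})\) for the corresponding model.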
\(x_{1i},\) \(x_{2i},\) \(\ldots,\) \(x_{Ki}\) are obtained from the data currently in memory and do NOT necessarily correspond to the data on the independent variables used to fit the model (obtaining \(\hat{\beta}_1,\) \(\hat{\beta}_2,\) \(\ldots,\) \(\hat{\beta}_K\)).
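One way to see this is to fit the model on a subsample and then call `predict` without restrictions. The sketch below assumes the `auto` dataset and, purely for illustration, fits the regression on domestic cars only (`foreign == 0`); the variable names `yhat_all` and `yhat_in` are hypothetical.

```stata
* Illustrative data and model (assumptions, not part of the notes)
sysuse auto, clear

* Fit on domestic cars only, so foreign cars are outside the estimation sample
regress price weight mpg if foreign == 0

predict double yhat_all, xb               // every observation in memory
predict double yhat_in if e(sample), xb   // estimation sample only

* Out-of-sample forecasts exist for the foreign cars as well
assert !missing(yhat_all)

* The in-sample version is missing exactly for the excluded observations
assert missing(yhat_in) == (foreign == 1)

count if !missing(yhat_all)
count if !missing(yhat_in)
```

The coefficients come from the domestic cars alone, yet `yhat_all` is filled in for every car because `predict` simply applies them to whatever regressor values are currently in memory.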