11.3 Predict
Refer to [U] 20.11 Obtaining predicted values.
`predict` calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what `predict` can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of `predict` share certain similarities across estimation commands:
`predict newvar1` creates `newvar1` containing “predicted values”, i.e., \(\hat{y}_i = \E(y_i\mid \bx_i).\)
- For linear regression models, \(\hat{y}_i = \bx_i'\hat{\bbeta}.\)
- For probit/logit models, \(\hat{y}_i = F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or normal cumulative distribution function.
`predict newvar2, xb` creates `newvar2` containing the linear prediction. Option `xb` calculates the linear prediction, \(\bx_i'\hat{\bbeta},\) from the fitted model.
Note that for a linear regression model, `predict fitted` and `predict fitted, xb` give the same result. The difference is that for probit/logit models, `predict fitted` gives the predicted probability, while `predict fitted, xb` gives the logit or probit index.
`predict newvar2 if e(sample), xb`
Same as above, but only for observations used to fit the model in the previous estimation, i.e., in-sample predictions. `e(sample)` returns \(1\) if the observation is in the estimation sample and \(0\) otherwise.
`predict` can also be used for out-of-sample predictions, which extend beyond the estimation sample. You can load a new dataset and type `predict` to obtain results for that sample.
`predict e, residuals` generates a variable `e` containing the residuals of the estimation
\[ e_i = y_i - \hat{y}_i \]
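
As a minimal sketch of the commands above, the following do-file fragment fits a linear regression and then calls `predict` three ways. The `auto` dataset shipped with Stata and the variable names `yhat`, `yhat_xb`, `ehat`, and `e_check` are illustrative assumptions, not part of the original notes.

```stata
* Illustrative only: dataset and variable names are assumptions
sysuse auto, clear
regress price mpg weight

* Default after regress: fitted values
predict yhat
* Explicit linear prediction; same as yhat after regress
predict yhat_xb, xb
* Residuals e_i = y_i - yhat_i
predict ehat, residuals

* Rebuild the residuals by hand from the definition
generate e_check = price - yhat
summarize yhat yhat_xb ehat e_check
```

The summary statistics for `yhat` and `yhat_xb` should coincide, and `e_check` should agree with `ehat` up to floating-point rounding.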
Consider the linear prediction
\[ \begin{split} \hat{y}_i &= \bx_i'\hat{\bbeta} \\ &= \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} . \end{split} \]
\(\hat{y}_i\) is called the
- predicted values for in-sample predictions
- forecasts for out-of-sample predictions
For logit or probit, \(\bx_i'\hat{\bbeta}\) is called the logit or probit index. The predicted probability is \(p_i=\hat{y}_i=F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or standard normal cumulative distribution function. For probit, \(\hat{y}_i=\Phi(\bx_i'\hat{\bbeta});\) for logit, \(\hat{y}_i=\exp(\bx_i'\hat{\bbeta})/\{1+\exp(\bx_i'\hat{\bbeta})\}.\)
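
The relationship between the index and the predicted probability can be checked directly. The sketch below assumes the `auto` dataset shipped with Stata; the variable names `phat`, `index`, and `phat_check` are hypothetical. It applies the logistic function to the index by hand using Stata's `invlogit()` function.

```stata
* Illustrative only: dataset and variable names are assumptions
sysuse auto, clear
logit foreign mpg weight

* Default after logit: predicted probability
predict phat
* Logit index x'b
predict index, xb

* Apply F(.) to the index by hand; should reproduce phat
generate phat_check = invlogit(index)
summarize phat phat_check
```

After `probit`, the analogous check would replace `invlogit(index)` with `normal(index)`, the standard normal cumulative distribution function.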
\(x_{1i},\) \(x_{2i},\) \(\ldots,\) \(x_{Ki}\) are obtained from the data currently in memory and do NOT necessarily correspond to the data on the independent variables used to fit the model (obtaining \(\hat{\beta}_1,\) \(\hat{\beta}_2,\) \(\ldots,\) \(\hat{\beta}_K\)).
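
One way to see that `predict` uses the data currently in memory is to fit the model on a subset and then change a regressor before predicting again. This is a sketch under assumed inputs: the `auto` dataset, the restriction `foreign == 0`, and the 500-pound shift are all chosen purely for illustration.

```stata
* Illustrative only: dataset, subset, and variable names are assumptions
sysuse auto, clear
regress price mpg weight if foreign == 0

* Predictions for every observation in memory, not just the estimation sample
predict yhat, xb
* In-sample predicted values versus out-of-sample forecasts
summarize yhat if e(sample)
summarize yhat if !e(sample)

* Change the data in memory; the stored coefficients stay the same
replace weight = weight + 500
predict yhat_heavier, xb
generate diff = yhat_heavier - yhat
summarize diff
```

Because the coefficients are held fixed, `diff` should be constant across observations (up to rounding) and equal to 500 times the estimated coefficient on `weight`.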