11.3 Predict
`predict` calculates predictions, residuals, influence statistics, and the like after estimation. Exactly what `predict` can do is determined by the previous estimation command; command-specific options are documented with each estimation command. Regardless of command-specific options, the actions of `predict` share certain similarities across estimation commands:
- `predict newvar1` creates `newvar1` containing “predicted values”, i.e., \(\hat{y}_i = \E(y_i\mid \bx_i).\)
  - For linear regression models, \(\hat{y}_i = \bx_i'\hat{\bbeta}.\)
  - For probit/logit models, \(\hat{y}_i = F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic or normal cumulative distribution function.
- `predict newvar2, xb` creates `newvar2` containing the linear prediction. The option `xb` requests the linear prediction, \(\bx_i'\hat{\bbeta},\) from the fitted model.
- `predict newvar2 if e(sample), xb` does the same, but only for observations used to fit the model in the previous estimation, i.e., in-sample predictions.
  - `e(sample)` returns \(1\) if the observation is in the estimation sample and \(0\) otherwise.
- `predict` can also be used for out-of-sample predictions, which extend beyond the estimation sample. You can load a new dataset and type `predict` to obtain results for that sample.
- `predict e, residuals` generates a variable `e` containing the residuals of the estimation:

\[ e_i = y_i - \hat{y}_i \]
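The commands listed above can be tried out with a minimal sketch like the following. It assumes Stata's bundled `auto` dataset and an arbitrary regression of `mpg` on `weight` and `foreign`; the variable names `yhat`, `xbhat`, `e`, and `e_check` are made up for this illustration.

```stata
* Illustrative only: auto.dta ships with Stata; the model is an arbitrary choice
sysuse auto, clear
regress mpg weight foreign

* Default prediction; after regress this is the linear prediction
predict yhat

* Linear prediction requested explicitly, restricted to the estimation sample
predict xbhat if e(sample), xb

* Residuals e_i = y_i - yhat_i
predict e, residuals

* Rebuild the residuals by hand and compare them with what predict returned
generate e_check = mpg - yhat
summarize e e_check
```

The two summaries should agree up to rounding, since `e_check` is just the definition of the residual applied to `yhat`.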
Consider the linear prediction
\[ \begin{split} \hat{y}_i &= \bx_i'\hat{\bbeta} \\ &= \hat{\beta}_1x_{1i} + \hat{\beta}_2x_{2i} + \cdots + \hat{\beta}_Kx_{Ki} . \end{split} \]
\(\hat{y}_i\) is called the
- predicted values for in-sample predictions
- forecasts for out-of-sample predictions
For logit or probit, \(\bx_i'\hat{\bbeta}\) is called the logit or probit index. The predicted probability is \(p_i=\hat{y}_i=F(\bx_i'\hat{\bbeta}),\) where \(F(.)\) is the logistic cumulative distribution function for logit and the standard normal cumulative distribution function for probit. For probit, \(\hat{y}_i=\Phi(\bx_i'\hat{\bbeta}).\)
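The relationship between the index and the predicted probability can be checked by hand. The sketch below again assumes the bundled `auto` dataset and an arbitrary model of `foreign` on `mpg` and `weight`; all new variable names are hypothetical. It uses Stata's built-in functions `invlogit()` (logistic CDF) and `normal()` (standard normal CDF).

```stata
* Illustrative only: dataset and model are arbitrary choices
sysuse auto, clear

* Logit: the default prediction is the probability, xb gives the index
logit foreign mpg weight
predict p_logit
predict idx_logit, xb
generate p_logit_check = invlogit(idx_logit)

* Probit: same idea, with the standard normal CDF
probit foreign mpg weight
predict p_probit
predict idx_probit, xb
generate p_probit_check = normal(idx_probit)

* Each pair should agree up to rounding
summarize p_logit p_logit_check p_probit p_probit_check
```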
\(x_{1i},\) \(x_{2i},\) \(\ldots,\) \(x_{Ki}\) are obtained from the data currently in memory and do NOT necessarily correspond to the data on the independent variables used to fit the model (obtaining \(\hat{\beta}_1,\) \(\hat{\beta}_2,\) \(\ldots,\) \(\hat{\beta}_K\)).
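One way to see this is to fit a model on a subsample and then call `predict` without restrictions. The following is a sketch under assumed choices: the bundled `auto` dataset, an arbitrary split on `foreign`, and hypothetical variable names `yhat_all` and `yhat_in`.

```stata
* Illustrative only: fit on domestic cars, an arbitrary subsample
sysuse auto, clear
regress mpg weight if foreign == 0

* No restriction: predictions use the weight values of every car in memory,
* including foreign cars that played no part in estimating the coefficients
predict yhat_all

* Restricted to the estimation sample: foreign cars are left missing
predict yhat_in if e(sample)

* Compare coverage of the two variables by group
tabstat yhat_all yhat_in, by(foreign) statistics(n mean)
```

For the foreign cars, `yhat_all` holds out-of-sample forecasts, while `yhat_in` contains in-sample predicted values only.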