11.4 Forecast
Refer to
- [U] 20.21 Dynamic forecasts and simulations for a quick overview.
- [TS] forecast for detailed documentation.
Forecast: out-of-sample
forecast works with time-series and panel datasets, and you can obtain either dynamic or static forecasts.
Dynamic forecasts use previous periods’ forecast values wherever lags appear in the model’s equations and thus allow you to obtain forecasts for multiple periods in the future.
Static forecasts use previous periods’ actual values wherever lags appear in the model’s equations, so if you use lags, you cannot make predictions much beyond the end of the time horizon in your dataset. However, static forecasts are useful during model development.
Note: The terms dynamic and static forecast do not indicate whether the model itself is dynamic or static. They refer to how lagged values are treated when making forecasts.
Quick takeaway: Use dynamic forecasts to make predictions multiple periods into the future, where you do not have observations for the lagged dependent variable.
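To make the distinction concrete, consider a hypothetical AR(1) equation (an assumption for illustration only, not a model used in this section), where \(T\) is the last period with observed data. A static forecast always plugs in the actual lag, so it stops at \(T+1\); a dynamic forecast feeds its own earlier forecasts back in, so it can continue for any \(h\).
\[ \begin{split} \text{static:}\quad \hat{y}_{t} &= \hat{\beta}_0 + \hat{\beta}_1 y_{t-1}, \quad t \le T+1 \\ \text{dynamic:}\quad \hat{y}_{T+1} &= \hat{\beta}_0 + \hat{\beta}_1 y_{T}, \qquad \hat{y}_{T+h} = \hat{\beta}_0 + \hat{\beta}_1 \hat{y}_{T+h-1}, \quad h \ge 2 \end{split} \]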
You can incorporate outside information into your forecasts, and you can specify a future path for some of the model’s variables and obtain forecasts for the other variables conditional on that path. These features allow you to produce forecasts under different scenarios, and they allow you to explore how different policy interventions would affect your forecasts.
Before we can forecast, we must populate the exogenous variables over the entire forecast horizon, that is, add the data for those periods to the dataset.
Solving the model means obtaining forecasts from it.
11.4.1 Essential Procedure
- Estimate the model. Here we use an ARIMA model as an example.
- Store the estimation results using estimates store.
- Create a forecast model using forecast create. This initializes a new model; we will call the model mymodel.
  The name you give the model mainly controls how output from forecast commands is labeled. More importantly, forecast create creates the internal data structures Stata uses to keep track of your model.
- Add all equations to the model you just created using forecast estimates. The following command adds the stored estimation results in myarima to the current model mymodel.
  forecast estimates myarima
- Compute dynamic forecasts from 2012 to 2024 using forecast solve.
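The five steps can be sketched end to end as follows. This is a minimal sketch, not the author's original code: it assumes Stata's lutkepohl2 example dataset (quarterly, ending in 1982q4) and an AR(1) for dln_inc chosen purely for illustration, so the forecast horizon differs from the 2012 to 2024 horizon mentioned above.

```stata
* Assumption: lutkepohl2 stands in for the author's dataset
webuse lutkepohl2, clear

* 1. Estimate the model (an AR(1) for income growth, for illustration only)
arima dln_inc, ar(1)

* 2. Store the estimation results under the name myarima
estimates store myarima

* 3. Create an empty forecast model called mymodel
forecast create mymodel, replace

* 4. Add the stored equation to the model
forecast estimates myarima

* 5. Append eight empty quarters so the forecast horizon exists in the data,
*    then compute dynamic forecasts (stored in f_dln_inc by default)
tsappend, add(8)
forecast solve, begin(tq(1983q1))
```

Because the model has no exogenous variables, appending empty periods with tsappend is enough here; with exogenous regressors you would also have to fill in their future values before solving.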
11.4.2 Create a new forecast model
The forecast create command creates a new forecast model in Stata.
You must create a model before you can add equations or solve it. You can have only one model in memory at a time.
You may optionally specify a name for your model. That name will appear in the output produced by the various forecast subcommands.
Option replace clears the existing model from memory before creating name. By default, forecast create issues an error message if another model is already in memory.
Note that you can add multiple equations to a forecast model.
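A short sketch of how these rules play out in practice. The model names first and second are hypothetical, and the dataset is only there so that the commands run in a realistic session.

```stata
webuse lutkepohl2, clear

* Start a new, empty model named first
forecast create first

* A second forecast create would fail while first is in memory,
* unless replace is specified to discard the existing model
forecast create second, replace

* Remove the current model from memory altogether
forecast clear
```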
11.4.3 Add equations/identities
Add estimation results to a forecast model currently in memory.
modelname is the name of a stored estimation result being added; it is generated by estimates store modelname.
Options:
- predict(p_options): call predict using p_options.
- names(namelist[, replace]): use namelist for the names of the left-hand-side (LHS) variables in the estimation result being added, i.e., modelname. forecast estimates creates a new variable in the dataset for each element of namelist.
  You MUST use this option if any of the LHS variables contains time-series operators, e.g., D., L.
  If a variable of the same name already exists in your dataset, forecast estimates exits with an error unless you specify the replace option, in which case existing variables are overwritten.
Add estimation results stored in myestimates to the forecast model currently in memory.
11.4.3.1 Add an Identity to a forecast Model
An identity is a nonstochastic equation that expresses an endogenous variable in the model as a function of other variables in the model. Identities often describe the behavior of endogenous variables that are based on accounting identities or adding-up conditions.
// Add an identity to the forecast that states that y3 is the sum of y1 and y2
forecast identity y3=y1+y2
// create new variable newy before adding it to the forecast
forecast identity newy=y1+y2, generate
The difference is that if the LHS variable does not yet exist in the dataset, you must specify the generate option.
Ex. We have a model using annual data and want to assume that our population variable pop grows at 0.75% per year. We can then declare pop as an endogenous variable with forecast identity, setting pop equal to 1.0075 times its own lagged value.
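The population example can be sketched as below. The toy dataset, the starting level of 100, and the model name popmodel are all hypothetical, introduced only so that the identity can be solved on its own.

```stata
* Hypothetical annual data: pop observed for 2001-2005, missing afterwards
clear
set obs 10
generate year = 2000 + _n
tsset year
generate pop = 100 in 1/5

forecast create popmodel, replace

* pop grows at 0.75% per year
forecast identity pop = 1.0075*L.pop

* Forecasts for 2006 onward are stored in f_pop
forecast solve, begin(2006)
```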
Typically, you use forecast identity to define the relationship that determines an endogenous variable that is already in your dataset.
The generate option of forecast identity is useful when you wish to use a transformation of one or more endogenous variables as a right-hand-side variable in a stochastic equation that describes another endogenous variable.
11.4.3.2 Add equations that you obtained elsewhere to your model
Up until now, we have been using model output from Stata to add equations to a forecast model, i.e., using forecast estimates.
You use forecast coefvector to add endogenous variables to your model that are defined by linear equations.
Common use scenarios of forecast coefvector:
- Sometimes, you might see the estimated coefficients for an equation in an article and want to add that equation to your model. In this case, forecast coefvector allows you to add equations that are stored as coefficient vectors to a forecast model.
- User-written estimators that do not implement a predict command can also be included in forecast models via forecast coefvector.
- forecast coefvector can also be useful in situations where you want to simulate time-series data.
cname is a Stata matrix with one row. It defines the linear equations, which are stored in a coefficient (parameter) vector.
Options:
- variance(vname): specify the variance matrix of the estimated parameters.
  This option only has an effect if you specify the simulate() option when calling forecast solve and request sim_technique's betas or residuals.
- errorvariance(ename): specify an additive error term with variance matrix ename, where ename is the name of a Stata matrix. The number of rows and columns in ename must match the number of equations represented by coefficient vector cname.
  This option only has an effect if you specify the simulate() option when calling forecast solve and request sim_technique's errors or residuals.
- names(namelist[, replace]): instructs forecast coefvector to use namelist as the names of the left-hand-side variables in the coefficient vector being added. By default, forecast coefvector uses the equation names on the column stripe of cname.
  You must use this option if any of the equation names stored with cname contains time-series operators.
You use forecast coefvector to add endogenous variables to your model that are defined by linear equations, where the linear equations are stored in a coefficient (parameter) vector.
// Incorporate coefficient vector of the endogenous equation of y to be used by forecast solve
forecast coefvector y
Ex. We want to add the following equations to a forecast model.
\[ \begin{split} x_t &= 0.2 + 0.3 x_{t-1} - 0.8 y_t \\ y_t &= 0.1 + 0.7 y_{t-1} + 0.3 x_t - 0.2 x_{t-1} \end{split} \]
We first define the coefficient vector eqvector.
// define a row vector
matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
// add equation names and variable names
// equation names are before the colon
// variable names are after the colon
matrix coleq eqvector = x:_cons x:L.x x:y y:_cons y:L.y y:x y:L.x
matrix list eqvector
We could then add the coefficient vector to a forecast model.
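A minimal sketch of that last step, using the coefficient vector to simulate the two series. The toy dataset, the starting values of 1, and the model name simmodel are hypothetical; only the matrix definition comes from the example above.

```stata
* Hypothetical data: 20 periods, with starting values in period 1 only
clear
set obs 20
generate t = _n
tsset t
generate x = 1 in 1
generate y = 1 in 1

* Coefficient vector with equation:variable names on the column stripe
matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
matrix coleq eqvector = x:_cons x:L.x x:y y:_cons y:L.y y:x y:L.x

forecast create simmodel, replace

* Add both equations; neither variance() nor errorvariance() is given,
* so the equations are treated as nonstochastic
forecast coefvector eqvector

* Solve from period 2 onward; results are stored in f_x and f_y
forecast solve, begin(2)
```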
11.4.3.3 Declare exogenous variables
Declaring exogenous variables with forecast exogenous is not explicitly necessary, but we nevertheless strongly encourage doing so.
Stata can check the exogenous variables before solving the model and issue an appropriate error message if missing values are found, whereas troubleshooting models for which forecasting failed is more difficult after the fact.
Undeclared exogenous variables that contain missing values within the forecast horizon will cause forecast solve to exit with a less-informative error message and require the user to do more work to pinpoint the problem.
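A sketch of declaring an exogenous variable, assuming Stata's lutkepohl2 example dataset and treating dln_inc as exogenous purely for illustration. The model is estimated through 1978q4 and solved over 1979q1 to 1982q4, where dln_inc is already observed, so the exogenous variable is populated over the whole forecast horizon.

```stata
webuse lutkepohl2, clear

* Consumption growth depends on its own lag and on (exogenous) income growth
regress dln_consump L.dln_consump dln_inc if qtr <= tq(1978q4)
estimates store consump

forecast create exogdemo, replace
forecast estimates consump

* Declare dln_inc so that forecast can check it for missing values
forecast exogenous dln_inc

* Dynamic forecasts of dln_consump, stored in f_dln_consump
forecast solve, begin(tq(1979q1)) end(tq(1982q4))
```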
Summary:
Endogenous variables are added to the forecast model via forecast estimates, forecast identity, and forecast coefvector.
- Equations added via forecast estimates are always stochastic,
- while equations added via forecast identity are always nonstochastic.
- Equations added via forecast coefvector are treated as stochastic if options variance() or errorvariance() (or both) are specified and nonstochastic if neither is specified.
11.4.3.4 forecast adjust
forecast adjust adjusts a variable by add factoring, replacing, etc.
varname is the name of the endogenous variable that has been previously added to the model using forecast estimates or forecast coefvector.
forecast adjust specifies an adjustment to be applied to an endogenous variable in the model. Adjustments are typically used to produce alternative forecast scenarios or to incorporate outside information into a model.
// Adjust the endogenous variable y in forecast to account for the variable shock in 1990
forecast adjust y = y + shock if year==1990
11.4.4 Solve the forecast
forecast solve computes static or dynamic forecasts based on the model currently in memory. Before you can solve a model, you must first create a new model using forecast create and add equations and variables to it using forecast estimates, forecast coefvector, or forecast identity.
Options:
- prefix(string) and suffix(string): specify the prefix/suffix for forecast variables. You may specify prefix() or suffix() but NOT both. By default, forecast values will be prefixed by f_.
- begin(time_constant) and end(time_constant): specify the period to begin/end forecasting.
- periods(#): specify the number of periods to forecast.
- static: produce static forecasts instead of dynamic forecasts. Actual values of variables are used wherever lagged values of the endogenous variables appear in the model. Static forecasts are also called one-step-ahead forecasts.
  By default, dynamic forecasts are produced, which use the forecast values of variables wherever lagged values of the endogenous variables appear in the model.
- actuals: use actual values if available instead of forecasts. actuals specifies how nonmissing values of endogenous variables in the forecast horizon are treated. By default, nonmissing values are ignored, and forecasts are produced for all endogenous variables. When you specify actuals, forecast sets the forecast values equal to the actual values if they are nonmissing. The forecasts for the other endogenous variables are then conditional on the known values of the endogenous variables with nonmissing data.
- log(log_level): log_level takes on one of the following values:
  - on: the default; provides an iteration log showing the current panel and period for which the model is being solved as well as a sequence of dots for each period indicating the number of iterations.
  - off: suppresses the iteration log.
  - detail: produces a detailed iteration log including the current values of the convergence criteria for each period in each panel (in the case of panel data) for which the model is being solved.
  - brief: produces an iteration log showing the current panel being solved but does not show which period within the current panel is being solved.
- simulate(sim_technique, sim_statistic sim_options): allows you to simulate your model to obtain measures of uncertainty surrounding the point forecasts produced by the model. Simulating a model involves repeatedly solving the model, each time accounting for the uncertainty associated with the error terms and the estimated coefficient vectors.
  sim_technique can be betas, errors, or residuals.
  - betas: draw multivariate-normal parameter vectors. This captures the sampling error in the estimated coefficients.
  - errors: draw additive errors from a multivariate normal distribution. This captures the uncertainty from the stochastic error terms; errors are drawn from a normal distribution with mean zero and variance equal to the estimated variance of the error terms.
  - residuals: draw additive residuals based on static forecast errors; errors are drawn from the pool of static-forecast residuals.
  sim_statistic specifies a summary statistic to summarize the forecasts over all the simulations. statistic can be mean, variance, or stddev. You may specify either the prefix or the suffix that will be used to name the variables that will contain the requested statistic. For example, statistic(stddev, prefix(sd_)) stores the standard deviations of our forecasts in variables prefixed with sd_.
  sim_options includes
  - reps(#): request that forecast solve perform # replications; the default is reps(50).
  - saving(filename, …): save results to file.
  - nodots: suppress replication dots. By default, one dot character is displayed for each successful replication. If convergence is not achieved during a replication, forecast solve exits with an error message.
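The main options can be compared side by side in a short sketch. It assumes Stata's lutkepohl2 example dataset and a simple autoregression for dln_inc chosen only for illustration; the prefixes dyn_, st_, sim_, and sd_ and the seed are arbitrary choices.

```stata
webuse lutkepohl2, clear

* Estimate through 1978q4 so that 1979q1-1982q4 can serve as the horizon
regress dln_inc L.dln_inc if qtr <= tq(1978q4)
estimates store inc

forecast create optsdemo, replace
forecast estimates inc

* Dynamic forecasts (the default), stored in dyn_dln_inc
forecast solve, prefix(dyn_) begin(tq(1979q1))

* Static (one-step-ahead) forecasts, stored in st_dln_inc
forecast solve, prefix(st_) begin(tq(1979q1)) static

* Dynamic forecasts in sim_dln_inc plus simulated standard deviations
* in sd_dln_inc, based on 100 draws of the coefficient vector
set seed 12345
forecast solve, prefix(sim_) begin(tq(1979q1)) ///
    simulate(betas, statistic(stddev, prefix(sd_)) reps(100))
```

Using a different prefix for each call keeps the results of earlier solves from colliding with the new forecast variables.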
11.4.5 Use example: forecast a panel
\[ \%\Delta \text{dim}_{it} = \beta_0 + \beta_1 \ln(\text{starts}_{it}) + \beta_2 \text{rgspgrowth}_{it} + \beta_3 \text{unrate}_{it} + u_{i} + \varepsilon_{it} \]
\(u_{i}\) refers to individual fixed effects.
When we make forecasts for any individual panel, we may want to include its fixed effect \(u_{i}\) in our forecasts. This can be achieved by using forecast adjust.
use https://www.stata-press.com/data/r19/statehardware, clear
generate lndim = ln(dim)
generate lnstarts = ln(starts)
quietly xtreg D.lndim lnstarts rgspgrowth unrate if qdate <= tq(2009q4), fe
predict dlndim_u, u /* obtain individual fixed effects */
estimates store dim /* store estimation results */
With enough observations, we can have more confidence in the estimated panel-specific errors. If we are willing to assume that we have decent estimates of the panel-specific errors and that those panel-level effects will remain constant over the forecast horizon, then we can incorporate them into our forecasts.
Because predict only provided us with estimates of the panel-level effects for the estimation sample, we need to extend them into the forecast horizon.
An easy way to do that is to use egen to create a new set of variables:
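The egen step might look like the line below, which continues from the xtreg and predict commands above. It is a sketch under two assumptions: that the panel identifier in statehardware is named state, and that the new variable should be called dlndim_u2 to match the name used in the forecast adjust command later in this example.

```stata
* Copy each state's estimated fixed effect to all of its observations,
* including those in the forecast horizon where dlndim_u is missing.
* Assumption: the panel variable is named state (check with xtset).
bysort state (qdate): egen dlndim_u2 = mean(dlndim_u)
```

Since the fixed effect is constant within a panel, its within-state mean simply reproduces it for every quarter.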
We can use forecast adjust to incorporate these terms into our forecasts.
The following commands define our forecast model, including the estimated panel-specific terms:
/* create forecast model */
forecast create statemodel, replace
/* add the equation and rename the endogenous variable to be forecast, D.lndim, as dlndim */
/* because the original endogenous variable name includes a time-series operator,
it must be given a new name; otherwise forecast estimates returns an error */
forecast estimates dim, names(dlndim)
/* add state fixed effects */
forecast adjust dlndim = dlndim + dlndim_u2
Note that because our dependent variable contains a time-series operator, we must use the names(dlndim) option of forecast estimates to specify a valid name for the endogenous variable being added.
dlndim stands for the first difference of the logarithm of dim. We are interested in the level of dim, so we need to back out dim from dlndim.
→ We use forecast identity to obtain the actual dim variable.
// reverse first difference, note that you refer to the endog var using the new name, dlndim, now
forecast identity lndim = L.lndim + dlndim
// reverse natural logarithm
forecast identity dim = exp(lndim)
We used forecast adjust to perform our adjustment to dlndim immediately after we added those estimation results so that we would not forget to do so.
However, we could specify the adjustment at any time.
Regardless of when you specify an adjustment, forecast solve performs those adjustments immediately after the variable being adjusted is computed.
Finally we can solve the model. Here we obtain dynamic forecasts beginning in the first quarter of 2010:
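A call consistent with that description is sketched below. It continues from the statemodel commands above and assumes the default f_ prefix, so the forecasts would be stored in f_dlndim, f_lndim, and f_dim.

```stata
* Dynamic forecasts for every state, starting in 2010q1 and running
* to the last period in the dataset
forecast solve, begin(tq(2010q1))
```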