11.4 Forecast
See:
- [U] 20.21 Dynamic forecasts and simulations for a quick overview.
- [TS] forecast for detailed documentation.
Forecast: out-of-sample
forecast works with time-series and panel datasets, and you can obtain either dynamic or static forecasts.
Dynamic forecasts use previous periods’ forecast values wherever lags appear in the model’s equations and thus allow you to obtain forecasts for multiple periods in the future.
Static forecasts use previous periods’ actual values wherever lags appear in the model’s equations, so if you use lags, you cannot make predictions much beyond the end of the time horizon in your dataset. However, static forecasts are useful during model development.
Note: "dynamic" and "static" forecasts do not indicate whether the model itself is dynamic or static. The terms refer to how lagged values are treated when making forecasts.
Quick takeaway: use dynamic forecasts to make predictions multiple periods into the future, where you do not have observations for the lagged dependent variable.
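As an illustration, take a hypothetical AR(1) model with estimated coefficients \(\hat\beta_0, \hat\beta_1\) and data that end in period \(T\). The two kinds of forecasts differ only in which lagged value enters the right-hand side:

\[ \begin{split} \text{static:} \quad \hat y_{t} &= \hat\beta_0 + \hat\beta_1 y_{t-1} \\ \text{dynamic:} \quad \hat y_{T+1} &= \hat\beta_0 + \hat\beta_1 y_{T}, \qquad \hat y_{T+h} = \hat\beta_0 + \hat\beta_1 \hat y_{T+h-1}, \quad h \ge 2 \end{split} \]

A static forecast of \(y_{T+2}\) would need the unobserved \(y_{T+1}\). This is why static forecasts of a model with lags cannot go more than one period past the end of the data.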
You can incorporate outside information into your forecasts, and you can specify a future path for some of the model’s variables and obtain forecasts for the other variables conditional on that path. These features allow you to produce forecasts under different scenarios, and they allow you to explore how different policy interventions would affect your forecasts.
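Below is a minimal sketch of a scenario comparison. It runs on simulated data, and the names `yeq`, `scen`, `base_`, `alt_`, and `bx` are hypothetical. The model is solved once under a baseline path for the exogenous variable `x`, then again after shifting that path by one unit.

```stata
clear
set seed 5
set obs 30
generate year = 1990 + _n                  // years 1991-2020 (simulated data)
tsset year
generate x = rnormal()                     // hypothetical exogenous variable
generate y = 1 + 2*x + rnormal()
regress y x if year <= 2015
scalar bx = _b[x]                          // keep the slope for comparison
estimates store yeq
forecast create scen, replace
forecast estimates yeq
forecast exogenous x
forecast solve, begin(2016) prefix(base_)  // baseline scenario
replace x = x + 1 if year >= 2016          // hypothetical scenario: x one unit higher
forecast solve, begin(2016) prefix(alt_)   // alternative scenario
list year base_y alt_y if year >= 2016
```

Because this toy model is linear with no lags, `alt_y - base_y` equals the estimated slope in every forecast period.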
Before we can forecast, we must extend the dataset to cover the forecast periods (add data) and populate the exogenous variables over the entire forecast horizon. Only then can we solve our model.
Solving our model means obtaining forecasts from it.
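One way to prepare the data is sketched below. It assumes annual data, and the exogenous variable `x` and its future path are hypothetical. `tsappend` adds the future periods, and the exogenous variable is then filled in over the new horizon.

```stata
clear
set obs 10
generate year = 2000 + _n          // sample covers 2001-2010
tsset year
generate x = _n                    // hypothetical exogenous variable
tsappend, add(3)                   // add 2011-2013; x is missing there
replace x = x[_n-1] if missing(x)  // assumption: x stays at its 2010 level
list year x
```

Carrying the last value forward is only a placeholder. In practice you would replace it with whatever path you believe for `x`.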
11.4.1 Essential Procedure
Estimate the model. Here we use an ARIMA model as an example.
Store the estimation results using estimates store.
Create a forecast model using forecast create. This initializes a new model; we will call the model mymodel. The name you give the model mainly controls how output from forecast commands is labeled. More importantly, forecast create creates the internal data structures Stata uses to keep track of your model.
Add all equations to the model you just created using forecast estimates. The following command adds the stored estimation results in myarima to the current model mymodel:
forecast estimates myarima
Compute dynamic forecasts from 2012 to 2024.
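The steps above can be strung together as in the sketch below. The simulated series `y`, the AR(1) specification, and the 1960-2011 estimation sample are assumptions made only so the block runs on its own. `myarima` and `mymodel` are the names used in the text.

```stata
clear
set seed 12345
set obs 52
generate year = 1959 + _n                      // years 1960-2011
tsset year
generate y = 0 in 1
replace y = 0.5*y[_n-1] + rnormal() in 2/l     // simulated AR(1) series
tsappend, add(13)                              // extend the data through 2024
arima y if year <= 2011, ar(1)                 // 1. estimate the model
estimates store myarima                        // 2. store the results
forecast create mymodel, replace               // 3. create the forecast model
forecast estimates myarima                     // 4. add the equation
forecast solve, begin(2012) end(2024)          // 5. dynamic forecasts, stored in f_y
```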
11.4.2 Create a new forecast model
The forecast create command creates a new forecast model in Stata.
You must create a model before you can add equations or solve it. You can have only one model in memory at a time.
forecast create [name] [, replace]
You may optionally specify a name for your model. That name will appear in the output produced by the various forecast subcommands.
Option:
- replace: clears the existing model from memory before creating name. By default, forecast create issues an error message if another model is already in memory.
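The sketch below shows the default behavior and the replace option. The model names `mymodel` and `othermodel` are hypothetical.

```stata
forecast clear                          // drop any forecast model in memory
forecast create mymodel                 // succeeds: no model in memory yet
capture forecast create othermodel      // fails: mymodel is already in memory
local rc = _rc                          // nonzero return code signals the error
forecast create othermodel, replace     // replace clears mymodel first
```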
Note that you can add multiple equations to a forecast model.
11.4.3 Add equations/identities
forecast estimates modelname [, options]
Adds estimation results to the forecast model currently in memory. modelname is the name of a stored estimation result being added; it is created by estimates store modelname.
Options:
- predict(p_options): call predict using p_options.
- names(newnamelist[, replace]): use newnamelist for the names of the left-hand-side (LHS) variables in the estimation result being added, i.e., modelname. forecast estimates creates a new variable in the dataset for each element of newnamelist. You MUST use this option if any of the LHS variables contains time-series operators, such as D. or L. If a variable of the same name already exists in your dataset, forecast estimates exits with an error unless you specify the replace option, in which case existing variables are overwritten.
// Add estimation results stored in myestimates to the forecast model currently in memory
forecast estimates myestimates
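The sketch below shows why names() is needed when the dependent variable carries a time-series operator. It uses a simulated random walk, and `dyeq`, `dmodel`, and `dy` are hypothetical names. The identity in the last line rebuilds the level of `y` from its forecast difference.

```stata
clear
set seed 1
set obs 40
generate t = _n
tsset t
generate y = sum(rnormal())            // simulated random walk
regress D.y LD.y                       // LHS contains the D. operator
estimates store dyeq
forecast create dmodel, replace
forecast estimates dyeq, names(dy)     // names() is required here; creates dy
forecast identity y = L.y + dy         // rebuild the level from the difference
```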
11.4.3.1 Add an Identity to a Forecast Model
An identity is a nonstochastic equation that expresses an endogenous variable in the model as a function of other variables in the model. Identities often describe the behavior of endogenous variables that are based on accounting identities or adding-up conditions.
// Add an identity to the forecast that states that y3 is the sum of y1 and y2
forecast identity y3=y1+y2
// create new variable newy before adding it to the forecast
forecast identity newy=y1+y2, generate
The difference is that if the LHS variable does not already exist in the dataset, you must specify the generate option.
Ex. We have a model using annual data and want to assume that our population variable pop grows at 0.75% per year. We can then declare the endogenous variable pop by using forecast identity:
forecast identity pop = 1.0075*L.pop
Typically, you use forecast identity to define the relationship that determines an endogenous variable that is already in your dataset.
The generate option of forecast identity is useful when you wish to use a transformation of one or more endogenous variables as a right-hand-side variable in a stochastic equation that describes another endogenous variable.
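The sketch below solves the 0.75%-growth population identity on its own. It assumes forecast accepts a model made up of a single identity, and the starting value of 100 and the name `popmodel` are hypothetical. Each dynamic forecast is the previous one times 1.0075.

```stata
clear
set obs 5
generate year = 2019 + _n              // years 2020-2024
tsset year
generate pop = 100 in 1                // hypothetical 2020 population; later years missing
forecast create popmodel, replace
forecast identity pop = 1.0075*L.pop   // 0.75% growth per year
forecast solve, begin(2021)            // dynamic forecasts stored in f_pop
list year pop f_pop
```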
11.4.3.2 Add equations that you obtained elsewhere to your model
Up until now, we have been using model output from Stata to add equations to a forecast model, i.e., using forecast estimates.
You use forecast coefvector to add endogenous variables to your model that are defined by linear equations.
Common use cases for forecast coefvector:
- Sometimes you might see the estimated coefficients for an equation in an article and want to add that equation to your model. forecast coefvector allows you to add equations that are stored as coefficient vectors to a forecast model.
- User-written estimators that do not implement a predict command can also be included in forecast models via forecast coefvector.
- forecast coefvector can also be useful in situations where you want to simulate time-series data.
forecast coefvector cname [, options]
cname is a Stata matrix with one row. It defines the linear equations, which are stored in a coefficient (parameter) vector.
Options:
- variance(vname): specifies that Stata matrix vname contains the variance matrix of the estimated parameters. This option only has an effect if you specify the simulate() option when calling forecast solve and request sim_technique betas.
- errorvariance(ename): specifies that the equations being added include an additive error term with variance matrix ename, where ename is the name of a Stata matrix. The number of rows and columns in ename must match the number of equations represented by coefficient vector cname. This option only has an effect if you specify the simulate() option when calling forecast solve and request sim_technique errors.
- names(namelist[, replace]): instructs forecast coefvector to use namelist as the names of the left-hand-side variables in the coefficient vector being added. By default, forecast coefvector uses the equation names on the column stripe of cname. You must use this option if any of the equation names stored with cname contains time-series operators.
For example, if the coefficients of the equation for the endogenous variable y are stored in a row vector (matrix) that is also named y, the following adds that equation to the model:
// Add the equation stored in coefficient vector (matrix) y, to be used by forecast solve
forecast coefvector y
Ex. We want to add the following equations to a forecast model. \[ \begin{split} x_t &= 0.2 + 0.3 x_{t-1} - 0.8 z_t \\ z_t &= 0.1 + 0.7 z_{t-1} + 0.3 x_t - 0.2 x_{t-1} \end{split} \]
We first define the coefficient vector eqvector.
// define a row vector holding the seven coefficients
matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
// add equation names and variable names in one step:
// equation names (the LHS variables) are before the colon,
// variable names (the RHS terms) are after the colon
matrix colnames eqvector = x:_cons x:L.x x:z z:_cons z:L.z z:x z:L.x
matrix list eqvector
We could then add the coefficient vector to a forecast model.
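Here is a sketch of adding the two-equation system and solving it. It uses an empty 20-period dataset with hypothetical starting values x = z = 1 in period 1, and the model name `coefmodel` is also hypothetical. Because x and z depend on each other within a period, forecast solve has to solve the system simultaneously. In period 2 the exact solution is x = 0.02/1.24 and z = 0.6 + 0.3x.

```stata
clear
set obs 20
generate t = _n
tsset t
generate x = 1 in 1                   // hypothetical initial values for period 1
generate z = 1 in 1
matrix eqvector = (0.2, 0.3, -0.8, 0.1, 0.7, 0.3, -0.2)
matrix colnames eqvector = x:_cons x:L.x x:z z:_cons z:L.z z:x z:L.x
forecast create coefmodel, replace
forecast coefvector eqvector          // x and z become endogenous variables
forecast solve, begin(2)              // dynamic forecasts in f_x and f_z
list t f_x f_z in 1/5
```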
11.4.3.3 Declare exogenous variables
Declaring exogenous variables with forecast exogenous is not explicitly necessary, but we nevertheless strongly encourage doing so.
Stata can check the exogenous variables before solving the model and issue an appropriate error message if missing values are found, whereas troubleshooting models for which forecasting failed is more difficult after the fact.
Undeclared exogenous variables that contain missing values within the forecast horizon will cause forecast solve to exit with a less-informative error message and require the user to do more work to pinpoint the problem.
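The sketch below declares an exogenous variable and then deliberately leaves a gap in it inside the forecast horizon, so that forecast solve stops with an error. The data are simulated, and `yeq`, `xmodel`, and the gap in period 28 are hypothetical.

```stata
clear
set seed 2
set obs 30
generate t = _n
tsset t
generate x = rnormal()                  // exogenous regressor
generate y = 1 + 0.5*x + rnormal()
regress y x if t <= 25
estimates store yeq
forecast create xmodel, replace
forecast estimates yeq
forecast exogenous x                    // declare x as exogenous
replace x = . in 28                     // hypothetical gap inside the forecast horizon
capture noisily forecast solve, begin(26)
local rc = _rc                          // nonzero: solve refused to run
```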
Summary:
Endogenous variables are added to the forecast model via forecast estimates, forecast identity, and forecast coefvector.
- Equations added via forecast estimates are always stochastic.
- Equations added via forecast identity are always nonstochastic.
- Equations added via forecast coefvector are treated as stochastic if options variance() or errorvariance() (or both) are specified and nonstochastic if neither is specified.
11.4.3.4 forecast adjust
forecast adjust varname = exp [if] [in]
forecast adjust adjusts a variable by add factoring, replacing, etc. varname is the name of an endogenous variable that has been previously added to the model using forecast estimates or forecast coefvector.
forecast adjust specifies an adjustment to be applied to an endogenous variable in the model. Adjustments are typically used to produce alternative forecast scenarios or to incorporate outside information into a model.
// Adjust the endogenous variable y in forecast to account for the variable shock in 1990
forecast adjust y = y + shock if year==1990
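Below is a self-contained sketch of the add factor above, on simulated data. The values of `shock` (5 in 1990, 0 otherwise) and the names `yeq`, `adjmodel`, and `a_` are hypothetical. Because this toy model has no lags, the adjusted forecast differs from the unadjusted fitted value only in 1990, and by exactly the add factor.

```stata
clear
set seed 3
set obs 30
generate year = 1970 + _n                    // years 1971-2000
tsset year
generate x = rnormal()
generate y = 2 + x + rnormal()
regress y x if year <= 1985
predict yhat, xb                             // unadjusted fitted values, for comparison
estimates store yeq
generate shock = cond(year == 1990, 5, 0)    // hypothetical add factor
forecast create adjmodel, replace
forecast estimates yeq
forecast exogenous x
forecast adjust y = y + shock if year == 1990
forecast solve, begin(1986) prefix(a_)
```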
11.4.4 Solve the forecast
forecast solve computes static or dynamic forecasts based on the model currently in memory. Before you can solve a model, you must first create a new model using forecast create and add equations and variables to it using forecast estimates, forecast coefvector, or forecast identity.
Options:
- prefix(string) and suffix(string): specify a prefix or suffix for the forecast variables. You may specify prefix() or suffix() but NOT both. By default, forecast values are prefixed by f_.
- begin(time_constant) and end(time_constant): specify the periods in which to begin and end forecasting.
- periods(#): specify the number of periods to forecast.
- static: produce static forecasts instead of dynamic forecasts. Actual values of variables are used wherever lagged values of the endogenous variables appear in the model. Static forecasts are also called one-step-ahead forecasts. By default, dynamic forecasts are produced, which use the forecast values of variables wherever lagged values of the endogenous variables appear in the model.
- actuals: use actual values, if available, instead of forecasts. actuals specifies how nonmissing values of endogenous variables in the forecast horizon are treated. By default, nonmissing values are ignored, and forecasts are produced for all endogenous variables. When you specify actuals, forecast sets the forecast values equal to the actual values if they are nonmissing. The forecasts for the other endogenous variables are then conditional on the known values of the endogenous variables with nonmissing data.
- log(log_level): log_level takes on one of the following values:
  - on: the default; provides an iteration log showing the current panel and period for which the model is being solved, as well as a sequence of dots for each period indicating the number of iterations.
  - off: suppresses the iteration log.
  - detail: provides a detailed iteration log including the current values of the convergence criteria for each period in each panel (in the case of panel data) for which the model is being solved.
  - brief: produces an iteration log showing the current panel being solved but not which period within the current panel is being solved.
- simulate(sim_technique, sim_statistic sim_options): simulates your model to obtain measures of uncertainty surrounding the point forecasts produced by the model. Simulating a model involves repeatedly solving it, each time accounting for the uncertainty associated with the error terms and the estimated coefficient vectors.
  - sim_technique can be betas, errors, or residuals:
    - betas: draw multivariate-normal parameter vectors (sampling error from the estimated coefficients).
    - errors: draw additive errors from a multivariate normal distribution (uncertainty from the stochastic error terms); errors are drawn from a normal distribution with mean zero and variance equal to the estimated variance of the error terms.
    - residuals: draw additive residuals based on static forecast errors; errors are drawn from the pool of static-forecast residuals.
  - sim_statistic specifies a summary statistic to summarize the forecasts over all the simulations, written as statistic(statistic, prefix(string)) or statistic(statistic, suffix(string)). statistic can be mean, variance, or stddev. You specify either the prefix or the suffix that will be used to name the variables that will contain the requested statistic. For example, statistic(stddev, prefix(sd_)) stores the standard deviations of the forecasts in variables prefixed with sd_.
  - sim_options include:
    - reps(#): request that forecast solve perform # replications; the default is reps(50).
    - saving(filename, …): save the results to filename.
    - nodots: suppress replication dots. By default, one dot character is displayed for each successful replication. If convergence is not achieved during a replication, forecast solve exits with an error message.
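The sketch below compares the main solve options on one simulated AR(1) model. The names `areq`, `armodel`, and the prefixes `d_`, `s_`, `m_`, `sd_` are hypothetical. Static forecasts reproduce the one-step-ahead fitted values. Dynamic and static forecasts agree in the first forecast period only, because both use the actual value of y from the last sample period. The simulation adds standard deviations of the forecasts.

```stata
clear
set seed 4
set obs 60
generate t = _n
tsset t
generate y = 0 in 1
replace y = 1 + 0.6*y[_n-1] + rnormal() in 2/l   // simulated AR(1) series
regress y L.y if t <= 50
predict yhat, xb                       // one-step-ahead fit using actual L.y
estimates store areq
forecast create armodel, replace
forecast estimates areq
forecast solve, begin(51) prefix(d_) log(off)          // dynamic (the default)
forecast solve, begin(51) prefix(s_) static log(off)   // static, one step ahead
forecast solve, begin(51) prefix(m_) log(off) ///
    simulate(errors, statistic(stddev, prefix(sd_)) reps(50) nodots)
list t y d_y s_y sd_y if t >= 51
```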
11.4.5 Example: forecasting a panel
\[ \%\Delta \text{dim}_{it} = \beta_0 + \beta_1 \ln(\text{starts}_{it}) + \beta_2 \text{rgspgrowth}_{it} + \beta_3 \text{unrate}_{it} + u_{i} + \varepsilon_{it} \]
\(u_{i}\) refers to individual fixed effects.
When we make forecasts for an individual panel, we may want to include its fixed effect \(u_{i}\) in the forecasts. This can be achieved by using forecast adjust.
use https://www.stata-press.com/data/r19/statehardware, clear
generate lndim = ln(dim)
generate lnstarts = ln(starts)
quietly xtreg D.lndim lnstarts rgspgrowth unrate if qdate <= tq(2009q4), fe
predict dlndim_u, u /* obtain individual fixed effects */
estimates store dim /* store estimation results */
With enough observations, we can have more confidence in the estimated panel-specific errors. If we are willing to assume that we have decent estimates of the panel-specific errors and that those panel-level effects will remain constant over the forecast horizon, then we can incorporate them into our forecasts.
Because predict only provided us with estimates of the panel-level effects for the estimation sample, we need to extend them into the forecast horizon.
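An easy way to do that is to use egen to create a new variable that copies each panel's estimated effect into all of that panel's observations:
// assumes the panel identifier is named state; dlndim_u is constant within a state
// in the estimation sample, so its mean fills the forecast horizon with that constant
bysort state: egen dlndim_u2 = mean(dlndim_u)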
We can use forecast adjust to incorporate these terms into our forecasts.
The following commands define our forecast model, including the estimated panel-specific terms:
/* create forecast model */
forecast create statemodel, replace
/* add the equation; rename the endogenous variable D.lndim as dlndim */
/* because the original name contains a time-series operator, a new name
   is required; otherwise forecast estimates exits with an error */
forecast estimates dim, name(dlndim)
/* add state fixed effects */
forecast adjust dlndim = dlndim + dlndim_u2
Because our dependent variable contains a time-series operator, we must use the name(dlndim) option of forecast estimates to specify a valid name for the endogenous variable being added. dlndim stands for the first difference of the logarithm of dim. We are interested in the level of dim, so we need to back out dim from dlndim. We use forecast identity to obtain the actual dim variable:
// reverse first difference, note that you refer to the endog var using the new name, dlndim, now
forecast identity lndim = L.lndim + dlndim
// reverse natural logarithm
forecast identity dim = exp(lndim)
We used forecast adjust to perform our adjustment to dlndim immediately after we added those estimation results so that we would not forget to do so.
However, we could specify the adjustment at any time.
Regardless of when you specify an adjustment, forecast solve performs those adjustments immediately after the variable being adjusted is computed.
Finally, we can solve the model. Here we obtain dynamic forecasts beginning in the first quarter of 2010:
forecast solve, begin(tq(2010q1))