11.7 Common Correlated Effects Models
This section provides an overview of how the Stata package xtdcce2 implements Common Correlated Effects (CCE) models, which are useful for panel data analysis with heterogeneous coefficients and cross-sectional dependence arising from unobserved common factors.
Environment setup
- Data has to be xtset before using xtdcce2.
11.7.1 Econometric model
An ARDL(1,1) model with heterogeneous coefficients and common correlated effects (CCE) is given by:
\[ \begin{equation} \begin{split} y(i,t) &= b0(i) + b1(i) * y(i,t-1) + b2(i) * x(i,t) + b3(i) * x(i,t-1) \\ &\phantom{=}\quad + u(i,t) \end{split} \tag{11.2} \end{equation} \]
where
\[ u(i,t) = g(i) * f(t) + e(i,t) \]
f(t) is an unobserved common factor, g(i) a heterogeneous factor loading, x(i,t) is a (1 x K) vector and b2(i) and b3(i) the coefficient vectors. It is assumed that x(i,t) is strictly exogenous.
- The error e(i,t) is iid.
- The heterogeneous coefficients b1(i), b2(i) and b3(i) are randomly distributed around a common mean.
- In the case of a static panel model, we have b1(i) = 0.
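To make the data generating process concrete, the following is a minimal Python sketch that simulates the ARDL(1,1) model above with a single common factor and K = 1. It is not part of xtdcce2; the function name `simulate_panel`, the distributions, and all parameter values are illustrative assumptions.

```python
import random

def simulate_panel(n_units=5, n_periods=30, seed=0):
    """Simulate y(i,t) from the ARDL(1,1) model with one common factor.

    All distributions and parameter values are illustrative assumptions.
    Returns (y, x, coefs): y[i][t] and x[i][t] are lists of lists and
    coefs[i] holds the unit-specific tuple (b0, b1, b2, b3, g).
    """
    rng = random.Random(seed)
    f = [rng.gauss(0, 1) for _ in range(n_periods)]  # common factor f(t)
    y, x, coefs = [], [], []
    for _ in range(n_units):
        # heterogeneous coefficients scattered around common means
        b0 = rng.gauss(1.0, 0.1)
        b1 = rng.uniform(0.2, 0.6)  # keeps |b1(i)| < 1
        b2 = rng.gauss(0.5, 0.1)
        b3 = rng.gauss(0.2, 0.1)
        g = rng.gauss(1.0, 0.3)     # heterogeneous factor loading g(i)
        # strictly exogenous regressor, drawn independently of e(i,t)
        xi = [rng.gauss(0, 1) for _ in range(n_periods)]
        yi = [0.0]                  # assumed initial value y(i,0)
        for t in range(1, n_periods):
            u = g * f[t] + rng.gauss(0, 1)  # u(i,t) = g(i)*f(t) + e(i,t)
            yi.append(b0 + b1 * yi[t - 1] + b2 * xi[t] + b3 * xi[t - 1] + u)
        y.append(yi)
        x.append(xi)
        coefs.append((b0, b1, b2, b3, g))
    return y, x, coefs

y, x, coefs = simulate_panel()
```

Because every unit shares f(t) but weights it with its own g(i), the errors u(i,t) are correlated across units, which is the feature the CCE estimators are designed to handle.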
11.7.2 Estimation
11.7.2.1 Static
Pesaran (2006) shows that the averages of the coefficients b0, b2 and b3 (for example, b2(mg) = 1/N sum(b2(i))) can be consistently estimated by adding cross sectional means of the dependent and all independent variables.
The default equation in xtdcce2 is given by:
\[ \begin{equation} y(i,t) = b0(i) + b2(i)*x(i,t) + d(i)*z(i,t) + e(i,t). \tag{11.3} \end{equation} \]
Note that Eq. (11.3) is a static model: the lagged dependent variable does not occur and only contemporaneous cross sectional averages are used.
- Including the dependent and independent variables in crosssectional() and setting cr_lags(0) leads to the same result.
- cr_lags(0) means that only contemporaneous cross sectional means are included.
- crosssectional() defines the variables to be included in z(i,t).
- It is important to note that b1(i) is set to zero.
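The cross sectional means that enter z(i,t) are plain averages over units at each point in time. A minimal Python sketch, assuming a balanced panel stored as a list of per-unit time series (the function name and the toy data are hypothetical, and this is not how xtdcce2 is implemented internally):

```python
def cross_sectional_means(panel):
    """Return z(t) = (1/N) * sum_i panel[i][t] for a balanced panel."""
    n_units = len(panel)
    n_periods = len(panel[0])
    return [sum(panel[i][t] for i in range(n_units)) / n_units
            for t in range(n_periods)]

# Hypothetical toy data: 3 units, 4 periods.
y = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.0],
     [3.0, 6.0, 9.0, 12.0]]
y_bar = cross_sectional_means(y)
print(y_bar)  # [2.0, 4.0, 6.0, 8.0]
```

The same function would be applied to each variable listed in crosssectional(), and the resulting series are added as regressors to every unit's equation.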
Example
- cr(_all) means that all variables are included in the cross sectional means. It is equivalent to crosssectional(log_rgdpo log_hc log_ck log_ngd).
- The default number of cross sectional lags is zero (cr_lags(0)), implying only contemporaneous cross sectional averages are used. cr_lags(3) would include the lags of cross sectional means up to three.
- reportc reports the constant term. If not specified, the constant is partialled out.
11.7.2.2 Dynamic
Chudik and Pesaran (2015) extend this approach to a dynamic panel data model (b1(i) != 0); pT lags of the cross sectional means are added to achieve consistency.
The mean group coefficients b1, b2 and b3 are consistently estimated as long as N, T and pT go to infinity. This implies that the number of cross sectional units and time periods is assumed to grow at the same rate.
In an empirical setting this can be interpreted as N/T being constant.
A dataset with one dimension being large in comparison to the other would lead to inconsistent estimates, even if both dimensions are large in numbers.
xtdcce2 estimates the following dynamic CCE model:
\[ \begin{equation} \begin{split} y(i,t) &= b0(i) + b1(i)*y(i,t-1) + b2(i)*x(i,t) \\ &\phantom{=}\quad + \sum_{s=t}^{t-pT} [d(i)*z(i,s)] + e(i,t). \end{split} \tag{11.4} \end{equation} \]
Eq. (11.4) is estimated if the option cr_lags() contains a positive number.
z(i,s) is the cross sectional average of the variables defined in crosssectional().
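The sum over s in Eq. (11.4) means that each observation at time t is paired with the current cross sectional mean and its first pT lags. A minimal Python sketch of that bookkeeping for one averaged variable; the function name `lagged_cs_means` is hypothetical, and dropping the first periods is a simplifying assumption rather than a statement about xtdcce2's handling of missing lags:

```python
def lagged_cs_means(z_bar, cr_lags):
    """For each usable period t, return [z(t), z(t-1), ..., z(t-cr_lags)].

    The first cr_lags periods are dropped because their lags are unavailable.
    """
    if cr_lags < 0:
        raise ValueError("cr_lags must be non-negative")
    return {t: [z_bar[t - s] for s in range(cr_lags + 1)]
            for t in range(cr_lags, len(z_bar))}

z_bar = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical cross sectional means
rows = lagged_cs_means(z_bar, cr_lags=3)
print(rows[3])  # [8.0, 6.0, 4.0, 2.0]
print(rows[4])  # [10.0, 8.0, 6.0, 4.0]
```

With cr_lags=0 each row contains only the contemporaneous mean, which corresponds to the static case in Eq. (11.3).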
Example
xtdcce2 d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd , ///
reportc cr(log_rgdpo log_hc log_ck log_ngd) cr_lags(3)
- cr_lags(3) sets the number of lags of the cross sectional means to 3.
The variance of the mean group coefficient b1(mg) is estimated as:
\[ \operatorname{var}(b1(mg)) = \frac{1}{N} \sum_{i=1}^N \left(b1(i) - b1(mg)\right)^2 \]
For the vector \(pi(mg) = \left(b0(mg), b1(mg)\right)',\) the variance is given by:
\[ \operatorname{var}(pi(mg)) = \frac{1}{N} \sum_{i=1}^N \left(pi(i) - pi(mg)\right) \; \left(pi(i)-pi(mg)\right)' \]
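The scalar case can be checked by hand. The sketch below, in Python, computes the mean group estimate and the variance exactly as the formula is written in this section; the unit-specific estimates are hypothetical numbers, and the sketch makes no claim about any finite-sample scaling that xtdcce2 may apply internally.

```python
def mean_group(coefs):
    """Mean group estimate b(mg) = (1/N) * sum_i b(i)."""
    return sum(coefs) / len(coefs)

def mean_group_variance(coefs):
    """Variance as written above: (1/N) * sum_i (b(i) - b(mg))^2."""
    b_mg = mean_group(coefs)
    return sum((b - b_mg) ** 2 for b in coefs) / len(coefs)

b1 = [0.2, 0.4, 0.6, 0.8]  # hypothetical unit-specific estimates b1(i)
b1_mg = mean_group(b1)
b1_var = mean_group_variance(b1)
print(round(b1_mg, 6), round(b1_var, 6))
```

For these inputs the deviations from the mean are -0.3, -0.1, 0.1 and 0.3, so the sum of squares is 0.2 and dividing by N = 4 gives 0.05.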
11.7.2.3 Pooled Estimation
Eqs. (11.3) and (11.4) can be estimated as pooled models in which the coefficients are assumed to be equal across all cross sectional units.
Hence the equations become:
Pooled Pesaran
\[ \begin{equation} y(i,t) = b0 + b2*x(i,t) + d(i)*z(i,t) + e(i,t) \end{equation} \]
Pooled Chudik and Pesaran
\[ \begin{equation} y(i,t) = b0 + b1*y(i,t-1) + b2*x(i,t) + \sum_{s=t}^{t-pT} [d(i)*z(i,s)] + e(i,t). \end{equation} \]
Variables with pooled (homogeneous) coefficients are specified using the pooled(varlist) option.
The constant is pooled by using the option pooledconstant.
In case of a pooled estimation, the standard errors are obtained from a mean group regression.
Example
xtdcce2 d.log_rgdpo L.log_rgdpo log_hc log_ck log_ngd , ///
reportc cr(log_rgdpo log_hc log_ck log_ngd) ///
pooled(L.log_rgdpo log_hc log_ck log_ngd) cr_lags(3) pooledconstant
- pooled(L.log_rgdpo log_hc log_ck log_ngd) means all coefficients should be pooled.
- pooledconstant means the constant is pooled.
11.7.4 Bootstrap
xtdcce2 can bootstrap confidence intervals and standard errors. It supports two types of bootstraps: the wild bootstrap and the cross-section bootstrap.
The cross-section bootstrap is the default method.
The cross-section bootstrap draws with replacement from the cross-sectional dimension. That is, it randomly draws cross-sectional units together with their entire time series. It then estimates the model using xtdcce2.
The wild bootstrap is a slower form of the wild bootstrap implemented in boottest (Roodman et al. 2019). It reweights the residuals from the initial regression with Rademacher weights, recalculates the dependent variable and then runs xtdcce2.
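The two resampling schemes differ in what is redrawn. A minimal Python sketch of one draw under each scheme, written only to illustrate the idea: the function names and inputs are hypothetical, the re-estimation step is omitted, and drawing one Rademacher weight per observation is an assumption, since the text does not state the level at which xtdcce2 draws the weights.

```python
import random

def cross_section_draw(unit_ids, rng):
    """Draw N units with replacement; each unit keeps its full time series."""
    return [rng.choice(unit_ids) for _ in unit_ids]

def wild_draw(fitted, residuals, rng):
    """Rebuild y*(t) = fitted(t) + w(t) * residual(t), with w(t) in {-1, +1}."""
    # Assumption: one Rademacher weight per observation.
    weights = [rng.choice((-1.0, 1.0)) for _ in residuals]
    return [yhat + w * e for yhat, w, e in zip(fitted, weights, residuals)]

rng = random.Random(42)
units = ["A", "B", "C", "D"]  # hypothetical unit identifiers
sample = cross_section_draw(units, rng)  # units may appear more than once
y_star = wild_draw([1.0, 2.0, 3.0], [0.5, -0.2, 0.1], rng)
```

In both cases the model would then be re-estimated on the resampled data, and the spread of the estimates across many draws gives the bootstrap standard errors and confidence intervals.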