11.2 Data Manipulation
11.2.1 Import and Export
Shipped datasets
Stata contains some demonstration datasets in the system directories.
sysuse dir: list the names of shipped datasets.
sysuse lifeexp: load the shipped dataset lifeexp.dta into memory.
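Note that use lifeexp will return a file-not-found error, because use looks for lifeexp.dta in the current working directory rather than in Stata's system directories.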
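A minimal do-file sketch of the two commands above, using lifeexp.dta as in the text; describe, short is added only to confirm what was loaded.

```stata
// list the datasets shipped with Stata
sysuse dir

// load a shipped dataset; clear discards any unsaved data in memory
sysuse lifeexp, clear

// confirm what was loaded (number of observations and variables)
describe, short
```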
User datasets
.dta
use myauto [, clear]: load myauto.dta (Stata format) into memory.
clear: it is okay to replace the data in memory, even though the current data have not been saved to disk.
save myauto [, replace]: save the data in memory to the Stata-format file myauto.dta.
replace: allows Stata to overwrite an existing file, such as the output of a previous run of the do-file.
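A round-trip sketch, assuming you can write to the current working directory; the shipped auto data stand in for your own data, and myauto is just an illustrative file name.

```stata
// start from a shipped dataset
sysuse auto, clear

// write it to disk as myauto.dta; replace overwrites a copy left by an earlier run
save myauto, replace

// reload it; clear discards the data currently in memory
use myauto, clear
```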
.csv
import delimited myauto.csv: import myauto.csv into Stata's memory.
export delimited myauto.csv: export the data in memory to myauto.csv.
import delimited filename reads text (ASCII) files in which there is one observation per line and the values are separated by commas, tabs, or some other delimiter.
By default, Stata checks whether the file is delimited by tabs or commas based on the first line of data.
export delimited filename writes the data to a comma-separated (.csv) file by default. You can choose another delimiter with the delimiter() option, e.g., delimiter(";") or delimiter(tab).
If filename is specified without an extension, .csv is assumed. If filename contains embedded spaces, enclose it in double quotes.
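A minimal CSV round trip, assuming write access to the working directory; myauto.csv is an illustrative file name and the shipped auto data are used as input.

```stata
sysuse auto, clear

// export every variable to myauto.csv (comma-separated by default)
export delimited using myauto.csv, replace

// read the file back; clear replaces the data in memory
import delimited using myauto.csv, clear
describe, short
```

By default, export delimited writes value labels (e.g., Domestic/Foreign for the variable foreign) rather than the underlying numeric codes; add the nolabel option to export the codes instead.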
Options
delimiters("chars"[, collapse | asstring]):
"chars" specifies the delimiter, e.g., ";" uses a semicolon, "\t" uses a tab, and "whitespace" uses whitespace.
collapse: treat multiple consecutive delimiters as just one delimiter.
asstring: treat chars as one delimiter. By default, each character in chars is treated as an individual delimiter.
clear: replace the data in memory.
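A sketch of the delimiters() option, assuming a semicolon-delimited file; myauto_semi.csv is a hypothetical file name created here from the auto data so the example is self-contained.

```stata
sysuse auto, clear

// write a semicolon-delimited file; nolabel keeps numeric codes for labeled variables
export delimited using myauto_semi.csv, delimiter(";") nolabel replace

// read it back, telling import delimited which delimiter to use
import delimited using myauto_semi.csv, delimiters(";") clear
```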
11.2.2 Save Estimation Results
estimates store model_name stores the current (active) estimation results under the name model_name.
// Store estimation results as m1 for use later in the same session
. estimates store m1
// to get them back
. estimates restore m1
// Find out what you have stored
. estimates dir
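A self-contained do-file version of the commands above, assuming the shipped auto data; the two regressions are arbitrary models chosen only to have something to store.

```stata
sysuse auto, clear

// fit two models and store each under a name
quietly regress mpg weight
estimates store m1
quietly regress mpg weight foreign
estimates store m2

// make m1 the active results again, then list what is stored
estimates restore m1
estimates dir
```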
estimates save filename saves the current (active) estimation results to a file with the extension .ster.
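A minimal sketch of saving, assuming the shipped auto data and write access to the working directory; basemodel is an illustrative file name chosen to match the one reloaded with estimates use.

```stata
sysuse auto, clear
quietly regress mpg weight

// write the active results to basemodel.ster; replace overwrites an earlier copy
estimates save basemodel, replace
```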
In a different session, you can reload those results:
// Load the saved estimation results
. estimates use basemodel
// Display the results
. estimates table
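Q: What is the difference between estimates store and estimates save?
A: estimates store keeps the results in memory, so they are available only during the current Stata session. estimates save writes them to a .ster file on disk, so they can be reloaded with estimates use in a later session. Either way, once estimation results are stored, you can use other estimates commands to produce tables and reports from them.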
estimates table [namelist] [, options] organizes estimation results from one or more models in a single formatted table.
If you type estimates table without arguments, it shows the coefficients from the active (most recent) estimation results.
// Display a table of coefficients for stored estimates m1 and m2
estimates table m1 m2
// with SE
estimates table m1 m2, se
// with sample size, adjusted R-squared, and stars
estimates table m1 m2, stats(N r2_a) star
You can show additional results using these options:
stats(scalarlist): reports additional statistics in the table. Commonly used result identifiers are:
N for sample size
r2_a for adjusted \(R^2\)
r2 for \(R^2\)
F for the F-statistic
chi2 for the chi-squared statistic
p for the p-value
For example, stats(N r2_a) shows the sample size and adjusted \(R^2\).
star: shows stars for significance levels. By default, star(.05 .01 .001) is used, which means:
* for \(p < 0.05\)
** for \(p < 0.01\)
*** for \(p < 0.001\)
You can change the significance levels; for example, star(.1 .05 .01) sets them to 0.10, 0.05, and 0.01, respectively.
N.B. The star option may not be combined with the se, t, or p option; Stata returns an error if you try to combine them.
b[%fmt]: how to format the coefficients.
se[%fmt]: show standard errors and use optional format.
t[%fmt]: show \(t\) or \(z\) statistics and use optional format.
p[%fmt]: show \(p\)-values and use optional format.
varlabel: display variable labels rather than variable names.
// show stars for sig. levels
. estimates table, star
// show se, t, and p values
. estimates table, se t p
The requested statistics are stacked under each coefficient, so a model with a long list of variables can produce a very long table.
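A sketch combining several of these options, assuming the shipped auto data and two arbitrary models; %9.3f is just one possible display format.

```stata
sysuse auto, clear
quietly regress mpg weight
estimates store m1
quietly regress mpg weight foreign
estimates store m2

// coefficients and SEs with three decimals, N and adjusted R-squared, variable labels
estimates table m1 m2, b(%9.3f) se(%9.3f) stats(N r2_a) varlabel

// stars at the 0.10, 0.05, and 0.01 levels (star cannot be combined with se, t, or p)
estimates table m1 m2, b(%9.3f) star(.1 .05 .01) stats(N r2_a)
```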
You can use keep(varlist) to show only the variables you want in the table:
varlist is a list of the variables to keep, with names separated by spaces, e.g., keep(var1 var2 var3).
Variable ranges are not allowed; e.g., keep(var1-var3) returns an error.
When you have multiple equations, use eqn_name:varname to refer to a variable in a specific equation.
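A sketch of keep(), assuming the shipped auto data; the model specifications are arbitrary and only serve to produce more coefficients than we want to display.

```stata
sysuse auto, clear
quietly regress mpg weight length foreign
estimates store m1
quietly regress mpg weight length foreign turn displacement
estimates store m2

// show only two coefficients, listed by name and separated by spaces
estimates table m1 m2, keep(weight foreign) se
```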
Example of a long variable list
etable
etable (available in Stata 17 and later) allows you to easily create a table of estimation results and export it to a variety of file types, e.g., .docx, .html, .pdf, .xlsx, .tex, .txt, and Markdown (.md or .markdown).
// example of etable
. clear all
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)
. quietly regress bpsystol age weight i.region
. estimates store model1
. quietly regress bpsystol i.sex weight i.agegrp
. estimates store model2
. quietly regress bpsystol age weight i.agegrp
. estimates store model3
. etable, estimates(model1 model2 model3) showstars showstarsnote title("Table 1. Models for systolic blood pressure") export(mydoc.docx, replace)
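Options:
showstars and showstarsnote: show stars for significance levels and a note explaining them.
title(): adds a title to the table.
export(filename[, replace]): exports the table to a file; the output format is determined by the file extension (here .docx).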
Alternative to etable: eststo (together with esttab) from the user-written estout package, available from SSC.
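A minimal sketch of the estout alternative, assuming internet access for the one-time ssc install and the shipped auto data; estout is a user-written package, not part of official Stata.

```stata
// one-time install of the user-written estout package from SSC (needs internet access)
ssc install estout, replace

sysuse auto, clear
eststo clear
eststo m1: quietly regress mpg weight
eststo m2: quietly regress mpg weight foreign

// side-by-side table with standard errors and adjusted R-squared
esttab m1 m2, se ar2
```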
11.2.3 Stored Results
Stata commands that report results also store the results where they can be subsequently used by other commands or programs. This is documented in the Stored results section of the particular command in the reference manuals.
r-class commands, such as summarize, store their results in r(); most commands are r-class.
e-class commands, such as regress, store their results in e(); e-class commands are Stata's model estimation commands.
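A sketch showing both classes, assuming the shipped auto data; mpg_mean is a hypothetical scalar name used to keep the r() result, since a later command may overwrite r().

```stata
sysuse auto, clear

// r-class: summarize leaves its results in r()
summarize mpg
return list
scalar mpg_mean = r(mean)
display "mean of mpg = " scalar(mpg_mean)

// e-class: regress leaves its results in e()
regress mpg weight
ereturn list
display "observations used = " e(N)
```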
Most estimation commands leave behind e(b), the coefficient vector, and e(V), the variance–covariance matrix of the estimates (VCE).
// display coef vector
matrix list e(b)
// copy it into a matrix named myb
matrix myb = e(b)
matrix list myb
You can refer to e(b) and e(V) in any matrix expression: invsym(e(V)) returns the inverse of e(V).
invsym() requires a square, symmetric matrix; it returns the exact inverse when the matrix is positive definite and a generalized inverse otherwise.
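A sketch of using e(V) in matrix expressions, assuming the shipped auto data; V, Vinv, and check are illustrative matrix names. The first row and column of V correspond to weight because it is the first regressor.

```stata
sysuse auto, clear
quietly regress mpg weight foreign

// copy the VCE and invert it; invsym() expects a square, symmetric matrix
matrix V = e(V)
matrix Vinv = invsym(V)

// V * Vinv should be (numerically) an identity matrix
matrix check = V * Vinv
matrix list check

// the standard error of weight is the square root of V's first diagonal element
display sqrt(V[1,1])
display _se[weight]
```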