11.2 Data Manipulation
11.2.1 Import and Export
Shipped datasets
Stata contains some demonstration datasets in the system directories.
sysuse dir: list the names of shipped datasets.
sysuse lifeexp: use lifeexp
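Note that use lifeexp returns an error (file not found) unless lifeexp.dta is in the current working directory, because use does not search the system directories.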
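The difference between the two commands can be seen in a short sketch. It assumes a standard Stata installation in which lifeexp ships with the system files; capture noisily is used only so that the do-file keeps running if use fails.

```stata
clear

// use looks in the current working directory (or at the path you give),
// so this normally fails for a shipped dataset
capture noisily use lifeexp
display "use returned code " _rc

// sysuse searches Stata's system directories instead
sysuse lifeexp, clear
describe, short
```

A nonzero return code from the first command means that lifeexp.dta was not found in the working directory.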
User datasets
.dta
use myauto [, clear]: Load myauto.dta (Stata-format) into memory.
clear: it is okay to replace the data in memory, even though the current data have not been saved to disk.
save myauto [, replace]: Save the data in memory as the Stata-format file myauto.dta
replace: allows Stata to overwrite an existing dataset, for example the output from a previous run of the do-file.
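A minimal round trip with save and use might look like the following. The shipped auto dataset stands in for your own data, and price_k is a hypothetical variable created only to show what clear discards.

```stata
sysuse auto, clear

// write myauto.dta to the working directory; replace lets the do-file be rerun
save myauto, replace

// make a change that is not saved to disk
generate price_k = price / 1000

// reload the saved file; clear discards the unsaved change
use myauto, clear
describe, short
```

Without clear, the last use command would stop with an error because the data in memory have changed since they were last saved.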
.csv
import delimited myauto.csv: Import myauto.csv to Stata’s memory
export delimited myauto.csv: Export the data in memory to myauto.csv
import delimited filename reads text (ASCII) files in which there is one observation
per line and the values are separated by commas, tabs, or some other delimiter.
By default, Stata checks whether the file is delimited by tabs or commas based on the first line of data.
export delimited filename writes the data to a file in comma-separated (.csv) format by default. You can specify a different delimiter if you prefer.
If filename is specified without an extension, .csv is assumed. If filename contains embedded spaces, enclose it in double quotes.
Options
delimiters("chars"[, collapse | asstring]): "chars" specifies the delimiter.
";" uses a semicolon as the delimiter; "\t" uses a tab; "whitespace" uses whitespace.
collapse: treat multiple consecutive delimiters as just one delimiter.
asstring: treat chars as one delimiter. By default, each character in chars is treated as an individual delimiter.
clear: replace data in memory
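The delimiter options can be tried with a small export and re-import. This is a sketch using the shipped auto dataset; the file name myauto_semicolon.csv is an assumption made for illustration.

```stata
sysuse auto, clear
keep make price mpg

// write a semicolon-separated file instead of the default comma-separated one
export delimited using myauto_semicolon.csv, delimiter(";") replace

// read it back, stating the delimiter explicitly rather than relying on detection
import delimited using myauto_semicolon.csv, delimiters(";") clear
describe
```

Note that the option is spelled delimiter() for export delimited and delimiters() for import delimited.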
11.2.2 Save Estimation Results
estimates store model_name stores the current (active) estimation results under the name model_name.
// Store estimation results as m1 for use later in the same session
. estimates store m1
// to get them back
. estimates restore m1
// Find out what you have stored
. estimates dir
estimates save filename saves the current active estimation results to a file with the extension .ster.
In a different session, you can reload those results:
// Load the saved estimation results
. estimates use basemodel
// Display the results
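. estimates table
Q: What is the difference between estimates store and estimates save?
A: estimates store keeps the results in memory under a name for the current session, whereas estimates save writes them to a .ster file on disk so that they can be reloaded in a later session. Once estimation results are stored, you can use other estimates
commands to produce tables and reports from them.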
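The two routes can be compared side by side in one do-file. The regression on the shipped auto dataset is only an illustrative assumption; m1 and basemodel are the names used in the snippets above.

```stata
sysuse auto, clear
quietly regress price mpg weight

// keep the results in memory under the name m1
estimates store m1
// write the same results to basemodel.ster in the working directory
estimates save basemodel, replace

// fit another model so that the active results change
quietly regress price foreign

// get the first model back from memory
estimates restore m1

// or load it from disk, which also works in a later session
estimates use basemodel
estimates table
```

After either estimates restore or estimates use, the first model is the active one again, so estimates table without arguments reports its coefficients.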
estimates table [namelist] [, options] organizes estimation results from one or more models in a single formatted table.
If you type estimates table without arguments, a table of the most recent estimation coefficients will be shown.
// Display a table of coefficients for stored estimates m1 and m2
estimates table m1 m2
// with SE
estimates table m1 m2, se
// with sample size, adjusted R-squared, and stars
estimates table m1 m2, stats(N r2_a) star
You can add more results to show using options:
stats(scalarlist): reports additional statistics in the table. Below are commonly used result identifiers:
N for sample size
r2_a for adjusted \(R^2\)
r2 for \(R^2\)
F for F-statistic
chi2 for chi-squared statistic
p for p-value
For example, stats(N r2_a) shows sample size and adjusted \(R^2\).
star: shows stars for significance levels.
By default, star(.05 .01 .001), which uses the following significance levels:
* for \(p < 0.05\)
** for \(p < 0.01\)
*** for \(p < 0.001\)
You can change the significance levels using star(.1 .05 .01) to set the levels to 0.10, 0.05, and 0.01, respectively.
N.B. the star option may not be combined with the se, t, or p option. An error will be returned if you try to combine them.
b[%fmt]: how to format the coefficients
se[%fmt]: show standard errors and use optional format
t[%fmt]: show \(t\) or \(z\) statistics and use optional format
p[%fmt]: show \(p\) values and use optional format
varlabel: display variable labels rather than variable names
// show stars for sig. levels
. estimates table, star
// show se, t, and p values
. estimates table, se t p
All statistics are shown in order under the coefficients. If you have a long list of variables, the table can be very long.
You can use keep(varlist) to keep only the variables you want to show in the table.
varlist is a list of variables you want to keep in the table.
A list of variables can be specified as keep(var1 var2 var3). Names are separated by spaces.
Variable ranges cannot be used, e.g., keep(var1-var3) will return an error.
When you have multiple equations, use eqn_name:varname to specify the variable in a specific equation.
Example of a long variable list
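One way to build a long table and then trim it is sketched below with the shipped auto dataset. The model and the stored name full are illustrative assumptions, not the example originally shown here.

```stata
sysuse auto, clear
quietly regress price mpg weight length turn displacement foreign
estimates store full

// the full table has one row per coefficient
estimates table full, b(%9.3f) se(%9.3f)

// keep only the coefficients of interest, plus sample size and adjusted R-squared
estimates table full, keep(mpg weight _cons) b(%9.3f) se(%9.3f) stats(N r2_a)
```

The names in keep() are listed one by one and separated by spaces, as described above.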
etable
etable allows you to easily create a table of estimation results and export it to a variety of file types, e.g., docx, html, pdf, xlsx, tex, txt, markdown, md.
// use example of etable
. clear all
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)
. quietly regress bpsystol age weight i.region
. estimates store model1
. quietly regress bpsystol i.sex weight i.agegrp
. estimates store model2
. quietly regress bpsystol age weight i.agegrp
. estimates store model3
. etable, estimates(model1 model2 model3) showstars showstarsnote title("Table 1. Models for systolic blood pressure") export(mydoc.docx, replace)
Options:
showstars and showstarsnote: show stars and a note explaining the significance levels.
export: allows you to specify the output file and, through its extension, the output format.
Alternative to etable: eststo and esttab, from the community-contributed estout package.
11.2.3 Stored Results
Stata commands that report results also store the results where they can be subsequently used by other commands or programs. This is documented in the Stored results section of the particular command in the reference manuals.
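As a small sketch of reusing stored results, the following uses the shipped auto dataset. The variable name price_centered is hypothetical, and r() and e() are the two result classes described next.

```stata
sysuse auto, clear

// summarize leaves its results in r()
quietly summarize price
return list
generate price_centered = price - r(mean)

// regress leaves its results in e()
quietly regress price mpg
ereturn list
display "N = " e(N) "   R-squared = " %5.3f e(r2)
```

return list and ereturn list show everything a command has stored, which is a quick complement to the Stored results section of the manual entry.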
r-class commands, such as summarize, store their results in r(); most commands are r-class.
e-class commands, such as regress, store their results in e(); e-class commands are Stata’s model estimation commands.
Most estimation commands leave behind e(b), the coefficient vector, and e(V), the variance–covariance matrix of the estimates (VCE).
// display coef vector
matrix list e(b)
// copy it to a matrix named myb
matrix myb = e(b)
matrix list myb
You can refer to e(b) and e(V) in any matrix expression:
invsym(e(V)) returns the inverse of e(V). Generally, invsym requires a square, symmetric, and positive-definite matrix.
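A short sketch of e(V) in matrix expressions follows. It assumes a regression on the shipped auto dataset; the matrix names V and Vinv are illustrative.

```stata
sysuse auto, clear
quietly regress price mpg weight

// copy the VCE into a named matrix
matrix V = e(V)

// a standard error is the square root of a diagonal element of e(V);
// mpg is the first coefficient, so its variance is V[1,1]
display "SE of mpg by hand:    " sqrt(V[1,1])
display "SE of mpg from _se[]: " _se[mpg]

// e(V) can be used directly inside a matrix expression
matrix Vinv = invsym(e(V))
matrix list Vinv
```

The two displayed standard errors should agree, since _se[mpg] is derived from the same diagonal element.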