11.2 Data Manipulation

11.2.1 Import and Export

Shipped datasets

Stata contains some demonstration datasets in the system directories.

sysuse dir: list the names of shipped datasets.

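sysuse lifeexp: load the shipped dataset lifeexp into memory.

Note that use lifeexp returns an error (file not found), because use does not search the system directories where the shipped datasets are stored.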
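A minimal sketch of the workflow above, using only the shipped lifeexp dataset. The clear option is added here on the assumption that unsaved data may already be in memory.

```stata
// List the datasets that ship with Stata
sysuse dir

// Load the shipped life-expectancy data, replacing whatever is in memory
sysuse lifeexp, clear

// Inspect the variables that were loaded
describe
```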

User datasets

.dta

use myauto [, clear]: Load myauto.dta (Stata-format) into memory.

  • clear specifies that it is okay to replace the data in memory, even though the current data have not been saved to disk.

save myauto [, replace]: Save the data in memory to the Stata-format file myauto.dta.

  • replace allows Stata to overwrite an existing file, for example one created by a previous run of the do-file.
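As an illustration, the round trip below saves a dataset and loads it back. It assumes the shipped auto dataset as a stand-in for your own data and writes myauto.dta to the current working directory.

```stata
// Start from a shipped dataset so the example is self-contained
sysuse auto, clear

// Write the data in memory to myauto.dta, overwriting any earlier copy
save myauto, replace

// Load the file back, discarding whatever is currently in memory
use myauto, clear
describe, short
```

Using replace and clear in a do-file lets the file be rerun from the top without stopping on "file already exists" or "no; data in memory would be lost" errors.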

.csv

import delimited myauto.csv: Import myauto.csv into Stata’s memory.

export delimited myauto.csv: Export the data in memory to myauto.csv.

import delimited filename reads text (ASCII) files in which there is one observation per line and the values are separated by commas, tabs, or some other delimiter.

By default, Stata checks whether the file is delimited by tabs or commas based on the first line of data.

export delimited filename writes data into a file in comma-separated (.csv) format by default. You can specify a different delimiter character if you prefer.

If filename is specified without an extension, .csv is assumed. If filename contains embedded spaces, enclose it in double quotes.
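A short sketch of a CSV round trip, assuming the shipped auto dataset and a file name (myauto.csv) chosen only for illustration:

```stata
// Start from a shipped dataset
sysuse auto, clear

// Write a comma-separated file; replace overwrites an earlier copy
export delimited using "myauto.csv", replace

// Read it back; clear discards the data currently in memory
import delimited using "myauto.csv", clear
describe
```

After the round trip, some variable types may differ from the original. For example, a value-labeled variable such as foreign is, as far as I know, exported as its label text by default and therefore comes back as a string.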

import delimited [using] filename [, import_delimited_options]

Options

  • delimiters("chars"[, collapse | asstring] ):

    • "chars" specifies the delimiter

      ";" uses a semicolon as the delimiter, "\t" uses a tab, and "whitespace" uses whitespace.

    • collapse treats multiple consecutive delimiters as just one delimiter.

    • asstring treats chars as one delimiter. By default, each character in chars is treated as an individual delimiter.

    // Example: space-delimited file, repeated spaces collapsed;
    // read columns 1 to 3, starting from row 8
    import delimited auto, delim(" ", collapse) colrange(:3) rowrange(8)

  • clear replaces the data in memory.
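The delimiter options can be tried out with a semicolon-separated file. This is a sketch under the assumption that the shipped auto dataset is available; the file name myauto_semicolon.csv is made up for the example.

```stata
// Start from a shipped dataset
sysuse auto, clear

// Write a semicolon-delimited file (export uses the delimiter() option)
export delimited using "myauto_semicolon.csv", delimiter(";") replace

// Read it back, telling import delimited which delimiter to expect
import delimited using "myauto_semicolon.csv", delimiters(";") clear

// Check that the columns were split correctly
list make price mpg in 1/5
```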


11.2.2 Save Estimation Results

estimates store model_name stores the current (active) estimation results under the name model_name.

// Store estimation results as m1 for use later in the same session
. estimates store m1
// to get them back 
. estimates restore m1
// Find out what you have stored 
. estimates dir

estimates save saves the current (active) estimation results to a file with the extension .ster.

// Save the current active estimation results
. estimates save basemodel
file basemodel.ster saved

In a different session, you can reload those results:

// Load the saved estimation results
. estimates use basemodel
// Display the results
. estimates table

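Q: What is the difference between estimates store and estimates save?
A: estimates store keeps the results in memory under a name, for use later in the same session; estimates save writes them to a .ster file on disk so that they can be reloaded in a different session. Once estimation results are stored, you can use other estimates commands to produce tables and reports from them.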
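The difference can be seen in a small sketch. It assumes the shipped auto dataset and an arbitrary model; estimates clear is used here only to imitate starting over without the stored results.

```stata
sysuse auto, clear
regress price mpg weight

// Keep the results in memory under the name m1 (this session only)
estimates store m1

// Also write them to basemodel.ster on disk
estimates save basemodel, replace

// Drop all stored results from memory; the .ster file is untouched
estimates clear

// Reload the results from disk and display them
estimates use basemodel
estimates table
```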


estimates table [namelist] [, options] organizes estimation results from one or more models in a single formatted table.

If you type estimates table without arguments, a table of the most recent estimation coefficients will be shown.

// Display a table of coefficients for stored estimates m1 and m2
estimates table m1 m2
// with SE
estimates table m1 m2, se

// with sample size, adjusted R-squared, and stars
estimates table m1 m2, stats(N r2_a) star

You can add more results to show using options:

  • stats(scalarlist) reports additional statistics in the table. Below are commonly used result identifiers:

    • N for sample size
    • r2_a for adjusted \(R^2\)
    • r2 for \(R^2\)
    • F for F-statistic
    • chi2 for chi-squared statistic
    • p for p-value

    For example, stats(N r2_a) shows the sample size and adjusted \(R^2\).

  • star shows stars for significance levels.

    • By default, star(.05 .01 .001), which uses the following significance levels:

      • * for \(p < 0.05\)
      • ** for \(p < 0.01\)
      • *** for \(p < 0.001\)
    • You can change the significance levels using star(.1 .05 .01) to set the levels to 0.10, 0.05, and 0.01, respectively.

    • N.B. The star option may not be combined with the se, t, or p options.

    An error will be returned if you try to combine them:

    .  estimates table, star se t p
    option star not allowed

  • b[(%fmt)] sets the display format for the coefficients, e.g., b(%9.3f)

  • se[(%fmt)] shows standard errors, optionally in the given format

  • t[(%fmt)] shows \(t\) or \(z\) statistics, optionally in the given format

  • p[(%fmt)] shows \(p\) values, optionally in the given format

  • varlabel display variable labels rather than variable names

// show stars for sig. levels
. estimates table, star

// show se, t, and p values
.  estimates table, se t p

All statistics are shown in order under the coefficients. If you have a long list of variables, the table can be very long.

You can use keep(varlist) to keep only the variables you want to show in the table.

  • varlist is a list of variables you want to keep in the table.
    • A list of variables can be specified as keep(var1 var2 var3).

      Names are separated by spaces.

    • Variable ranges are not allowed, e.g., keep(var1-var3) will return an error.

    • When you have multiple equations, use eqn_name:varname to specify the variable in a specific equation.

Example of a long variable list

estimates table, keep(L1.logd_gdp tmp tmp2 pre pre2 tmp_pre tmp2_pre tmp_pre2 tmp2_pre2) se t p 
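The options above can be combined as in the following sketch. The two models and the auto dataset are assumptions chosen only to keep the example self-contained.

```stata
sysuse auto, clear

// Fit and store two nested models
quietly regress price mpg weight
estimates store m1
quietly regress price mpg weight foreign
estimates store m2

// Side-by-side coefficients with N, adjusted R-squared, and stars
// at the 0.10, 0.05, and 0.01 levels
estimates table m1 m2, stats(N r2_a) star(.1 .05 .01)

// Only mpg and weight, with formatted coefficients and standard errors
// (se cannot be combined with star, so star is omitted here)
estimates table m1 m2, keep(mpg weight) b(%9.3f) se(%9.3f)
```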

etable

etable allows you to easily create a table of estimation results and export it to a variety of file types, e.g., docx, html, pdf, xlsx, tex, txt, markdown, md.

// use example of etable
. clear all
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)
. quietly regress bpsystol age weight i.region
. estimates store model1

. quietly regress bpsystol i.sex weight i.agegrp
. estimates store model2

. quietly regress bpsystol age weight i.agegrp
. estimates store model3

. etable, estimates(model1 model2 model3) showstars showstarsnote title("Table 1. Models for systolic blood pressure") export(mydoc.docx, replace)

Options:

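  • showstars and showstarsnote show significance stars and a note explaining the significance levels.
  • export(filename [, replace]) writes the table to a file; the output format is taken from the file extension, e.g., .docx in the example above.

Alternative to etable: eststo, from the community-contributed estout package.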

11.2.3 Stored Results

Stata commands that report results also store the results where they can be subsequently used by other commands or programs. This is documented in the Stored results section of the particular command in the reference manuals.

  • r-class commands, such as summarize, store their results in r();

    most commands are r-class.

  • e-class commands, such as regress, store their results in e();

    e-class commands are Stata’s model estimation commands.

// for r-class command
return list
// for e-class command
ereturn list
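As a sketch of how stored results are used in practice, the example below reads r() after summarize and e() after regress. The auto dataset and the variable name price_dm are assumptions made for illustration.

```stata
sysuse auto, clear

// r-class: summarize leaves its results in r()
summarize price
return list
display "Mean price = " r(mean)

// Use a stored result in a later command, here to demean price
generate price_dm = price - r(mean)

// e-class: regress leaves its results in e()
regress price mpg weight
ereturn list
display "N = " e(N) ", R-squared = " e(r2)
```

Note that r() results are overwritten by the next r-class command, so copy any value you need (for example into a local macro or scalar) before running another command.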

Most estimation commands leave behind

  • e(b) the coefficient vector, and
  • e(V) the variance–covariance matrix of the estimates (VCE)

// display coef vector
matrix list e(b)
// assign it to a named matrix
matrix myb = e(b)
matrix list myb

You can refer to e(b) and e(V) in any matrix expression:

matrix c = e(b)*invsym(e(V))*e(b)'
matrix list c

invsym(e(V)) returns the inverse of e(V). Generally, invsym requires a square, symmetric, and positive-definite matrix.
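A self-contained sketch of these matrix expressions follows, assuming the auto dataset and an arbitrary model. The quadratic form c is a 1 x 1 matrix; it can be read as a Wald-type statistic for the hypothesis that all coefficients, including the constant, are zero.

```stata
sysuse auto, clear
quietly regress price mpg weight

// Copy the coefficient vector (1 x k) and the VCE (k x k)
matrix b = e(b)
matrix V = e(V)

// Quadratic form b * V^(-1) * b'
matrix c = b*invsym(V)*b'
matrix list c

// The diagonal of V holds squared standard errors; mpg is the first column
display "SE(mpg) from e(V):  " sqrt(V[1,1])
display "SE(mpg) from _se[]: " _se[mpg]
```

The two displayed standard errors should agree, which is a quick check that the matrix was indexed correctly.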