Reporting Summary Statistics in Stata Using Outreg2

This short article is a guide to reporting summary statistics from Stata to MS Word, Excel or LaTeX using the outreg2 command.

For this guide, we start by loading Stata’s built-in 1978 Automobile dataset using:

sysuse auto.dta
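Since outreg2 is a user-written command rather than part of official Stata, a setup along the following lines may be needed before the rest of the guide will run. This is a minimal sketch that assumes an internet connection for the SSC install; the describe step is added here only to list the variables in the dataset.

```stata
* Install outreg2 from SSC (only needed once; replace updates an existing copy)
ssc install outreg2, replace

* Load the built-in 1978 Automobile dataset, discarding any data in memory
sysuse auto, clear

* List the variables, their storage types and labels
describe
```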

All Summary Statistics for All Variables

To report summary statistics for all the variables in our dataset, we use a familiar outreg2 syntax with a new option of sum(log). This option indicates to Stata that a summary table is being output.

outreg2 using results, word replace sum(log)
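The command above writes a Word-readable file. As a sketch of how the same table might be sent to the other formats mentioned at the start of this article, the word option can be swapped for excel or tex. The file name results is simply carried over from the example above.

```stata
sysuse auto, clear

* Same summary table, written for Excel instead of Word
outreg2 using results, excel replace sum(log)

* Same summary table, written as a LaTeX file
outreg2 using results, tex replace sum(log)
```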

We can also replace variable names with variable labels, a step that we explain in one of our <link>introductory outreg2 articles</link>.

With the summarize command, which is typically used to return summary statistics, Stata allows an option of detail. This option outputs a table with additional statistics. We can report these extra statistics through the outreg2 command by typing detail in the parentheses of the sum() option used above:

outreg2 using results, word replace sum(detail)
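For comparison, this is what the detail option looks like with summarize itself. The sketch below uses price and mpg purely as illustrative variables; in Stata's own output, detail adds percentiles, the smallest and largest values, variance, skewness and kurtosis.

```stata
sysuse auto, clear

* Default summary: observations, mean, standard deviation, minimum, maximum
summarize price mpg

* Detailed summary: adds percentiles, variance, skewness and kurtosis
summarize price mpg, detail
```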

Some Summary Statistics for All Variables

If we only wish to report, say, the number of observations, mean and standard deviation of our variables (and not the minimum and maximum that are also reported by default), we add the eqkeep() option to specify which statistics we want to retain:

outreg2 using results, word replace sum(log) eqkeep(N mean sd)

Some Statistics for Some Variables

The keep() option retains the variables that we specify in the parentheses, whereas eqkeep() retains statistics. However, we cannot specify both the eqkeep() and keep() options at the same time.

To obtain a summary table with a few statistics for a few variables, you can use eqkeep() to retain statistics, and drop() to omit variables, or vice versa.

The following command will return an error since both eqkeep() and keep() appear simultaneously:

outreg2 using results, word replace sum(log) eqkeep(N mean sd) keep(price mpg headroom trunk rep78)
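A working version of the same table might look like the sketch below. The first command follows the eqkeep() plus drop() combination described above, dropping every numeric variable other than the five we want. The second is an assumption of ours rather than an outreg2 feature: it temporarily restricts the dataset with Stata's own keep command, and the file name results2 is hypothetical.

```stata
sysuse auto, clear

* Option 1: retain statistics with eqkeep() and omit unwanted variables with drop()
outreg2 using results, word replace sum(log) eqkeep(N mean sd) drop(weight length turn displacement gear_ratio foreign)

* Option 2 (workaround, not an outreg2 option): restrict the data temporarily
preserve
keep price mpg headroom trunk rep78
outreg2 using results2, word replace sum(log) eqkeep(N mean sd)
restore
```

The preserve and restore pair ensures that the full dataset is back in memory once the table has been written.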

Summary Statistics for Observations Used In a Regression

Because Stata applies casewise (listwise) deletion, it omits from a regression any observation with a missing value in one of the model variables. Therefore, the number of observations used in a regression is often lower than the number of observations reported for each variable in the summary statistics.

For example, if we summarise the data, we see that the variables ‘price’, ‘mpg’, and ‘headroom’ have 74 observations. ‘rep78’ has 69 observations. When we regress price on the other three variables, we note that the regression used 69 observations even though there were variables with 74 observations. This is because Stata omits any observation where rep78 is missing. We therefore find estimates from only 69 observations reported in the regression results.
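The counts quoted above can be checked directly with a few built-in commands. This is a minimal sketch; e(N) is the stored result in which regress saves the number of observations it used.

```stata
sysuse auto, clear

* price, mpg and headroom show 74 observations; rep78 shows 69
summarize price mpg headroom rep78

* Count the observations where rep78 is missing (74 - 69 = 5)
count if missing(rep78)

* The regression silently drops those observations
regress price mpg headroom rep78

* Number of observations actually used in the regression: 69
display e(N)
```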

To obtain summary statistics for only the variables and observations used in a regression, we first run the regression and then run the outreg2 command right after it with the sum option:

regress price mpg headroom rep78
outreg2 using results, word replace sum
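One way to cross-check the exported table is to reproduce it inside Stata. The sketch below relies on e(sample), the function that marks the observations used by the most recent estimation command; it is offered as a manual check and not as a description of how outreg2 works internally.

```stata
sysuse auto, clear
regress price mpg headroom rep78

* Summarize only the observations that entered the regression above
summarize price mpg headroom rep78 if e(sample)
```

Every variable should now show the same number of observations as the regression itself.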

Summary Statistics for Different Groups/Categories

To obtain summary statistics for each category in a categorical variable, we simply add the bysort prefix. Here, we use ‘foreign’ as our categorical variable of choice. This variable takes the value 1 when a vehicle is foreign, and 0 when a vehicle is domestic.

bysort foreign: outreg2 using results, word replace sum(log) eqkeep(N mean sd)

The summary statistics will be reported separately for foreign cars (where ‘foreign’ = 1) and domestic cars (where ‘foreign’ = 0).
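To preview the grouped figures in the Results window before exporting them, Stata's built-in tabstat command can produce a comparable table. This is a sketch with price, mpg and weight chosen only for illustration; the statistics n, mean and sd mirror the eqkeep(N mean sd) selection above.

```stata
sysuse auto, clear

* Observations, mean and standard deviation for each category of foreign
tabstat price mpg weight, by(foreign) statistics(n mean sd)
```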

Outputting Frequency Distribution

The cross option allows us to output the frequency distribution of any variable we specify after outreg2. In the case of a categorical variable, each category is included in the table.

outreg2 foreign rep78 using results, word replace cross
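The exported frequencies can be compared against Stata's own tabulate command. A minimal sketch follows; the missing option is added here so that the observations with no rep78 value appear as their own row.

```stata
sysuse auto, clear

* Frequency distribution of the domestic/foreign indicator
tabulate foreign

* Frequency distribution of repair record, including missing values
tabulate rep78, missing
```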

Though outreg2 achieves all of the above well, it may not be the best command to output summary statistics from Stata. Another command, asdoc, is perhaps more appropriate for this purpose, and will be discussed in a future article.