This short article is a guide to reporting summary statistics from Stata to MS Word, Excel or LaTeX using the
outreg2 command.
For this guide, we start by using Stata’s inbuilt 1978 Automobile dataset and describing it using:
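A minimal sketch of this step, assuming the dataset is loaded with Stata's sysuse command and inspected with describe:

```stata
* Load the 1978 Automobile dataset that ships with Stata
sysuse auto, clear

* List variable names, storage types, and variable labels
describe
```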
All Summary Statistics for All Variables
To report summary statistics for all the variables in our dataset, we use a familiar
outreg2 syntax with a new option of
sum(log). This option indicates to Stata that a summary table is being output.
outreg2 using results, word replace sum(log)
We can also replace variable names with variable labels, a step that we explain in one of our introductory articles.
For the
summarize command, which is typically used to return summary statistics, Stata allows an option of
detail. This option outputs a table with additional statistics. We can report these extra statistics through the
outreg2 command by typing
detail in the parentheses of the
sum() option used above:
outreg2 using results, word replace sum(detail)
Some Summary Statistics for All Variables
If we only wish to report, say, the number of observations, mean and standard deviation of our variables (and not the minimum and maximum that are also reported by default), we add the
eqkeep() option to specify which statistics we want to retain:
outreg2 using results, word replace sum(log) eqkeep(N mean sd)
Some Statistics for Some Variables
As seen above, we use the
eqkeep() option to retain the statistics that we specify in the parentheses. The
keep() option does the same for variables. However, we cannot specify both the
eqkeep() and keep() options at the same time.
To obtain a summary table with a few statistics for a few variables, you can use
eqkeep() to retain statistics, and
drop() to omit variables, or vice versa.
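As a sketch of the eqkeep() and drop() combination just described, and assuming the variable names of the built-in auto dataset, the command below reports N, mean and standard deviation for price, mpg, headroom, trunk and rep78 by dropping the remaining numeric variables. The drop list is our own illustration rather than a command from this guide:

```stata
* Keep three statistics and omit the variables we do not want in the table.
* The variables in drop() are an illustrative choice for the auto dataset.
outreg2 using results, word replace sum(log) eqkeep(N mean sd) drop(weight length turn displacement gear_ratio foreign)
```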
The following command will return an error since both
eqkeep() and keep() appear simultaneously:
outreg2 using results, word replace sum(log) eqkeep(N mean sd) keep(price mpg headroom trunk rep78)
Summary Statistics for Observations Used In a Regression
Because Stata uses casewise (listwise) deletion, it omits any observation with a missing value in one of the model variables from a regression. Therefore, the number of observations used in a regression is often lower than the number of observations reported for each variable in the summary statistics.
For example, if we summarise the data, we see that the variables ‘price’, ‘mpg’, and ‘headroom’ have 74 observations. ‘rep78’ has 69 observations. When we regress price on the other three variables, we note that the regression used 69 observations even though there were variables with 74 observations. This is because Stata omits any observation where rep78 is missing. We therefore find estimates from only 69 observations reported in the regression results.
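The comparison can be reproduced with a short sketch like the one below. It assumes the built-in auto dataset and uses e(N), the stored result in which Stata's estimation commands record the number of observations used:

```stata
sysuse auto, clear

* Non-missing counts per variable (74 for price, mpg and headroom; 69 for rep78)
summarize price mpg headroom rep78

* The regression drops every observation where any model variable is missing
regress price mpg headroom rep78

* Number of observations actually used by the regression
display e(N)
```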
To obtain summary statistics for only the variables and observations used in a regression, we first run the regression, then use the
outreg2 command right after it with an option of
sum:
regress price mpg headroom rep78
outreg2 using results, word replace sum
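One way to cross-check the resulting table, offered here as a sketch rather than as part of the outreg2 workflow: after estimation, Stata marks the observations it used in e(sample), so restricting summarize to that sample should give figures comparable to the exported table.

```stata
* Run the regression so that e(sample) flags the observations it used
regress price mpg headroom rep78

* Summarize the model variables over the estimation sample only
summarize price mpg headroom rep78 if e(sample)
```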
Summary Statistics for Different Groups/Categories
To obtain summary statistics for each category in a categorical variable, we simply add the bysort prefix. Here, we use ‘foreign’ as our categorical variable of choice. This variable takes the value 1 when a vehicle is foreign, and 0 when a vehicle is domestic.
bysort foreign: outreg2 using results, word replace sum(log) eqkeep(N mean sd)
The summary statistics will be reported separately for foreign cars (where variable ‘foreign’ = 1) and domestic cars (where variable ‘foreign’ = 0).
Outputting Frequency Distribution
The option of
cross allows us to output the frequency distribution of any variable we specify after
outreg2. In case of a categorical variable, it includes each category in the table.
outreg2 foreign rep78 using results, word replace cross
While outreg2 achieves all of the above well, it may not be the best command to output summary statistics from Stata. Another command,
asdoc, is perhaps more appropriate for this purpose, and will be discussed in a future article.