How to count the number of observations in Stata

In Stata, the count command reports the total number of observations in a dataset and, combined with the if and in qualifiers, the number of observations that meet specific conditions. This flexibility makes it a useful tool for data exploration and analysis. This article walks through the count command step by step: counting all observations, counting under one or more conditions, counting within groups and within ranges of observations, counting missing values, and retrieving the stored result.


Using the `count` command in Stata

To demonstrate the count command, we will use the 1978 Automobile Data, an example dataset that ships with Stata. Let’s load it with the following command:

sysuse auto.dta, clear

With the data loaded, let’s start simple and count the number of observations in the dataset. Type the following in the Stata Command window:

count

The Results window will display only the total number of observations, which is 74 in this case. We can make the count command more useful by adding conditions with the if qualifier. For instance, to count the number of cars with prices above 5000, we can use:

count if price>5000

Now the Results window will show that 37 cars have prices above 5000. We can also test for equality with the double equal sign ‘==’. Note that rep78 is the car’s 1978 repair record, coded from 1 to 5, not a number of repairs. To count the cars with a repair record of exactly 4:

count if rep78==4

The Results window will show that 18 cars have a repair record of 4.

Furthermore, we can count within categories. Suppose we want the number of cars priced at 5000 or more, separately for domestic and foreign cars as recorded in the variable foreign. The bys prefix (short for bysort) sorts the data by foreign and repeats the command for each group:

bys foreign: count if price>=5000

The Results window will display 23 domestic cars and 14 foreign cars with prices at or above 5000.

Stata also lets us restrict counting to a range of observations with the in qualifier. For example, we can count the cars with prices above 5000 among observations 1 through 30 only:

count if price>5000 in 1/30

This command will show that 15 of the first 30 observations have prices exceeding 5000.
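One point worth keeping in mind is that the in qualifier refers to observation numbers in the current sort order, so the same command can return a different count after the data are re-sorted. The following is a minimal sketch of that behavior using the same dataset; it is an illustration, not part of the original walkthrough.

```stata
sysuse auto, clear

* in 1/30 selects the first 30 observations as the data are currently ordered
count if price>5000 in 1/30

* after sorting by price, the first 30 observations are the 30 cheapest cars
sort price
count if price>5000 in 1/30
```

Because only 37 of the 74 cars are priced above 5000, the 30 cheapest cars all fall at or below 5000, so the second count is 0. When a range is used, it helps to check or set the sort order first.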

We can also combine multiple conditions with the & (and) operator. To count the cars with prices above 5000 that also have a repair record of 3:

count if price>5000 & rep78==3

Stata reports that 13 cars satisfy both conditions.

In datasets, it’s common to encounter missing values. A quick way to see which variables have them is the summarize command, which reports the number of nonmissing observations for each variable:

summarize

In the summary table, every numeric variable has 74 observations except the repair variable rep78, which has 69, so 5 values are missing. (The string variable make shows 0 observations only because summarize does not compute statistics for strings.) We can count the missing values in rep78 directly by executing:

count if rep78==.

The result will confirm that there are 5 missing observations in the rep78 variable.

Alternatively, the same purpose can be achieved with:

count if missing(rep78)

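Both commands yield the same result of 5 missing values in the rep78 variable. They differ only when a variable contains extended missing values (.a through .z): missing() counts those as well, whereas rep78==. matches only the system missing value.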
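Missing values also matter when counting with inequalities. Stata treats a numeric missing value as larger than any number, so a condition such as rep78>3 silently includes the observations where rep78 is missing. The sketch below contrasts the two counts on the same dataset; the threshold of 3 is chosen only for illustration.

```stata
sysuse auto, clear

* missing values compare as larger than any number, so they are counted here
count if rep78>3

* excluding missing values explicitly gives the count of actual ratings above 3
count if rep78>3 & !missing(rep78)
```

The two counts should differ by exactly the 5 missing values found above. The earlier price conditions are unaffected because price has no missing values in this dataset.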

Each of these commands produces a single number, and count also stores that number as a scalar. We can see what the most recent command stored using:

return list

The result will list a single scalar, r(N), which holds the most recent count; here r(N) = 5, the number of missing values in rep78. We can directly access this value with:

display r(N)

This will display the value stored in that scalar, which is 5.
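Because r(N) is overwritten by the next command that stores results, it is usually copied somewhere right away if it is needed later. The following is a minimal sketch that saves two counts in local macros and combines them; the macro names n_missing and n_total are hypothetical, chosen only for this example. Local macros do not persist between separate interactive commands, so the lines should be run together, for example from a do-file.

```stata
sysuse auto, clear

* copy r(N) into a local macro immediately after the count
count if missing(rep78)
local n_missing = r(N)

* this second count overwrites r(N), but the local above keeps the earlier value
count
local n_total = r(N)

display "Missing rep78: `n_missing' of `n_total'"
display "Share missing: " %5.3f `n_missing'/`n_total'
```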

To learn how to store these scalars and use them in further calculations, refer to our articles on the subject: Scalars and Macros.
