Using the compare command in Stata to compare variables

We often need to compare two variables in Stata. Stata’s compare command makes simple comparisons between two variables, as this article shows.

The general syntax for this command is:

compare variable1 variable2 [if] [in]

The command name is followed by the two variables to be compared and, optionally, an if or in qualifier that restricts which observations are used.
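A minimal sketch of the syntax in use, assuming Stata's bundled auto dataset and a hypothetical exact copy of ‘price’ named ‘price_copy’ (neither is taken from this article's example):

```stata
* Load Stata's bundled example dataset (74 cars)
sysuse auto, clear

* Hypothetical second variable: an exact copy of price
generate price_copy = price

* Compare all observations
compare price price_copy

* Restrict the comparison to the first 20 observations (in qualifier)
compare price price_copy in 1/20

* Restrict the comparison to foreign cars (if qualifier)
compare price price_copy if foreign == 1
```

Because ‘price_copy’ is an exact copy, every table should report the two variables as equal in all observations used.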

The comparison between the two variables can also be done using the menu option:

Data > Data utilities > Compare two variables


The dialogue box that opens can then be used to tweak the specifics of your comparison as needed.

To illustrate how the compare command works in Stata, we will use two variables called ‘price’ and ‘price2’. Some observations of ‘price2’ differ from ‘price’, which gives the comparison something to find.
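The article does not show how ‘price2’ was created. To follow along, the sketch below builds a similar pair of variables from Stata's bundled auto dataset; the observation numbers and offsets are arbitrary assumptions, so the counts it produces will not match the tables discussed here.

```stata
sysuse auto, clear

* Start from an exact copy of price
generate price2 = price

* Observations 1-2: price2 larger than price (price < price2)
replace price2 = price + 1000 in 1/2

* Observation 3: price2 smaller than price (price > price2)
replace price2 = price - 500 in 3

* Observations 4-5: price2 missing, price observed
replace price2 = . in 4/5

* Observation 6: price missing, price2 observed
replace price = . in 6

* Observation 7: both variables missing
replace price = . in 7
replace price2 = . in 7

compare price price2
```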

Let’s do a comparison between ‘price’ and ‘price2’ using the compare command in Stata.

compare price price2

This output table tells us several things. The first column, ‘count’, shows that ‘price’ is smaller than ‘price2’ in 2 observations, the two variables are equal in 64 observations, and ‘price’ is greater than ‘price2’ in 1 observation.

These 2 + 64 + 1 = 67 observations are the ones where ‘price’ and ‘price2’ are jointly defined, i.e. both variables have non-missing values. In 2 further observations only ‘price’ is missing, in 4 only ‘price2’ is missing, and in 1 observation both are missing (jointly missing). Together these account for all 67 + 2 + 4 + 1 = 74 observations.
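The count column can be reproduced with the count command. The sketch below runs on small hypothetical data (the auto dataset with a perturbed copy of ‘price’; the offsets and observation numbers are arbitrary assumptions). The !missing() conditions matter: Stata treats a missing value as larger than any number, so price < price2 on its own would also be true where only ‘price2’ is missing.

```stata
* Hypothetical data: auto dataset with a perturbed copy of price
sysuse auto, clear
generate price2 = price
replace price2 = price + 1000 in 1/2
replace price2 = . in 3
replace price = . in 4

* Row "price<price2"
count if price < price2 & !missing(price, price2)
* Row "price=price2"
count if price == price2 & !missing(price, price2)
* Row "price>price2"
count if price > price2 & !missing(price, price2)
* Row "jointly defined": neither value missing
count if !missing(price, price2)
* Row "price missing only"
count if missing(price) & !missing(price2)
* Row "price2 missing only"
count if missing(price2) & !missing(price)
* Row "jointly missing"
count if missing(price) & missing(price2)

* The table compare prints for the same data
compare price price2
```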

The second section of the table, headed ‘Difference’, summarizes the difference price - price2, computed observation by observation, through its minimum, average, and maximum. It is reported for the two rows where the variables are unequal (‘price’ greater than ‘price2’ and ‘price’ less than ‘price2’) and for all jointly defined observations.

In the 2 observations where ‘price’ is smaller than ‘price2’, the differences price - price2 range from a minimum of -14918 to a maximum of -8628, and their average is (-14918 + -8628) / 2 = -11773. These are summaries of the per-observation differences, not differences between the minimum (or maximum) values of the two variables.
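One way to confirm that the Difference columns summarize per-observation differences is to generate the difference yourself and summarize it row by row. This is a sketch on hypothetical data (the auto dataset with arbitrary offsets added to a copy of ‘price’), and ‘diff’ is a variable name chosen only for illustration.

```stata
* Hypothetical data: auto dataset with a perturbed copy of price
sysuse auto, clear
generate price2 = price
replace price2 = price + 1000 in 1
replace price2 = price + 3000 in 2
replace price2 = price - 500 in 3

* Per-observation difference, the quantity compare summarizes
generate diff = price - price2

* Row "price<price2": min, mean and max of diff
summarize diff if price < price2 & !missing(price, price2)

* Row "price>price2"
summarize diff if price > price2 & !missing(price, price2)

* Row "jointly defined": every observation where neither value is missing
summarize diff if !missing(price, price2)

* Compare with the Difference section of compare's output
compare price price2
```

With these offsets, the ‘price < price2’ row should report a minimum of -3000, an average of -2000, and a maximum of -1000, i.e. the offsets that were added.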

For the one observation where ‘price’ is greater than ‘price2’, the minimum, average, and maximum difference are all 2189, because there is only one difference to summarize.

Across all jointly defined observations (where neither variable is missing), the smallest difference is -14918 and the largest is 2189, and the average difference is -318.7612.
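The jointly defined average difference of price - price2 can be checked by hand from the figures reported above: the 2 observations where ‘price’ is smaller have an average difference of -11773, the 64 equal observations contribute 0, and the single observation where ‘price’ is larger contributes 2189.

```latex
\bar{d}_{\text{joint}}
  = \frac{2 \times (-11773) + 64 \times 0 + 1 \times 2189}{67}
  = \frac{-21357}{67}
  \approx -318.7612
```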

To see how Stata arrives at these numbers, let’s summarize both variables for the observations where ‘price’ is greater than ‘price2’.

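* Stata treats a missing value as larger than any number, so keep only
* observations where both variables are non-missing, as compare does
summ price if price>price2 & !missing(price, price2)
summ price2 if price>price2 & !missing(price, price2)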

Because only one observation meets this condition, each variable's minimum, average, and maximum are the same single value. Subtracting the ‘price2’ value from the ‘price’ value gives 2189, the figure reported in the table generated by the compare command.

We can also restrict the compare command to a specific set of observations using an if condition.

In the following command, ‘price’ and ‘price2’ are compared only for observations where ‘price’ is greater than 10000.

compare price price2 if price>10000
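One caution with conditions such as price>10000: Stata treats a missing value as larger than any number, so observations where ‘price’ is missing also satisfy the condition and appear in the table's missing-value rows. The sketch below uses hypothetical data (the auto dataset with one price set to missing, an assumption for illustration) to show the difference.

```stata
* Hypothetical data: auto dataset with a copy of price and one missing price
sysuse auto, clear
generate price2 = price
replace price = . in 1

* This condition is also true for observation 1, where price is missing
count if price > 10000

* Add !missing() to keep only observed prices above 10000
count if price > 10000 & !missing(price)
compare price price2 if price > 10000 & !missing(price)
```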

Stata can also compare two variables separately within each category of another, categorical variable.

bysort foreign: compare price price2

As can be seen, two sets of comparisons are made when we use the bysort prefix with the categorical variable ‘foreign’, which has two categories: domestic and foreign. If we had used a variable with three categories, the compare command would have made three sets of comparisons (observations with a missing value of the grouping variable form a group of their own).

In this example, ‘price’ and ‘price2’ are equal for all 22 observations in the foreign category, so no differences are reported for that group.
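The by prefix is a shortcut for running the same comparison once per category. Assuming the auto dataset and a hypothetical ‘price2’ that differs from ‘price’ for five domestic cars, both approaches below should produce the same pair of tables.

```stata
* Hypothetical data: price2 differs from price for five domestic cars
sysuse auto, clear
generate price2 = price
replace price2 = price + 1000 if foreign == 0 & _n <= 5

* One table per category of foreign (0 = Domestic, 1 = Foreign)
bysort foreign: compare price price2

* The same two tables, produced one category at a time
compare price price2 if foreign == 0
compare price price2 if foreign == 1
```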

Next, let’s generate another variable called ‘make2’ that is a copy of ‘make’. To give the comparison something to detect, we will set the first four observations of the new variable to missing. Remember that ‘make’ (and therefore ‘make2’) is a string variable, so a missing value is an empty string ("").

gen make2 = make
replace make2 ="" in 1/4
compare make make2

Because both variables are strings rather than numeric, compare reports only counts: how many observations are the same, non-missing, and missing. Out of 74 observations, ‘make’ and ‘make2’ are the same and non-missing (jointly defined) in 70. In the remaining four observations, ‘make2’ is missing.
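compare tells you how many string values differ but not which ones. A sketch that lists the mismatching observations, rebuilding the make2 setup above on the auto dataset (an assumption about where ‘make’ comes from):

```stata
* Rebuild the example: copy make, then blank out the first four values
sysuse auto, clear
generate make2 = make
replace make2 = "" in 1/4

* For string variables, missing() is true for the empty string ""
count if make == make2 & !missing(make2)
count if missing(make2)

* List the observations where the two strings differ
list make make2 if make != make2
```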
