Do you need to round off or remove decimals in Stata but are not sure how to make it work? No need to worry! Today, on The Data Hall, we are doing exactly that: rounding off decimal numbers with several functions. By the end of this article you will be able to use the round, ceil, floor and int functions to round off decimal numbers, and you will also learn a menu shortcut for these functions as a treat at the end.
In this article, we focus on how to eliminate information, that is, how to change the stored values rather than just the way they are displayed. To see what that means in Stata, we first need a data set. We will create one with 50 observations. To do so, use the following command
set obs 50
Next, we generate a variable named “random” that holds values between 0 and 100. After running this command, you will see the new variable appear in your data set.
gen random=runiform(0,100)
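Because runiform() draws random numbers, your values will differ from the ones quoted in this article on every run. If you want the same values each time, one option is to set the seed first. This is a minimal sketch, assuming a Stata version that supports the two-argument runiform(a, b); the seed 12345 is an arbitrary choice and not part of the original commands.

```stata
* start from an empty data set so the example is self-contained
clear
* arbitrary seed (an assumption for illustration) so the draws repeat
set seed 12345
set obs 50
generate random = runiform(0, 100)
* look at the first few values
list random in 1/5
```

With a fixed seed, rerunning these lines should reproduce the same 50 values.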
The data would look something like this.
You will see that each value has five or six digits after the decimal point, which makes the data hard to read. If you only want to change how the values look, you can change the display format of the variable. To do that, go to the “Properties” window at the bottom right of Stata. It is shown below.
In the “Properties” window you will see the “Variables” section, and under it the “Format” field. Next to “Format” there are three dots. If you click on these three dots, a dialog box will open.
When you select “Numeric”, more options become visible. Let us say we just want a fixed number of decimals: under the “Numeric Type” box, we select “fixed numeric”. But the real magic is under the “Format Properties” box. There, you will see “Digits right of decimal”.
From here, you can control how many digits are shown after the decimal point. If you select 0, no digits will be visible after the decimal point. The digits are still there, but not visible. You can check this by clicking on a cell: the grid shows rounded whole numbers, but the selected cell still shows the complete value with its decimals. Here, we have only changed the format, or appearance, of the data. This is the “easy for reading” type of solution. After selecting 0 decimal places, the data look as follows.
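The same display change can also be made from the command line with the format command. The sketch below is one way to confirm that formatting leaves the stored values alone; the helper variable “before” and the seed are assumptions added for illustration.

```stata
clear
set seed 12345
set obs 50
generate random = runiform(0, 100)
* keep a copy of the stored values before touching the format
generate double before = random
* show no digits after the decimal point (display only)
format random %9.0f
list random in 1/5
* the stored values are unchanged, so this assertion passes
assert random == before
* show two digits after the decimal point instead
format random %9.2f
list random in 1/5
```

In %9.2f, the 9 is the total display width and the 2 is the number of digits shown after the decimal point.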
Because we are focusing on “eliminating information” rather than changing the appearance of the data, we need a different approach: we will change the stored values themselves and limit how many digits they keep after the decimal point.
The first function we are going to work with is the round function. With a single argument, it rounds each value to the nearest whole number. So, let us generate a new variable called “vround” with the following command,
generate vround = round(random)
With the round function, not just the appearance of the data changes: each stored value is actually rounded to the nearest whole number. For example, 34.88717 becomes 35 and 26.68857 becomes 27.
A second form of the command gives the same results. The optional second argument is the unit to round to, and rounding to the nearest multiple of 1 is the same as rounding to the nearest whole number. So if you create a new variable with
gen vround1=round(random,1)
it gives you the same results as the first command.
The second argument of 1 states that you do not want any digits after the decimal point, only whole numbers. Thus, it generates the same data as the previous variable.
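If you would like to confirm that the two forms agree, a short check with assert is one way to do it. This is a sketch on freshly generated data; the seed is an arbitrary assumption.

```stata
clear
set seed 12345
set obs 50
generate random = runiform(0, 100)
* one-argument form: nearest whole number
generate vround = round(random)
* two-argument form with a unit of 1
generate vround1 = round(random, 1)
* assert stops with an error if the two ever differ
assert vround == vround1
* every result is a whole number
assert vround == floor(vround)
```

If both assert lines run silently, the two variables are identical.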
Now, let us assume that we need digits after the decimal point but want to keep just one. For that we use the same function with a different second argument. The command would be
gen vround1dec=round(random, 0.1)
Here, the variable “vround1dec” holds values rounded to one decimal place. So, 68.80603 becomes 68.8, 97.94578 becomes 97.9 and 67.01937 becomes 67.
Now, if you want to have some fun with the decimals, you can round them off to the nearest multiples of 0.5. For that, your command should look something like this
gen vround5dec=round(random,0.5)
Your data in this new variable will be in multiples of 0.5. But if you want your data rounded to two decimal places, the unit to round to is 0.01, so your command would be
gen vround2dec=round(random, 0.01)
This would turn 68.80603 into 68.81 and so on.
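The second argument of round is the unit to round to, not a count of digits, so it is worth trying a few units on a single number before applying them to a variable. The sketch below uses display on one of the example values; the expected results in the comments follow from round(x, y) returning the nearest multiple of y.

```stata
* nearest multiple of 0.1: shows 68.8
display round(68.80603, 0.1)
* nearest multiple of 0.5: shows 69
display round(68.80603, 0.5)
* nearest multiple of 0.01, i.e. two decimal places: shows 68.81
display round(68.80603, 0.01)
* a unit of 0.11 gives a multiple of 0.11, not two decimals: shows 68.86
display round(68.80603, 0.11)
```

Units such as 0.1 and 0.01 have no exact binary representation, so the stored result is the closest representable number. If that matters for later comparisons, storing the new variable as a double (generate double) is one option.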
You can round off your odd numbers into even numbers too! If you round your data to the nearest multiple of 2, every value becomes an even whole number. The command for that looks like this,
gen vround2=round(random,2)
Now you will see that 68.80603 becomes 68, 97.94578 becomes 98 and 67.01937 becomes 68. Thus, your data are all even whole numbers.
If you want your data rounded to the nearest multiple of 5, you only need to change the second argument. If your command looks like this,
gen vround5=round(random, 5)
your data will comply: 68.80603 becomes 70, 97.94578 becomes 100 and 67.01937 becomes 65.
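One way to check that every value really landed on a multiple of 2 or 5 is the mod function, which returns the remainder after division. The following is a sketch on freshly generated data; the seed is an arbitrary assumption.

```stata
clear
set seed 12345
set obs 50
generate random = runiform(0, 100)
generate vround2 = round(random, 2)
generate vround5 = round(random, 5)
* a remainder of 0 means the value is an exact multiple
assert mod(vround2, 2) == 0
assert mod(vround5, 5) == 0
* rounding to a unit never moves a value by more than half that unit
assert abs(vround2 - random) <= 1
assert abs(vround5 - random) <= 2.5
```

The last two checks also show how much information each choice discards: up to 1 for a unit of 2 and up to 2.5 for a unit of 5.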
Rounding off to Highest and Lowest numbers
As another example, let us say that you want to round your data up to the next whole number. Use the following command for it
gen vceil=ceil(random)
This works regardless of whether the decimal part is less than or greater than 0.5: the value is always rounded up to the next whole number. So, 34.88717 becomes 35, and 67.01937 becomes 68 even though its decimal part is small.
If we have a ceiling function, we definitely need a floor function too! Stata provides one for exactly this need. The floor function can be used with the command
gen vfloor=floor(random)
With the floor function, you round your data down to the nearest lower whole number. It floors 68.80603 to 68, 97.94578 to 97 and 67.01937 to 67, and so on.
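The relationship between ceil and floor can be checked directly: each value sits between its floor and its ceiling, and the two differ by exactly 1 unless the value is already a whole number. This is a sketch on freshly generated data; the seed is an arbitrary assumption.

```stata
clear
set seed 12345
set obs 50
generate random = runiform(0, 100)
generate vceil = ceil(random)
generate vfloor = floor(random)
* every value lies between its floor and its ceiling
assert vfloor <= random & random <= vceil
* the two differ by exactly 1 when the value is not a whole number
assert vceil - vfloor == 1 if random != vfloor
list random vfloor vceil in 1/5
```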
But what if you simply want to remove the digits after the decimal point? For this, we have another function, int, the integer function. It does not round your data; it simply drops the digits that appear after the decimal point. When you run this command, i.e.
gen vint= int(random)
you will see that all the digits after the decimal point are removed. Thus, 68.80603 simply becomes 68, 97.94578 is reduced to 97 and 67.01937 becomes 67, and so on.
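For the positive values in this data set, int and floor return the same numbers, so it helps to see where they part ways. The sketch below uses a negative value, which is an illustration added here and not part of the data set: int truncates toward zero, while floor always rounds down.

```stata
* positive value: floor and int both show 68, ceil shows 69
display floor(68.80603)
display int(68.80603)
display ceil(68.80603)
* negative value: floor shows -69
display floor(-68.80603)
* int drops the decimals and shows -68
display int(-68.80603)
* ceil rounds up toward zero and also shows -68
display ceil(-68.80603)
```

If your data can contain negative numbers, choose between int and floor deliberately rather than treating them as interchangeable.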
Rounding Off through Menu
Although all of this learning is fun, there is a shortcut for these functions too! You can reach them from the menus by clicking on “Data”.
It leads you to “Create or change data”. From there, choose “Change contents of variable”. This opens a dialog box where you select the variable you want to change.
For example, we choose our “random” variable and fill in New Contents (value or expression) by clicking on “Create”. Clicking on “Create” opens the Expression Builder, which lists the function categories under the “Categories: Functions” tab.
If you click on the “Mathematical” functions, it will open multiple mathematical functions for you to choose from, including, but not limited to the integer function, floor function, ceiling function and the round function.
You simply click on one of these functions and supply the variable that you want to change.
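Keep in mind that changing the contents of a variable overwrites it, unlike the generate commands used earlier, which create a new variable each time. As far as we can tell, the dialog corresponds to Stata's replace command; the sketch below shows that typed equivalent. The backup variable “random_orig” and the seed are assumptions added for illustration.

```stata
clear
set seed 12345
set obs 50
generate random = runiform(0, 100)
* keep a copy first, because replace overwrites the original values
generate double random_orig = random
* overwrite random with its value rounded to two decimal places
replace random = round(random, 0.01)
list random_orig random in 1/5
```

Keeping a backup copy, or using generate instead, lets you return to the unrounded values later.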
The changes are applied accordingly. Although this shortcut is appealing, typing the functions and commands yourself is fun too. By now, you know the drill: you can round off or remove decimals in Stata through various methods.