When a data set is generated or imported, it contains many variables whose names are set by the creator of the data set and may vary depending on the type of research being performed. Such variables are called research or data set variables, and we refer to them when giving Stata commands to perform a specific action.
However, Stata also has some variables already built into the system that you can use to perform different actions on a data set. These are Stata's system variables.
System variables (often called _variables in Stata) are created and updated automatically by the system and can be used in commands.
_n and _N are also system variables. You can explore the other system variables using the following command in Stata:
The following window will appear, showing the different system variables and their uses. If you scroll down the window, you can see all the system variables along with their definitions and examples of use.
You can also open this window by using the Help icon in the Stata toolbar.
_n and _N are Stata system variables. _n represents the number of the current observation, that is, the row number (the specific row on which we want to perform an action). _N represents the total number of observations, that is, the total number of rows.
_n is the row number.
_N is the number of rows.
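Because _n and _N are just numbers inside an expression, they can also be used as subscripts to reach values in other rows. The following is a minimal sketch on a small hypothetical dataset. The variables Car_No, Month and Sales and their values are made up for illustration and are not the article's data. Sales[1] picks the first row, Sales[_N] the last row, and Sales[_n-1] the previous row.

```stata
* Hypothetical toy data (not the article's dataset): 5 cars x 3 months = 15 rows
clear
set obs 15
gen Car_No = ceil(_n/3)           // 1,1,1,2,2,2,...,5,5,5
gen Month  = mod(_n - 1, 3) + 1   // 1,2,3,1,2,3,...
gen Sales  = 10 * _n              // made-up sales figures

* _n and _N used as explicit subscripts
gen First_Sales = Sales[1]        // value in the first row, copied to every row
gen Last_Sales  = Sales[_N]       // value in the last row (row _N)
gen Prev_Sales  = Sales[_n-1]     // value in the previous row; missing in row 1
list in 1/3
```

A subscript that points outside the data, such as Sales[_n-1] in row 1, simply returns a missing value.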
In this article, we will learn different ways of using these variables in Stata. To illustrate them, we use the following example dataset:
Using your own dataset for testing in Stata
Import the dataset into Stata using the following command:
If the file path contains any spaces, make sure to enclose it in quotation marks.
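To illustrate the quoting rule, this sketch saves a small hypothetical dataset under a file name that contains spaces and then loads it back. The file name car sales test.dta and the Excel path in the comment are assumptions for the example. Replace them with your own file.

```stata
* Build a small hypothetical dataset so the example runs on its own
clear
set obs 15
gen Car_No = ceil(_n/3)
gen Sales  = 10 * _n

* The file name contains spaces, so it must be wrapped in double quotes
save "car sales test.dta", replace

* Drop the data in memory and reload it from disk (quotes again)
use "car sales test.dta", clear

* The same quoting rule applies to other import commands, e.g. (not run here):
* import excel "My Data/car sales.xlsx", firstrow clear

list in 1/3
erase "car sales test.dta"        // tidy up the example file
```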
This command opens the dataset I saved earlier for my test.
Generate command to test the _n variable in Stata
As discussed earlier, _n is the row number, that is, the number of the specific row we are working on.
Now let's look at the row number of each row. We can do this by generating a new variable, Row_number, and setting it equal to _n. It should match the serial number, since the serial number also represents the row number.
Using the generate command:
gen Row_number = _n
As we can see, it generated a new variable called Row_number that lists the row number of each row, which matches the serial number. This means we can use the _n variable to perform a task on any specific row in Stata.
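As a sketch of working on one specific row, the example below creates Row_number on a hypothetical 15-row dataset. It then uses an if condition on _n to list, flag and change only row 5. The variables Car_No, Sales and Is_row5 and all values are made up for illustration.

```stata
* Hypothetical toy data (not the article's dataset)
clear
set obs 15
gen Car_No = ceil(_n/3)
gen Sales  = 10 * _n

gen Row_number = _n                    // 1, 2, ..., 15, same as the serial number

* Use _n in an if condition to act on one specific row
list Car_No Sales if _n == 5           // show row 5 only
gen Is_row5 = (_n == 5)                // 1 in row 5, 0 elsewhere
replace Sales = Sales * 2 if _n == 5   // change the value in row 5 only
```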
Generate command to test the _N variable in Stata
As described previously, _N is the total number of observations. We can also see this by generating a new variable, No_of_Rows, which will show the total number of observations in the data set.
Our dataset has 15 rows, so the result should be 15. Let's test it using the command:
gen No_of_Rows = _N
As can be seen, the total number of observations is 15, and the result also shows 15. This variable is especially helpful when calculating a mean in Stata, because we can use it directly instead of counting the observations ourselves and then calculating the mean.
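Here is a sketch of using _N to compute a mean, again on hypothetical data. The sum() function in generate produces a running sum, so its value in the last row (row _N) is the grand total, and dividing that total by _N gives the mean. One caution applies. _N counts every observation, including rows where Sales is missing, while sum() treats missing values as 0. This manual mean therefore matches summarize only when the variable has no missing values.

```stata
* Hypothetical toy data (not the article's dataset)
clear
set obs 15
gen Sales = 10 * _n

gen No_of_Rows = _N                              // 15 in every row

* Mean by hand: running sum, then divide the last running total by _N
gen double Running_Total = sum(Sales)            // sum() is a running sum
gen double Mean_Sales    = Running_Total[_N] / _N // same value in every row

* Cross-check against Stata's built-in summary statistics
quietly summarize Sales
display "By hand: " Mean_Sales[1] "   summarize: " r(mean)
```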
Understanding the _n variable in groups using the bys command in Stata
We can also use this variable within groups defined by another variable in the data set. For example, if we want the row number within each Car_No, we can use the by prefix.
Let's generate another variable that gives the row number within each Car_No. Stata command:
bys Car_No: gen C_Row_Number = _n
Running the command above gives the following results:
Each car has three observations, so the row number within each Car_No runs from 1 to 3. This is useful when we want to work on the data of a specific car for a specific month in Stata.
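The sketch below repeats the by-group row number on hypothetical data (5 cars with 3 months each). It uses bysort Car_No (Month): instead of bys Car_No:, so that rows inside each car are numbered in Month order. With only Car_No in the by list, the order of rows inside each car is not guaranteed. The data are shuffled first to make this visible. Car_No, Month and Sales are made-up example variables.

```stata
* Hypothetical toy data: 5 cars x 3 months (not the article's dataset)
clear
set obs 15
gen Car_No = ceil(_n/3)
gen Month  = mod(_n - 1, 3) + 1
gen Sales  = 10 * _n
gsort -Month                      // shuffle the row order on purpose

* Row number within each car; (Month) fixes the order inside each car
bysort Car_No (Month): gen C_Row_Number = _n

* Work on one car in one month, e.g. the 2nd month of car 2
list Car_No Month Sales if Car_No == 2 & C_Row_Number == 2
```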
Understanding the _N variable in groups using the bys command in Stata
_N can also be computed within groups of another variable. As with _n, let's create it as a variable with respect to Car_No.
You can use the following command in Stata:
bys Car_No: gen C_No_of_Row = _N
Each car has 3 rows (observations), so C_No_of_Row should be 3 for every car. We can validate this by running the command above in Stata's Command window.
This by-group use of system variables is helpful when we need to average each car's sales data, or otherwise analyze it separately, and compare it with the overall results in Stata.
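A per-car comparison of this kind can be sketched with _N inside a by group. There, _N is the number of rows in the current group rather than in the whole data set, and Car_Total[_N] refers to the last row of that group. The example below uses hypothetical data to compute each car's mean sales and its difference from the overall mean. The variable names Car_Total, Car_Mean and Diff_From_Overall are made up for illustration.

```stata
* Hypothetical toy data: 5 cars x 3 months (not the article's dataset)
clear
set obs 15
gen Car_No = ceil(_n/3)
gen Month  = mod(_n - 1, 3) + 1
gen Sales  = 10 * _n

* Within by, _N is the number of rows for the current car
bysort Car_No (Month): gen C_No_of_Row = _N

* Per-car mean: running sum within each car, last value divided by group _N
by Car_No: gen double Car_Total = sum(Sales)
by Car_No: gen double Car_Mean  = Car_Total[_N] / _N

* Compare each car with the overall mean of all 15 rows
quietly summarize Sales
gen double Diff_From_Overall = Car_Mean - r(mean)
list Car_No Car_Mean Diff_From_Overall if Month == 1, noobs
```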
Replacing missing data using the replace command in Stata
If there is an empty cell in the data set, we can fill it with the value of another cell using the _n variable. The if condition first checks for the missing cell, and replace then copies the value of the chosen cell into it.
In my data set, I left one cell empty:
In the Stata Command window, use the replace command with an if condition:
replace Sales = Sales[_n-2] if missing(Sales)
Sales[_n-2] refers to the Sales value two rows above the current row. Since the missing value is in row 9, it is replaced with the value in row 9 - 2 = 7, which is 25.
As the results show, Sales in row 9 is now equal to Sales in row 7, which is 25.
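The same fix can be reproduced on hypothetical data, where row 9 is set to missing and then filled from row 7 (the values differ from the article's dataset). Two behaviours are worth knowing. First, replace works from the first row to the last, so a value filled in one row can itself be copied into a later missing row. Second, if the missing value is in row 1 or 2, Sales[_n-2] points before the first row and returns missing, so the cell stays empty.

```stata
* Hypothetical toy data with one missing Sales value in row 9
clear
set obs 15
gen Car_No = ceil(_n/3)
gen Sales  = 10 * _n
replace Sales = . in 9            // make row 9 missing on purpose

* Fill the missing cell with the value two rows above (row 9 - 2 = row 7)
replace Sales = Sales[_n-2] if missing(Sales)
list Car_No Sales in 7/9
```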