This article covers all the basics of working with dates in Stata. Knowing how to work with dates comes in particularly handy when working with panel data. If you wish to dive deeper into date and time, please refer to Chapter 6 of Data Management Using Stata by Michael N. Mitchell.
To understand dates in Stata, we will use a dataset that has six variables with various formats of dates. The dataset looks something like:
There is also one variable called ‘closing’ which stores stock closing prices. Notice that these closing prices are displayed in a black colour, while all the dates are displayed in a red colour. This is because ‘closing’ is a numeric variable (float to be specific) but the dates are stored as a string. Strings are always displayed as red in Stata.
If, for example, you wanted to declare your data as a panel data, and had to specify your time variable, using any of the date variables above would give the following error:
Because the variable ‘date1’ is a string variable, Stata does not recognize it as a date that can be used to indicate that our data is a panel dataset.
How to Convert String to Date in Stata
In order to convert string variables to date format in Stata, we use the date function. This function can be used when generating a new variable.
gen new_date = date(date1, “DMY”)
There are two parameters of this function. Firstly, we specify the variable that needs to be converted to date format. In this case, it is the ‘date1’ variable. Then, in inverted commas, we specify the order of the date components i.e. the exact format of the date we would like the new variable to take on. Here, we want it to have the “DMY” format, which would first have the day, then the month and finally the year.
Generally, all the components of date/time and how they are referred to in Stata can be found from Stata’s help section.
So what does the new variable, ‘new_date’ look like?
The numeric values generated for each corresponding value of ‘date1’ do not seem to make much sense. Here, the type of the variable ‘new_date’ is understood by Stata to be indicating dates. It can also understand correctly what each of these values mean in terms of the date they represent. In order for us, the users of Stata, to understand what these numbers mean, we will have to change the (visual) format of the variable. Remember that Stata is able to treat these as a date correctly; we just need to change the way they are displayed to understand them. In order to do that, the format command is used:
format new_date %td
After writing the command
format, we write the name of the variable that we would like to change the display format of (‘new_date’ in this case) followed by the format that we would like for the variable to be displayed in.
%td indicates that dates in the variable represent a single day. If they were to represent the month, we would have written
%tm. You can explore the various formats that data can be displayed in Stata through the command
After running the format command, the dates are now displayed in the following manner:
We can also see from the Properties section in Stata’s right panel that the variable type is float, while its format is
Related Article: Filling Gaps in Time series or Panel Data in Stata
You can use the Format field in this panel to change the variable’s format as well. In order to do that, the lock sign on the top left corner needs to be clicked to “unlock” the properties so that they can be edited.
Then click on the three dots that appear at the end of the Format field. The following window will open:
The first section of this dialogue box offers you various types of data that can be selected. In our case, we have daily data, so “Daily” is an appropriate selection. But we want our dates to appear in the same format that we saw in the original ‘date1’ string variable, where dates were written in a format that looked like: dd/mm/yyyy. From the second section, we can see that the seventh sample matches this format. Let’s select that.
Note also that in the field at the very bottom, the code for the format has also changed from %td to %tdDD/NN/CCYY. Clicking ‘OK’ after making your selections will generate the following command:
format %tdDD/NN/CCYY new_date
Browsing the variable also reflects the format changes we just made:
Similarly, we can change the format of the ‘date2’ variable as well. This variable has dates written with the year first, then the month, and lastly, the day, all separated by dashes.
gen new_date2 = date(date2,"YMD") format new_date2 %td
While working with dates in Stata, we can convert the remaining string variables (‘date3’ till ‘date9’) to the date type and alter their format as well.
gen new_date3 = date(date3,"DMY") format new_date2 %td
Convert String to Date Without the Century Part in Stata
The ‘date4’ variable has a slightly different format. It has the day, month and year separated by dashes but the year does not have the century part.
gen new_date4 = date(date4,"DMY")
When we execute the command above, the new variable ‘new_date4’ has missing values. This is because Stata cannot understand what ‘96’ means in terms of years.
To generate a date variable without the century part, we need to change the way we specify the format when using the date function. Let’s generate another variable. This time, we will write ‘20’ before ‘Y’ to complete the four-digit year to make it understandable for Stata.
gen new_date5 = date(date4,"DM20Y") format new_date5 %td
This generates a variable with dates in States shown above. Clearly, the years being shown (2096) are wrong. They are being shown this way because one of the parameters for our date function indicated that the year should be prefixed with ‘20’ (“20Y”).
To get around this issue, we make use of the third parameter that can be added to the date() function. Remember that the date() function can have three parameters:
When working with dates in Stata, remember Parameter Y is defined by Stata as a way to handle two-digit years. If a two-digit year is present, the most recent year which does not go beyond Y is returned by Stata. An example will make this more clear. We are trying to turn a string with two-digit year into date format. Let’s try the following command:
gen new_date6 = date(date4,"DMY",2020) format new_date6 %td
The new variable ‘new_date6’ is created as follows:
Related Article: How to Convert String variable into numeric in Stata
Previously, Stata turned the two-digit years into 2096. Now, using the third parameter, we specified that the year part of our date will not go beyond 2020. Stata then turns the two-digit year into a year from the most recent century, while ensuring that no year is greater than 2020.
What happens when we specify 1996 as our Y parameter? Remember that we also have years greater than 1996 in our data.
gen new_date7 = date(date4,"DMY",1996) format new_date7 %td
The variable ‘new_date7’ is shown in the third column. The years meant to be 1996 are being converted correctly. But because the Y parameter we added is also 1996, Stata will ensure that no year in this variable goes beyond 1996. So, for the two-digit year -20, it turns it into 1920.
You will therefore need to choose your Y parameter with care. This parameter does not always have to be added. It only needs to be specified when we are dealing with two-digit years.
Convert String to Date With the Day of the Week | Dates in Stata
Sometimes, the day of the week is also a part of our date like the ‘date5’ variable in our date.
Generating a new date variable with a simple DMY format will end up creating a variable with missing values.
gen new_date8 = date(date5,"DMY") format new_date8 %td
This is because, with just the DMY format, Stata cannot accommodate the day of the week in the new variable.
We will drop the ‘new_date8’ variable and regenerate it with the correct parameter.
In order for Stata to ignore the day of the week in the string and create a proper date variable, we add a hash sign (#) before DMY. The hash sign tells Stata to ignore anything written before the date, month and year in a string. In the first observation, for example, it will ignore everything written before “13 January 1996”.
gen new_date8 = date(date5,"#DMY") format new_date8 %td
Now, with the hash sign, anything before the date, month and year (DMY) was not considered when generating the new variable.
Similarly, we can also add the hash sign after DMY. This lets Stata know that it needs to ignore anything written after the date, month and year (DMY). Let’s turn the string variable ‘date6’ into a date variable ‘new_date9’.
gen new_date9 = date(date6,"DMY#") format new_date9 %td
With some versions of Stata, if you have to work with dates in Stata you have to add multiple hash signs, depending on how many extra values there are after the year part of the date.
As a side note, it is possible to show time as part of a date variable. Edit the variable’s properties like we did above and choose ‘Clock’ as the type of data. Then, from the sample section, choose the type of format you would like your variable values to look like.
Extracting the Year, Month, or Day From a Date Variable
Let’s drop all these variables we created and keep just the ‘new_date’ and ‘closing’ variables. The former is a date type variable.
keep new_date closing
Now, say that you have a date variable in Stata, but you would like to extract only the year from it. For this, we generate a new variable to store the year value and use the year() function, where the parameter inside the brackets is the date variable that we want to extract the year from.
The new variable generated will only store the year value from the date variable specified.
In exactly the same manner, we can also use the month() function when creating a new variable and extract just the month from the date variable, when dealing with dates in Stata.
gen month = month(new_date)
The month will be stored as a number where 1 indicates January, 2 indicates February and so on.
Finally, the day value from the date can also be extracted using the date() function when creating a new variable.
gen day = day(new_date)
Related Article: How to Calculate Stock Return in Stata
Combining Year, Month and Day to Create One Date Variable in Stata
Sometimes, our data may only contain the year, month and day separately and we may want to combine them to create one date variable.
Stata has several date-time functions that can be explored from Stata’s [FN] manual. For this example, we can use the mdy(M, D, Y) function. This function needs three parameters: M, D and Y which stand for month, day and year respectively. In our data, these parameter values can be obtained from variables ‘month’, ‘day’ and ‘year’ respectively. It is important to follow this sequence since the syntax of this function specifies it.
The command below will allow Stata to combine values from these three variables.
gen date = mdy(month, day, year) format date %td
As can be seen in the last column, the new variable ‘date’ combines the values from the three variables and turns them into one date.
Conditional Statements With Dates in Stata
We can also use date variables when specifying conditions in Stata in cases where we need to do operations on a subset of data based on dates.
Let’s try out the following two
summarize commands with a date as part of an
summarize if date >= 20jan2020 format date %td
Both of these commands will return an error because Stata does not recognise the values provided in either of them as a date.
When adding dates to if conditions, they need to be written with an appropriate function. In this case, that function is mdy() with the same parameters we discussed above, except this time, instead of writing variable names, we will specify the date that needs to be used as the condition.
summarize if date >= mdy(1,1,2020)
This command asks Stata to summarize the dataset only when the date is greater than or equal to the first of January in 2020.
13 observations where the date fell before 1/1/2020 are omitted from the summary calculations.
In case the if condition only needs to be applied to the year, a new variable that extracts the year from the date variable will have to be generated as shown above. It can then be used to perform commands that are conditional on the year of the observation.
summarize if year >= 2020
Extracting Week of the Year From Date in Stata
We can also use a function to extract the week of the year from a given date in Stata. The function, woft(), takes one parameter, which is the date variable. As with other date variables, it will also need to be formatted. Because this example deals with weeks, instead of %td, we will write %tw.
gen weekyear = wofd(date) format weekyear %tw
The new variable generated indicates the year and the week number.
Extracting Month of the Year From Date in Stata
Similarly, the month of the year can also be extracted from a date variable using the mofd() function. This time, to format the data, we will need to specify %tm.
gen monthyear = mofd(date) format monthyear %tm
As with the week of the year, this variable will indicate the year and the month of that year.
Related Article: How to Estimate Weekly, Monthly, Yearly Betas in Stata
Extracting Quarter of the Year From Dates in Stata
Finally, we can also extract the quarter of the year from a date using the qofd() function. The new variable generated will be formatted using %tq.
gen quarteryear = qofd(date) format quarteryear %tq
The new variable generated will indicate the year and the quarter corresponding to the date.
More functions of Dates in Stata
There are a lot more date functions that can be used in Stata. Follow the following menu options:
Data > Create or change data > Create a new variable
Click on ‘Create’.
Select ‘Functions’ and then ‘Date and time’ from the section on the left. The section on the right will list all the date and time functions that you can possibly use for various applications.