...

Data Frames in Stata | Store Multiple Datasets in Stata Memory

Data frames allow users to hold multiple datasets in memory at the same time and work on all of them. While this concept has been present in other data analysis software like R and Python for a while, it was introduced to Stata in Stata 16. With data frames in Stata, you don’t have to open different Stata windows or save and close one dataset to open another. Multiple datasets can be worked on in a single instance of Stata.


Using Data Frames Through Stata’s Menu

Data frames can be accessed from Stata’s menu in the following manner:

Data > Frames Manager

The Frames Manager window will open.

[Figure: the Frames Manager window in Stata 17]

Right now, the ‘Current frame’ option is set to ‘default’ because we don’t have multiple datasets open. On the right side, there are buttons which can be used to create a new data frame (‘Create…’), switch to another data frame (‘Switch to’), rename a data frame (‘Rename…’), copy a data frame (‘Copy…’), get a subset of the data (‘Put subset…’), and drop a data frame (‘Drop’). You can also reset a data frame (‘Reset’).

Let’s load our first dataset which will be the built-in 1978 automobile dataset.

sysuse auto,clear

Creating Data Frames In Stata

frame create

Now if you wanted to work with another dataset, you might open an entirely new window of Stata and load it there. Or you might clear or save the existing dataset and then load a new one.

Instead of resorting to these or other inconvenient ways of working with multiple datasets, you can simply create a data frame.

frame create abc

To check how many existing frames are present in Stata, the following command can be used.

frame dir

This will list out all the frames that Stata has currently.


As can be seen, there are two frames in our Stata right now. The first one is called ‘abc’ which we just created, and the second one is just the default frame that was there when we opened Stata and loaded the automobile dataset. The default frame has 74 observations and 12 variables, whereas the new ‘abc’ frame has no observations or variables.

frames create

When creating a new data frame in Stata, you can also use the command frames, i.e. the plural of frame. The two spellings are interchangeable. For example:

frames create xyz

This command will also create a new frame called ‘xyz’.

mkf

You can shorten the command that you have to type by using the command mkf (short for ‘make frame’) followed by the name of the frame:

mkf newframe

We can check whether the new frames have been created by running the following command again:

frame dir

Creating a New Variable When Creating a New Data Frame in Stata

You can also generate a new variable in the new data frame that you are creating. Simply follow the frame name with the new variable name.

frame create newframe2 newvariable
frame dir

As the frame directory shows, this new frame is created with one variable in it. Because we made some changes in the data frame called ‘newframe2’ (i.e. we generated a new variable in it) but did not save it, Stata puts an asterisk beside its name. The asterisk identifies all data frames that have unsaved data.

Creating a New Variable With Data Type When Creating a New Frame in Stata

We can also specify the data type of the new variable that we generate in data frames in Stata.

frame create results str20(event_window rating_category) double(obs car tvalue_test)

Here, we have created a frame called ‘results’. This time, the variable names are grouped by their respective types and written inside parentheses. Outside the parentheses, we write the data type, and inside them, we list all the variables that are to be created with that particular type.


The directory list confirms that a data frame called ‘results’ was created with zero observations and five variables.
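As a minimal sketch of the same idea, the lines below create a typed frame and then inspect it without leaving the current frame. The frame name demo_results is hypothetical, and the capture line is only there so that the snippet can be re-run without a "frame already exists" error.

```stata
* Drop the hypothetical demo frame if an earlier run left it behind
capture frame drop demo_results

* Two string variables of length 20 and two double-precision variables
frame create demo_results str20(event_window rating_category) double(obs car)

* Run describe inside demo_results while staying in the current frame
frame demo_results: describe
```

The describe output should list the storage type of each variable, which is a quick way to confirm that the type groups were parsed as intended.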

How to Check the Current Data Frame in Stata

If you want to check which data frame is currently active in Stata, just type frame and execute the command. Stata will output the name of the data frame that is active at that time.

Alternatively, you can also run the pwf command (pwf is short for “print working frame”).

Both the frame and pwf command will produce the exact same output, i.e. the name of the current frame.


In our case, the current frame is the one called ‘default’ which had the automobile dataset loaded. Any command that you run will be applied only to the particular dataset in the frame that is active.

How to Change From One Frame to Another in Stata

Now, we would like to move on from the auto dataset and work on the empty dataset with five variables that we created in the data frame called ‘results’. To switch from one frame to another, we use the frame change command followed by the name of the data frame that we want to switch to.

frame change results

We have now switched to the ‘results’ data frame. This can also be confirmed through the frame command and by checking the variable list which will obviously also have changed.


Any command that we run now will be executed for this dataset only.

gen price = .

This will create a new variable called price in this dataset stored in the data frame called ‘results’ – not in any of the other five data frames.

Alternatively, you can also use the shorter command cwf to change data frames. cwf is short for “change working frame”.

cwf default
cwf results
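When switching frames inside a do-file, it can help to remember where you started. The sketch below is one way to do that, assuming it is run as a whole from a do-file (the local macro would not survive line-by-line execution). The frame name demo_work is hypothetical; c(frame) is the system value that holds the name of the current frame.

```stata
* Remember the name of the frame we are in right now
local original = c(frame)

* Create a hypothetical scratch frame and switch to it
capture frame drop demo_work
frame create demo_work
frame change demo_work
pwf

* ... work in demo_work here ...

* Return to wherever we started
frame change `original'
pwf
```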

Rename a Data Frame in Stata

If you would like to rename an existing data frame, the command frame rename is used followed by the data frame’s current name and then the new name that we would like it to have. The data frame called ‘default’ is not very indicative of the kind of data it has. Thus we can rename it to ‘auto_data’ so it is easy to identify that it holds the automobile dataset.

frame rename default auto_data

Checking the frame directory now reflects this change.


Drop/Delete a Data Frame in Stata

If you would like to remove a data frame, you can drop it using the frame drop command. It will be followed by the name of the frame that is to be deleted.

frame drop xyz
frame dir

The frame directory no longer lists the ‘xyz’ frame because it has been dropped.
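A common pattern in do-files, sketched below with the hypothetical frame name demo_tmp, is to put capture in front of frame drop before creating a frame. frame create stops with an error if the name is already taken, so this makes the do-file safe to run repeatedly.

```stata
* capture suppresses the error raised when demo_tmp does not exist yet
capture frame drop demo_tmp

* Now the name is guaranteed to be free
frame create demo_tmp
frame dir

* Remove the frame again once it is no longer needed
frame drop demo_tmp
```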

Perform Commands on Different Data Frames Without Changing Data Frames in Stata

If you would like to execute a command on a data frame that is not the current, active one, the frame frame_name: prefix can be used. Write the prefix, then a colon, then the command that needs to be executed in the other frame. The command runs there without you having to change or switch data frames.

Take a look at the example below.

frame
frame abc: sysuse auto.dta, clear
frame abc: summ price mpg
frame

First, we used the frame command to check what the current active data frame was and confirmed that it was ‘results’. Then we used the frame abc: prefix to load the automobile dataset into the frame called ‘abc’, and used the same prefix again to summarize two variables from that dataset. Finally, we ran the frame command once more to show that the current data frame remained unchanged: the commands ran in the ‘abc’ frame without our having to switch data frames.

Note that if you try and load a new dataset into a data frame that already has some existing data that you have made any unsaved changes to, it will produce an error unless you use the clear option. This option first clears any existing data in the data frame, and then loads a new one.


In case there are several commands that need to be performed in another, non-active dataframe, the commands can be written in the form of a block inside curly brackets, with the frame name specified before the brackets open. The syntax will look something like:

frame frame_name {
    list of commands to be applied only to frame_name
}

For example:

frame abc {
    gen price2 = 1000
    drop if rep78>3
    describe
}

These three commands will only be executed on the data inside the frame ‘abc’.


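Note that now, when the dataset is described, it has a new variable called ‘price2’. It has 40 observations instead of 74, since we dropped the 34 where ‘rep78’ was greater than 3. That count includes the 5 observations where ‘rep78’ is missing, because Stata treats missing values as larger than any number.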
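The effect of missing values on a condition such as rep78>3 can be checked before dropping anything. The sketch below loads a fresh copy of the automobile data into a hypothetical frame called demo_auto and counts the observations that each version of the condition would select.

```stata
* Load a fresh copy of the auto data into a hypothetical scratch frame
capture frame drop demo_auto
frame create demo_auto

frame demo_auto {
    sysuse auto, clear

    * Missing values compare as larger than any number, so they are included
    count if rep78 > 3

    * Excluding missing values explicitly gives a smaller count
    count if rep78 > 3 & !missing(rep78)

    * Number of observations with rep78 missing
    count if missing(rep78)
}
```

If the missing observations should be kept, the second condition is the one to use in the drop command.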

Copying a Dataset from One Data Frame to Another in Stata

If you would like to copy a data set from one data frame to another, the frame copy command is used. This command is followed by the name of the original frame, and then the name of the frame to which the dataset needs to be copied. In doing so, you can also create a completely new data frame. The command will automatically create a new frame if the new name you specify does not already exist.

frame copy results new_results
frame dir

The list of frames shows that a new frame called ‘new_results’ is created. This has exactly the same dataset as the frame called ‘results’.

If you were to execute this exact command again, Stata would give an error. The error occurs because the ‘new_results’ frame is already defined. Copying a dataset into this frame would require you to specify the replace option so Stata knows that the old dataset is to be replaced.

frame copy results new_results, replace
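A copied frame is independent of its source: changes made to the copy do not flow back. The sketch below illustrates this with the automobile data, using the hypothetical frame names demo_src and demo_copy.

```stata
* Start from clean, hypothetical frame names
capture frame drop demo_src
capture frame drop demo_copy

frame create demo_src
frame demo_src: sysuse auto, clear

* First copy creates demo_copy
frame copy demo_src demo_copy

* Change only the copy
frame demo_copy: drop if foreign

* The source is untouched; the copy is smaller
frame demo_src: count
frame demo_copy: count

* Overwrite the modified copy with the original data again
frame copy demo_src demo_copy, replace
frame demo_copy: count
```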

Copying a Subset of Variables from One Data Frame to Another in Stata

frame copy works when you want to copy the entire dataset from one frame to another. If only a subset of variables is to be copied from one frame into another, the frame put command will be used. The general syntax is:

frame put varlist, into(newframename)

To illustrate this, let’s first switch back to the frame that holds the automobile data (frame change auto_data) and generate two new variables in it.

gen weight2 = weight/length
gen length2 = length/weight

Now, we would like to copy data only for these two variables and the ‘price’ variable into a new data frame called ‘small_data’.

frame put price length2 weight2, into(small_data)

We can see in the frame directory list that a new data frame called ‘small_data’ is created with 74 observations but only 3 variables.

The frame name that you specify inside into() must not already exist.

frame put length2 weight2, into(abc)

This command would fail to run and not create any new subset of data because frame ‘abc’ already exists.


Copying a Subset of Observations from One Data Frame to Another in Stata

You can also copy a subset of the observations from one data frame to another, new data frame using the if condition. The general syntax is:

frame put if condition, into(newframename)

For example:

frame put price length2 weight2 if price>5000, into(small_data2)

Here, we want to copy the variables ‘price’, ‘length2’, and ‘weight2’ in another, new data frame called ‘small_data2’ but only for observations where the price is greater than 5000.


The new data frame has 37 observations and 3 variables as expected.
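The sketch below combines a variable list and an if condition in one frame put call and then inspects the result. It assumes it is run as a whole from a do-file, and the frame names demo_auto and demo_subset are hypothetical. Because frame put copies from the current frame, the sketch switches into demo_auto first and switches back afterwards.

```stata
* Hypothetical scratch frames
capture frame drop demo_auto
capture frame drop demo_subset

frame create demo_auto
frame demo_auto: sysuse auto, clear

* frame put works on the current frame, so switch to the source frame
local original = c(frame)
frame change demo_auto

* Copy three variables, foreign cars only, into a new frame
frame put make price mpg if foreign == 1, into(demo_subset)

* Go back to where we started and inspect the new frame from there
frame change `original'
frame demo_subset: describe, short
```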

How to Clear/Delete All Data Frames in Stata

We can also delete and clear all the data frames that currently reside in Stata using the frame reset command.

frames reset

The eight data frames we had in Stata’s memory will be removed, and we will be left with only one empty, default data frame.

The same result can also be achieved using the clear frames command.

clear frames // same as frames reset

clear all would clear everything from Stata, not just data frames.

clear all

Add/Post Results to a Data Frame in Stata

Previously we saw how a subset of observations and variables could be copied from one data frame into another. Now we want to add new variables and observations to a new data frame. Let’s say you want to do some statistical analysis on a dataset and post its results to another data frame.

In the following command, a new frame called ‘ttest’ is created with two variables: ‘tvalue’ and ‘pvalue’.

frame create ttest tvalue pvalue

Let’s now load a dataset into our existing data frame (‘default’) and perform a t-test on one of its variables.

sysuse auto.dta
ttest price=0

Remember that after most commands, Stata keeps the results it computed as stored results (here in r()), which can be listed using the return list command.

return list

The output shows that the t-statistic is stored in ‘r(t)’ and the p-value in ‘r(p)’. We can send these stored results to the two variables in the ‘ttest’ data frame using the frame post command, followed by the name of the target frame and then the values or expressions to be posted. Each value is written inside its own pair of parentheses, and the values must appear in the same order as the variables in the target frame. Our first variable in the ‘ttest’ data frame is ‘tvalue’ and the second is ‘pvalue’, so the t-statistic is listed first and the p-value second.

frame post ttest (r(t)) (r(p))

Let’s also do the same for the ‘mpg’ variable in the default data frame and send the t-stat and p-value to the ‘ttest’ data frame.

ttest mpg=0
frame post ttest (r(t)) (r(p))

We can change the current data frame to ‘ttest’ and browse it to check what the data over there looks like now:

frame change ttest
browse

The results from both t-tests have been posted correctly to this data frame.
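With more than a couple of variables, the same steps can be put in a loop. The sketch below is one possible arrangement, not the only one: it adds a string variable (the hypothetical varname) so that each posted row records which variable was tested. The frame names demo_auto and demo_ttest are also hypothetical.

```stata
* Hypothetical frames: one for the data, one for the results
capture frame drop demo_auto
capture frame drop demo_ttest

frame create demo_auto
frame create demo_ttest str32 varname double(tvalue pvalue)

frame demo_auto {
    sysuse auto, clear

    foreach v of varlist price mpg weight {
        * One-sample t-test of the mean against zero
        quietly ttest `v' == 0

        * Post the variable name and the stored results as one new row
        frame post demo_ttest ("`v'") (r(t)) (r(p))
    }
}

* One row per tested variable
frame demo_ttest: list
```

String values are posted inside double quotes, and the order of the posted values again follows the order of the variables in the target frame.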

Linking Data Across Different Data Frames in Stata

We now want to create links between data in two different frames. To illustrate this, two datasets will be used. One of these will have equity data for companies, while the other would have data on assets for companies. There will be some common variables in both datasets.

Clear any previous data frames that Stata might have:

frame reset

The assets dataset has the following four variables:


The equity dataset has the following four variables.


Variables ‘symbol’ and ‘year’ are common across both datasets.

Let’s start by loading the equity dataset. Since it is the first dataset to become part of Stata’s memory, its data frame will already be named ‘default’ by Stata. We can rename it to something more helpful and meaningful like ‘equity’.

use "equity.dta", clear
frame rename default equity

Now, we can create another data frame called ‘total_assets’ and load the data sets on assets into it.

frame create total_assets
frame total_assets: use "total assets.dta", clear
frame dir

The frames directory shows that there are two data frames with the names we specified for them, each with 4 variables and 92 observations.

Now, we want to link the two data frames because both of them have related data including some common variables.

Before linking the two, let’s sort the data by the ‘year’ variable first.

sort year

If you are familiar with the merge command in Stata, the frame link command, frlink, also has a similar syntax and concept. In this example, linking the two data frames would be done through the following command:

frlink 1:1 symbol year, frame(total_assets)

The 1:1 indicates that one observation from the current data frame will be linked to one observation in the other. (This could also have been m:1 to indicate a many-to-one link. Unlike merge, frlink does not support 1:m links.) This is followed by the variable names ‘symbol’ and ‘year’, which are common to both data frames and will therefore be used to create the link between them. Finally, the option frame(total_assets) tells Stata which data frame the current one is to be linked to.

How is this link created and shown?

In the current data frame ‘equity’, a new variable called ‘total_assets’ is created. This refers to the frame that we have created the link with. The data for this variable is the observation number that an observation in the current data frame is linked to in the second data frame. For example, for the symbol AASM in the year 2001, a corresponding observation (for the same symbol and year) is found in the 37th row of the ‘total_assets’ data frame.
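Since the example datasets are not reproduced here, the sketch below builds two tiny made-up frames to show what the link variable looks like. Everything in it is hypothetical: the frame names demo_panel and demo_firms, the symbols, and the numbers. It uses an m:1 link because several rows of demo_panel share the same symbol.

```stata
* Hypothetical lookup frame: one row per symbol
capture frame drop demo_firms
frame create demo_firms str4 symbol double sales
frame post demo_firms ("AAA") (100)
frame post demo_firms ("BBB") (250)

* Hypothetical panel frame: several rows per symbol, one unmatched symbol
capture frame drop demo_panel
frame create demo_panel str4 symbol double(year equity)
frame post demo_panel ("AAA") (2001) (10)
frame post demo_panel ("AAA") (2002) (12)
frame post demo_panel ("BBB") (2001) (30)
frame post demo_panel ("CCC") (2001) (5)

* Link each row of demo_panel to the matching row of demo_firms
frame demo_panel: frlink m:1 symbol, frame(demo_firms)

* The new variable demo_firms holds the matched observation number
frame demo_panel: list
```

Rows whose symbol has no match in demo_firms get a missing value in the link variable.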


Once this link has been created, we can “get” data from one data frame into another, in much the same way that we merge datasets.

For example, in order for the ‘equity’ data frame to have data on sales (which is present in the ‘total_assets’ data frame), the frget command can be used to “get” the sales data from ‘total_assets’ to ‘equity’.

frget sales, from(total_assets)

The frget command will only work if a link between the two data frames is already created. Let’s browse the data.


A new variable for ‘sales’, taken from the ‘total_assets’ frame is now created in the ‘equity’ data frame.

In case you were to get data from another frame like this but wanted it to be populated into a variable with a new name, just specify the new variable name in the frget command.

frget new_sales = sales, from(total_assets)

The command above will create a new variable called ‘new_sales’ in the current data frame (‘equity’) and get data of the ‘sales’ variable in the ‘total_assets’ frame to populate into it.

As can be observed, the same data is populated into a new variable whose name we specified this time.

An interesting thing to note is that if something were to be changed in the second data frame (‘total_assets’ in this case), the link we created would no longer be valid. For example, let’s drop observation number 37 in the ‘total_assets’ frame.

frame total_assets: drop in 37

When we browse the current ‘equity’ data frame, the observation for symbol AASM is still shown as corresponding to/linked to observation number 37 in the ‘total_assets’ frame even though the 37th observation in ‘total_assets’ would now refer to a different company and year. But what happens when we try to “get” more data?

frget new_sales2 = sales, from(total_assets)

We can no longer get the sales data, or any other data for that matter, from the ‘total_assets’ frame. The link between the two frames needs to be rebuilt and updated. To do this, we can use the frlink rebuild command followed by the name of the link variable, which here has the same name as the linked frame (‘total_assets’).

frlink rebuild total_assets

The link will be rebuilt and updated to appropriately take into account any changes that were made.

We can now “get” data from ‘total_assets’ again.

frget new_sales2 = sales, from(total_assets)

There is another way that we can use to get data from other frames. Before we get into it, let’s describe the ‘total_assets’ frame to check what other variables it has.

frame total_assets: describe

This frame has a variable called ‘total_assets’ which we can add into our current ‘equity’ data. It might get a little confusing since three elements in this example are called ‘total_assets’: the variable that we are getting, the frame that we are getting it from, and the linking variable in the current ‘equity’ frame that refers to corresponding observations in the second frame.

We want to get data from the ‘total_assets’ variable present in the ‘total_assets’ frame and populate it in a variable called ‘ta’.

generate ta = frval(total_assets, total_assets)

We can use the generate command to generate this ‘ta’ variable and use a function called frval to get the data we need. This is done using two parameters in the function.

The first parameter refers to the linking variable that was initially created in the current frame when we linked it to the ‘total_assets’ frame. This variable was also called ‘total_assets’.

The second parameter is the name of the variable in the second frame that we want to get. In this case, we want to get the variable called ‘total_assets’.

The last column shows this new variable ‘ta’ with data from the variable called ‘total_assets’ from the frame ‘total_assets’. Note that no data was added for the company AASM in year 2001 because we had dropped the observation for that earlier.
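To compare the two ways of pulling values across a link, the sketch below uses small made-up frames rather than the article's datasets. The frame names demo_firms and demo_panel, the symbols, and the numbers are all hypothetical. It fetches the same variable once with frget and once with frval so the two results can be listed side by side.

```stata
* Hypothetical lookup frame: one row per symbol
capture frame drop demo_firms
frame create demo_firms str4 symbol double sales
frame post demo_firms ("AAA") (100)
frame post demo_firms ("BBB") (250)

* Hypothetical panel frame with one symbol that has no match
capture frame drop demo_panel
frame create demo_panel str4 symbol double year
frame post demo_panel ("AAA") (2001)
frame post demo_panel ("AAA") (2002)
frame post demo_panel ("BBB") (2001)
frame post demo_panel ("CCC") (2001)

frame demo_panel {
    * The link variable is named demo_firms by default
    frlink m:1 symbol, frame(demo_firms)

    * Way 1: frget copies the variable across the link
    frget sales, from(demo_firms)

    * Way 2: frval() returns the linked value inside an expression
    generate double sales2 = frval(demo_firms, sales)

    list
}
```

frval is convenient when the linked value is only needed as part of a larger expression, since it avoids creating an intermediate variable.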
