How to Append (Stack) Data in Stata

Data appending in Stata involves stacking one dataset on top of another to create a single dataset. We will cover different scenarios and options for this process.


Direct Data Appending:

When you have separate data files and want to combine them, you can use the “append” command and list the names of the files you want to append. For example, suppose we have several files containing the stock prices of different firms, as such,

All these files have the same variables, i.e., symbol, opening, high, low, closing and date. The append command stacks the files one beneath another, lining up observations under the variables that share the same name. The command,

append using firm1 firm2 firm3

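will append the data of the three firms, i.e., Firm 1, Firm 2 and Firm 3, into one dataset, which you can view in the Data Editor. Note that append always adds to whatever is already in memory, so run clear first if the result should contain only these three files.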
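A quick way to confirm that nothing was lost is to compare the number of observations after the append with the total across the three files. This is a minimal sketch, assuming firm1.dta, firm2.dta and firm3.dta are in the current working directory; the local macro name total is an illustrative choice, not part of the original example.

```stata
* Assumes firm1.dta, firm2.dta and firm3.dta are in the working directory
local total = 0
foreach f in firm1 firm2 firm3 {
    quietly describe using `f'          // reads the file header only
    local total = `total' + r(N)         // r(N) = observations in that file
}

clear                                    // start from an empty dataset
append using firm1 firm2 firm3
display "expected `total' observations, got " _N
assert _N == `total'                     // every row from every file is present
```

If the assertion fails, Stata stops with an error, which makes this check convenient at the top of a longer do-file.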

The “tabulate” command in Stata creates a frequency table showing how many observations take each distinct value of a variable. To count the observations from each firm in the appended data, tabulate the variable “symbol”:

tabulate symbol

This command will display the following result in your Stata output window,

Appending with a Master File:

In some cases, you may already have one dataset open in Stata, which we will call the “master file.” You can append additional datasets (the “using” files) to it without closing the master file. Let us treat the firm1 file as the master file and open it with the command below (the clear option discards any unsaved data currently in memory),

use firm1, clear

You can append the data of Firm 2 and Firm 3 onto Firm 1 with this command,

append using firm2 firm3

It is worth being precise about the two roles. The ‘master file’ is the dataset in memory that receives the new observations; the ‘using files’ are the files on disk whose observations are added below it. Because the using files are read directly from disk, you do not need to open and close each file in turn.
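The sketch below makes the two roles visible by counting the master's observations before the append and the combined total afterwards. It assumes the firm files are in the working directory; the local name n_master and the output file name firms_combined are hypothetical choices for illustration.

```stata
* Assumes firm1.dta, firm2.dta and firm3.dta are in the working directory
use firm1, clear                 // firm1 is the master dataset in memory
local n_master = _N              // number of master observations

append using firm2 firm3         // using files are read from disk and stacked below

* The master's observations stay in the first rows of the combined data
display "master: `n_master'  combined: " _N

save firms_combined, replace     // hypothetical name for the combined file
```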

Appending Multiple Files with Wildcards:

If you have many files with similar names and want to append them all, you can use a wildcard to select them. First, build a list of the matching file names in the directory that holds them. For example, the command

local files : dir "E:/" files "firm*.dta"

looks in the root of drive E: and stores, in the local macro files, the names of all files that begin with “firm” and end in “.dta”; the asterisk is a wildcard that matches any sequence of characters. Forward slashes work in Stata paths on Windows and avoid a common problem with backslashes: in a path such as "E:\`file'", the backslash would stop Stata from expanding the macro. This command only builds the list of files; it does not append them. Next, start from an empty dataset and use a foreach loop to append each file in the list:

clear
foreach file of local files {
    append using "E:/`file'"
}

This stacks all the files into a single dataset. Once you tabulate as before with

tabulate symbol

you will see all the appended files. In our specific case, there are 6 files with Firm stocks,

This method streamlines appending multiple files with a consistent naming pattern.
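With wildcards it can be useful to record which file each observation came from. The sketch below is a variation on the loop, not part of the original method: it assumes the same E:/ folder and adds a hypothetical string variable, sourcefile, that holds the file name of each observation.

```stata
clear
local files : dir "E:/" files "firm*.dta"

foreach file of local files {
    append using "E:/`file'"

    * Create the tracking variable (hypothetical name) on the first pass
    capture confirm variable sourcefile
    if _rc {
        generate str32 sourcefile = ""
    }

    * Rows just appended have an empty sourcefile; fill in this file's name
    replace sourcefile = "`file'" if sourcefile == ""
}

tabulate sourcefile
```

replace promotes the string storage type automatically if a file name is longer than 32 characters.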

Using the “Generate” Option:

To keep track of where each observation came from, you can use the “generate()” option. It creates a new variable recording the dataset each observation comes from: 0 for observations that were already in memory (the master data), 1 for the first using file, 2 for the second, and so on. Adding the option to the earlier command:

append using firm1 firm2 firm3,generate(source)

This command creates a new variable named “source” in the appended dataset, identifying the original file of each observation, as such,

You can also label the values of the source variable through the Stata menus: click on “Data” and open the “Variables Manager”.

From here, select the source variable and open the manager for its value label, as shown below,

This opens another dialog box listing the labels of the source values, where you can edit them as needed. For example, here we change the label ‘Appended dataset 1’ to ‘Appended from Firm1’. After editing, click OK to save the changes.

Your appended data set will show the modified label as such,
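The same labeling can be scripted in a do-file instead of using the Variables Manager. This is a minimal sketk of one way to do it, assuming source was created from an empty dataset so that it takes the values 1, 2 and 3; the value-label name source_lbl and the label texts are illustrative choices.

```stata
* Assumes firm1.dta, firm2.dta and firm3.dta are in the working directory
clear
append using firm1 firm2 firm3, generate(source)

* Define (or redefine) a value label; source_lbl is an illustrative name
label define source_lbl 1 "Appended from Firm1" ///
                        2 "Appended from Firm2" ///
                        3 "Appended from Firm3", replace
label values source source_lbl   // attach the label to the variable

* Each source value should line up with a single firm symbol
tabulate source symbol
```

The /// continuation works in do-files, not when typing commands interactively.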

Handling Different Variable Names:

Variable names need extra care when appending. Stata matches variables by name, so if the same quantity has different names in different datasets, append creates a separate variable for each name, and each observation has missing values in the variables that its own file did not contain. Here is an example of what it will look like if your appended data has different variable names:

Renaming variables in the source files can resolve this issue.
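For example, suppose (hypothetically) that firm3.dta stores the closing price under the name close rather than closing. A minimal sketch of the fix renames the variable, saves a copy, and then appends; the variable name close and the file name firm3_fixed are assumptions for illustration.

```stata
* Hypothetical: firm3.dta calls the closing price "close" instead of "closing"
use firm3, clear
rename close closing             // match the name used in firm1 and firm2
save firm3_fixed, replace        // hypothetical copy; the original file is untouched

clear
append using firm1 firm2 firm3_fixed
describe, simple                 // closing should appear once, with no "close"
```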

Selectively Appending Variables:

If you only want to append specific variables, you can use the “keep()” option, which lists the variables to read from the using files. A command for this specific example would be,

append using firm1 firm2, keep(symbol opening high)

This allows you to choose which variables to include in the final dataset. In our specific command, we included the symbol, opening, and high variables. Our dataset will look like this,

It is a useful option for working with large datasets where you need to focus on specific variables.
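To confirm which variables made it into the result, you can list the variable names and test for one that was left out. A short sketch, assuming firm1.dta and firm2.dta are in the working directory:

```stata
* Assumes firm1.dta and firm2.dta are in the working directory
clear
append using firm1 firm2, keep(symbol opening high)
describe, simple                 // lists only symbol, opening and high

* low was not in keep(), so it should not exist
capture confirm variable low
display cond(_rc, "low was not appended", "low is present")
```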

Now consider another scenario, in which the master file is already open after running

use firm1, clear

In that case, the command

append using firm2 firm3,keep(symbol opening high)

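will produce a dataset in which the Firm 1 observations keep all their variables, while the observations from Firm 2 and Firm 3 contain only the specified variables and have missing values in low, closing and date. This is because keep() applies only to the using files, never to the master data. In the following illustration, notice the exact pattern,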
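The pattern can also be checked with commands rather than by eye. A minimal sketch, assuming the same firm files; any closing values that were already missing in firm1 would also be counted here.

```stata
* Assumes firm1.dta, firm2.dta and firm3.dta are in the working directory
use firm1, clear
append using firm2 firm3, keep(symbol opening high)

* closing comes only from the master, so rows from firm2 and firm3 are missing there
tabulate symbol if missing(closing)
count if missing(closing)
```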

Addressing Data Type Conflicts:

Data type conflicts arise when the same variable is stored as numeric in one dataset and as a string in another. In the Data Editor, Stata displays the values of string variables in red. Now let us assume that we have another file named “string” where the ‘closing’ variable is stored as a string, as such,

A string variable holds text (letters, digits and symbols) rather than numbers. A price variable usually ends up stored as a string when some of its entries contain non-numeric characters, so that Stata cannot read the whole column as numbers.
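Before appending, you can check how a variable is stored in each file without loading the data, because describe with using reads only the file's header. A minimal sketch using the file names from this example:

```stata
* Storage type of closing in each file (str# = string, float/double = numeric)
describe closing using firm1
describe closing using string

* Or load the file and test the type directly
use string, clear
capture confirm numeric variable closing
if _rc {
    display "closing is stored as a string in string.dta"
}
```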

If you try to append this file to one in which ‘closing’ is numeric, with the command,

append using firm1 string

Stata will show this error

As the error message suggests, the “force” option lets the command proceed despite the conflict. Thus, your command would be,

append using firm1 string,force

Applying the “force” option lets the append proceed, but at a cost: because ‘closing’ is a string in string.dta and numeric in firm1, every value of ‘closing’ that comes from string.dta becomes missing, including entries that are valid numbers. This damages the integrity of our data. A better path is to convert ‘closing’ to numeric in string.dta first, so that only the entries that really are non-numeric are lost, and then append. First, open the file that contains the string data with this command,

use string.dta, clear

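Next, use the destring command, which converts a string variable into a numeric one. With the force option, any entry that cannot be read as a number is replaced with a missing value while valid numbers are kept, and the replace option overwrites the original variable instead of creating a new one. The command would be,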

destring closing,replace force

Stata will display the following message,

Your data will be converted as such,

After converting the “closing” variable, we need to save the updated string data with this command

save string.dta,replace
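With ‘closing’ now stored as a number in string.dta, the append from the start of this section should run without the force option. A minimal check, assuming the files from this example; the remaining missing values in closing are the entries destring could not convert, plus any that were missing to begin with.

```stata
clear
append using firm1 string        // no type mismatch now, so no force needed
confirm numeric variable closing
count if missing(closing)
```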

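However, be cautious with this conversion: with the force option, every entry of ‘closing’ that is not a valid number is now missing, so check which entries are affected to make sure no real data was lost.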
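One way to exercise that caution is to look at the entries that would be discarded before converting. The sketch below runs on the original string.dta, while closing is still a string, and uses Stata's real() function, which returns missing for text it cannot read as a number; its result should closely match what destring treats as non-numeric. The name closing_num is an illustrative choice for a numeric copy that leaves the original text intact.

```stata
* Run on the original string.dta, while closing is still a string variable
use string.dta, clear

* Non-empty entries that cannot be read as numbers; force would make these missing
list closing if missing(real(closing)) & closing != ""

* Alternative to replace: keep the text and add a numeric copy alongside it
destring closing, generate(closing_num) force
```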

Handling Value and Variable Labels:

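When appending datasets, the labels already in memory take precedence. Value-label definitions from a using file never replace definitions of the same name already in memory, and the master's variable labels are kept, so if later datasets carry different labels you may need to update them manually. The “nolabel” option stops append from copying any value-label definitions from the using files, and the “nonotes” option stops it from adding their notes (by default, notes that do not already appear in the master are added).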
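A minimal sketch for inspecting labels around an append and for using the two options, assuming the firm files are in the working directory:

```stata
* Assumes firm1.dta, firm2.dta and firm3.dta are in the working directory
use firm1, clear
label dir                          // value labels currently in memory (the master's)

append using firm2 firm3           // definitions already in memory are never replaced
label list                         // inspect the definitions that ended up in memory

* Variant: take no value-label definitions and no notes from the using files
use firm1, clear
append using firm2 firm3, nolabel nonotes
```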

Data appending in Stata offers flexibility for combining datasets with various options and scenarios. Understanding these methods allows you to efficiently manage and analyze your data.

Happy Learning!