How to Combine Multiple CSV/Excel Files in Stata

This article is going to discuss how to combine multiple Excel data files into one Stata file. Let’s suppose we have 15 Excel files, each file with daily stock market data of one company. In order to combine data for all 15 companies in one file, we follow three steps:

Step 1: Set the Current Directory

Your working directory is the location where all your files are saved. Once you specify this directory in Stata, you can refer to files just by their names instead of typing out the entire directory each time a file or dataset needs to be mentioned. To do this, we use the ‘cd’ command followed by the directory path enclosed in inverted commas:

cd “E:\Combining data from multiple files”

Your directory address could look longer as well if your files are located inside a nested folder. In this example, the file for the first company is named ‘1.csv’, for the second it is named ‘2.csv’, and so on.

Step 2: Use a Loop to Import and Save Your Files in Stata Format

Once our directory containing the relevant files is set, we need to save the Excel data files in Stata format before proceeding to combine them as CSV files cannot be combined directly. Because there are multiple files that need to be converted, it is more efficient to use a loop to do this task. The following loop imports each individual Excel data file into Stata, and saves is it as a Stata data file.

forvalues 1/15 {
             clear
             import delimited `i’.csv
             save `i’.dta, replace
 } 

Let’s deconstruct the code above. The first line indicates that the code inside the curly brackets is going to run 15 times. The local `i‘ mentioned in the code will take the value ‘1’ when the loop runs for the first time. It will take the value ‘2’ when it is run the second time, and so on. This will occur for values until 15. Note that every time the local i is mentioned, it is enclosed between the symbols ` and '. The first of these can be found beside the number 1 key on the keyboard. It is also called the grave accent or backtick key. The symbol at the end of the local is a single quote.

The clear command inside the loop will ensure that there is no existing data loaded in Stata and that the file is ready to import new data on a clean and empty slate.

The import delimited command is used to import CSV files into Stata. The local `i' is used to indicate the file name because it will incrementally take values from 1 to 15 as indicated by forvalues 1/15, which is also how our Excel files are named. This command will load the Excel data into Stata.

The last line of the code inside the loop saves the loaded data as a Stata data (.dta) file. Once again, we make use of the local `i' to save the files with the numbers 1 to 15, but this time with a ‘.dta’ extension. The replace option ensure that any existing file with the name 1.dta (all the way to 15.dta) is replaced by the new dataset we just loaded and saved.

All 15 files in the ‘.dta’ format will be saved in the directory that was specified earlier using the cd command. If suppose, you wish to save these new files in separate folder located inside the directory, called, say ‘folder2’, we can adjust the save command as follows:

save folder2\\`i’.dta, replace

Note the addition of the two backward slashes. We add two slashes to make sure that Stata does not consider the ` symbol to be a part of the file name.

Step 3: Combine All the Files Together

After converting your CSV files into Stata’s .dta format, they are ready to be combined in one file. We will once again use a loop to append all fifteen files together.

Now it is worth going over the two ways that bring together two or more datasets. Appending datasets together means that you are adding more observations/rows i.e., combining data that is structurally similar. Merging datasets means you are adding more variables/columns.

In this case, we would like to append the datasets together. This is because structurally, they all have the same variables with different observations under each in each file (daily stock time series data being converted to panel data). To append the files together, we use the code:

forvalues i = 1/15{
             append using folder2\\`i’.dta
 } 

The workings of the loop are the same as described earlier. The append using syntax simply tells Stata to append the current dataset using a second, subsequently specified dataset.

Using the unique (ssc install) command, we can check how many unique values a variable has. The variable name for the company’s stock symbol here is ‘symbol’.

unique symbol

As expected, the command returns an output that tells us that the number of unique values for this variable is 15. Further using the tabulate command will return a list of the 15 unique values for the variable.

tabulate symbol

This returns us with a table listing down the 15 unique symbols, the number of observations corresponding to each company, percentage share of their observations in the dataset, and the cumulative percentage.

Combining Files with Different Names

Our task above was greatly eased by the fact that our CSV files to be combined were named conveniently. What if our files have different, random names? In that case, we will start off again by changing our directory using the cd command as shown before, followed by the following command:

local files: dir “E:\Combining Data from multiple files” files “*.csv”
 foreach file in `files’{
             clear
             Import delimited `file’
             save `file’.dta, replace
 } 

The very first local command creates a local called ‘files’. In this local, it saves the list of all the files you wish to combine by first stating the directory where your files are stored, and then indicating via the *.csv totell Stata to get the name of all the files in that directory have a .csv extension. Asterisk is used as wild card.

foreach file in `files' prepares the loop to run for all these files that are stored in the local specified as files. The rest of the loop will execute in the same manner as we described previously. The files will be stored with the same name that they had in CSV format.

In order to combine these files, we once again use the local command to locate files in the directory that have .dta as their extension.

local files : dir “E:\Combining Data from multiple files” files “*.dta”

The list of these .dta files is once again saved in a local variable called ‘files’. They are appended using the following loop:

foreach file in files{
             append using `file’
 } 

All the data for the fifteen firms whose file names were stored in the local ‘files’ will be appended one by one.

Subscribe
Notify of
guest
1 Comment
Newest
Oldest Most Voted
Inline Feedbacks
View all comments
Clio
Clio
5 months ago

This is great, I found it super useful. I had to tweak the code a little to import excel (xlsx) files with different names: global dir “E:\Combining Data from multiple files” local files: dir “$dir” files “*.xlsx”, respectcase foreach file in 'files’ {             clear             import excel using “$dir”\\'file’", firstrow             save “$dir”\\'file’.dta", replace } local files : dir "$dir" files "*.dta", respectcase foreach j in'files' {              append using "$dir\\'j'" } respectcase because my file names have uppercase letters added the file path in the import, save and append lines also found a… Read more »

Last edited 5 months ago by Clio
1
0
Would love your thoughts, please comment.x
()
x
Tweet
Share
Share
Pin