Datasets often arrive inside a compressed (zipped) folder, either because they were sent as email attachments or because the files are large. Such files are typically extracted with software like WinRAR, 7-Zip or Windows’ built-in decompression tool, and they carry a .zip or similar extension. Extracting them directly from within Stata saves time, especially when there are many files to unzip. This article shows how to unzip files in Stata.
The unzipping procedure depends on the extension of the compressed file. To check the extension on Windows, right-click the file and select Properties; the extension appears under the ‘Type of file’ heading. You can jump straight to the section that matches your file extension, or read the sections in order for a fuller understanding.
Unzipping .zip Files
To unzip files in Stata, we first change Stata’s working directory to the folder that contains the zipped file.
cd "C:\Users\Desktop\Stata\zipfiles\zip"
This command changes Stata’s working directory. Suppose this folder contains a zipped file called ‘file1’. We can unzip it with the unzipfile command by typing:
unzipfile "file1"
Stata unzips the file and returns details of the process in the results window where it specifies which files were unzipped, and how many were skipped and extracted.
To unzip the file again and overwrite the previously extracted files, we add the replace option.
unzipfile "file1", replace
Unzipping the same file without this option makes Stata skip every file, because it will not overwrite previously extracted files that have the same names.
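As a minimal sketch of the steps above, the following lines extract the archive only if its contents are not already present, and then list the folder. The name "file1.dta" is a hypothetical example of a file stored inside the archive; substitute a name you expect to find in your own zip file.

```stata
* Move to the folder that holds file1.zip
cd "C:\Users\Desktop\Stata\zipfiles\zip"

* "file1.dta" is a hypothetical name for a file inside the archive.
* confirm file sets a nonzero return code when the file does not exist.
capture confirm file "file1.dta"
if _rc {
    * Not extracted yet, so unzip (the .zip extension is assumed)
    unzipfile "file1"
}

* List the folder contents to check what was extracted
dir
```

Use the replace option instead of this check when you always want a fresh copy of the extracted files.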
Unzipping .rar Files
Compressed folders created with the WinRAR software are saved with a .rar extension. The unzipfile command described above cannot unzip folders with this extension. To unzip .rar files, we again change our working directory to the relevant folder. This time, our compressed file, called file2.rar, is saved in a folder named ‘rar’:
cd "C:\Users\Desktop\Stata\zipfiles\rar"
To unzip this, we use the shell command, which passes a command directly to the operating system. In our example, we use it to add the folder where WinRAR is installed to the Windows search path and then call WinRAR’s unrar command-line tool, giving it the e (extract) instruction and the name of the file that needs to be unzipped.
shell set "PATH=C:\Program Files\WinRAR;%PATH%" & unrar e "file2.rar"
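A hedged variant of the same call is sketched below. It assumes WinRAR is installed in its default location, that unrar.exe sits in that folder, and that Stata is running on Windows. The x instruction keeps the folder structure stored in the archive (e extracts everything into one folder), and the -o+ switch tells unrar to overwrite existing files, which plays the same role as replace does for unzipfile.

```stata
* Assumes WinRAR's default install folder; adjust the path if yours differs
cd "C:\Users\Desktop\Stata\zipfiles\rar"

* x   = extract with the full paths stored in the archive
* -o+ = overwrite existing files without asking
shell set "PATH=C:\Program Files\WinRAR;%PATH%" & unrar x -o+ "file2.rar"
```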
Unzipping Other File Formats
Files with other extensions can also be extracted with the aforementioned unzipfile command, provided the archive itself is in a format that unzipfile can read. We begin extracting a file with a .z extension by once again resetting the working directory to the relevant folder.
cd "C:\Users\Desktop\Stata\zipfiles\others"
unzipfile "file2.z"
Because unzipfile assumes a .zip extension when none is given, we must type the extension explicitly whenever the file has any other extension.
Extracting Multiple Files
To extract multiple zipped files (named ‘file4’ and ‘file5’), we make use of locals and loops. Loops are particularly helpful when you have a large number of files to unzip and typing a separate command for each one is cumbersome. We store the path of the working directory in a local macro named ‘sourcedir’.
local sourcedir "C:\Users\Desktop\Stata\zipfiles\multiplefiles"
We then use the cd command as before, but this time we type sourcedir, the local storing the directory path, inside the double quotes instead of the path itself. Because sourcedir refers to a local macro, we must enclose it in a left single quote (`) and a right single quote (').
cd "`sourcedir'"
Our next step is to ask Stata to store the names of all the .zip files in the source directory in a local called ‘folder’. The syntax for that is as follows:
local folder : dir "`sourcedir'" files "*.zip"
This creates a local macro called ‘folder’ (local folder) that holds the names of all files with the .zip extension (files "*.zip") in the directory stored in sourcedir (dir "`sourcedir'").
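Before extracting anything, it can help to confirm what the local actually contains. The short sketch below, which assumes the same folder path as above, counts the matching files and prints their names. The local n is a name introduced here for illustration only.

```stata
* Run these lines together in one do-file so the locals stay defined
local sourcedir "C:\Users\Desktop\Stata\zipfiles\multiplefiles"
local folder : dir "`sourcedir'" files "*.zip"

* Count the file names stored in the local (each quoted name is one word)
local n : word count `folder'
display "Number of .zip files found: `n'"

* Show the stored names; compound quotes keep the inner quotes intact
display `"`folder'"'
```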
Now we just need to use a foreach loop to unzip each of the files in turn.
foreach file in `folder' {
    unzipfile "`file'", replace
}
The loop runs until every file whose name is stored in the local folder has been unzipped. The file name is placed in double quotes so that names containing spaces are handled correctly.
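Putting the pieces together, the whole procedure might look like the following do-file sketch. It assumes the folder path used above, and it adds one thing that is not part of the steps described in this article: capture noisily, so that a damaged archive produces a message instead of stopping the loop. It also uses foreach ... of local, an equivalent way of looping over the names stored in a local.

```stata
* Folder that holds the zipped files (path taken from the example above)
local sourcedir "C:\Users\Desktop\Stata\zipfiles\multiplefiles"
cd "`sourcedir'"

* Collect the names of all .zip files in that folder
local folder : dir "`sourcedir'" files "*.zip"

* Extract each archive in turn, overwriting earlier extractions
foreach file of local folder {
    display as text "Extracting `file' ..."
    capture noisily unzipfile "`file'", replace
    if _rc {
        * A nonzero return code means unzipfile reported an error
        display as error "Could not extract `file' (return code " _rc ")"
    }
}
```

Run the lines as a single do-file rather than one at a time: local macros exist only while the do-file (or the selection being executed) is running, so sourcedir and folder would be empty if the commands were executed separately.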