[Stata] Converting xlsx, csv, and sav formats to Stata dta files
You may have encountered the need to import data from different sources into Stata. In this blog post, I am going to introduce how to import data from Excel (.xlsx), comma-separated values (.csv), and SPSS (.sav) files into Stata and convert them to Stata's .dta format, using either the built-in commands or the menus.
Import Excel format (xlsx)
Excel files are one of the most common formats for storing and exchanging data. Stata can directly import data from Excel files with the extension .xls or .xlsx. There are two ways to do this: with the command import excel, or with the menu option File > Import > Excel Spreadsheet.
A typical import excel command looks like this:
import excel using "filename.xlsx", sheet("Sheet1") firstrow case(lower)
Here, filename.xlsx is the Excel file to read, and everything after the comma is an option that controls how the data are imported. An optional list of variables can also be placed before using to import only some of the columns. Some of the options are as follows.
sheet("sheetname"): specify the name of the worksheet to import. If not specified, the first worksheet is imported.
cellrange(start:end): specify the range of cells to import. For example, cellrange(A1:G10) imports data from cells A1 to G10.
firstrow: treat the first row of data as variable names. If not specified, Stata names the variables after the Excel column letters (A, B, C, and so on).
case(preserve|lower|upper): when using firstrow, preserve the case of the variable names (the default) or convert them to lowercase or uppercase.
allstring("format"): import all data as strings, optionally specifying a numeric display format.
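Putting these options together, here is a minimal sketch that can be run as a do-file. It assumes write access to the working directory and builds its own workbook from Stata's bundled auto dataset with export excel, so no outside file is needed. The file names auto.xlsx and auto_from_excel.dta are arbitrary examples. The final save step is what actually produces the .dta file.

```stata
* Minimal sketch: create a test workbook from the bundled auto data,
* import it back with import excel, and save the result as a .dta file.
* auto.xlsx and auto_from_excel.dta are example file names.
sysuse auto, clear
export excel using "auto.xlsx", sheet("Sheet1") firstrow(variables) replace

* Read only a block of cells: the header row plus rows 2-11 of columns A-C
import excel using "auto.xlsx", sheet("Sheet1") cellrange(A1:C11) firstrow clear
list in 1/3

* Read the whole worksheet, taking variable names from row 1 in lowercase;
* clear replaces the data currently in memory
import excel using "auto.xlsx", sheet("Sheet1") firstrow case(lower) clear
describe, short

* Convert: write the imported data to Stata's .dta format
save "auto_from_excel.dta", replace
```

The cellrange() import is only there to show the option; the full-sheet import that follows replaces it before the data are saved.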
Import csv format (csv)
CSV files are plain text files that store data in a tabular format, where each row represents an observation and each column represents a variable. The values are separated by commas or other delimiters. Stata can import data from CSV files with the command import delimited or the menu option File > Import > Text Data.
A typical import delimited command looks like this:
import delimited using "filename.csv", varnames(1)
Here, filename.csv is the file to read, and the options after the comma work much as they do for import excel. Some of the options are:
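delimiters("chars"): specify the delimiter used in the file. If this option is omitted, import delimited checks the first line of data to decide whether the file is comma- or tab-delimited. Other common delimiters are tab (\t), semicolon (;), and pipe (|).
varnames(1): treat the first row of the file as variable names. By default, import delimited tries to determine on its own whether the first row contains names; specify varnames(nonames) to get generic names such as v1, v2, and so on.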
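The sketch below assumes write access to the working directory. It first writes a small semicolon-delimited file with Stata's file commands. The file name scores.csv and its three data rows are invented for illustration. It then imports the file with import delimited and saves it as a .dta file.

```stata
* Minimal sketch: write a small semicolon-delimited file, import it,
* and save it as .dta. scores.csv and its contents are made up.
file open fh using "scores.csv", write text replace
file write fh "id;name;score" _n
file write fh "1;Alice;90.5" _n
file write fh "2;Bob;85" _n
file write fh "3;Chen;77.25" _n
file close fh

* delimiters(";") overrides the automatic comma/tab detection;
* varnames(1) takes the variable names from the first row
import delimited using "scores.csv", delimiters(";") varnames(1) clear
describe
list

* Convert: write the imported data to Stata's .dta format
save "scores.dta", replace
```

Without delimiters(";"), this file would not be detected as comma- or tab-delimited. That is why the option matters for files exported with semicolons, which is common in locales that use a comma as the decimal separator.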
Import SPSS format (sav)
SPSS files are binary files that store data and metadata in a proprietary format used by IBM SPSS Statistics software. Starting with Stata 16, Stata can import data from SPSS files with the extension .sav or .zsav (compressed) using the command import spss or the menu option File > Import > SPSS Data.
A typical import spss command looks like this:
import spss using "filename.sav", case(lower)
Here, filename.sav is the SPSS file to read, and the options after the comma work much as they do for import excel. Some of the options are as follows.
describe: describe the contents of the SPSS file without importing the data.
clear: replace the data in memory with the imported data. If not specified and the data in memory have unsaved changes, Stata refuses to import and reports an error; it does not append the new data to the existing data.
locale("locale"): specify the locale used by the SPSS file. This option has no effect on Windows, but may be needed on Mac or Linux to handle special characters.
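To turn a whole folder of SPSS files into Stata files at once, the import can be wrapped in a loop. This is a minimal sketch. It assumes the .sav files already sit in the current working directory and are readable by import spss. Each converted file keeps the original base name (for example, a hypothetical survey.sav becomes survey.dta). Only the clear option is used, so the loop does not depend on the other options.

```stata
* Minimal sketch: convert every .sav file in the current working directory
* to a .dta file with the same base name (e.g. survey.sav -> survey.dta).
* Assumes the .sav files already exist there; if none do, the loop is skipped.
local savfiles : dir "." files "*.sav"
local n_files : word count `savfiles'
local n_converted = 0

foreach f of local savfiles {
    * clear replaces whatever dataset is currently in memory
    import spss using "`f'", clear
    * drop the 4-character ".sav" extension and append ".dta"
    local outfile = substr("`f'", 1, strlen("`f'") - 4) + ".dta"
    save "`outfile'", replace
    local ++n_converted
}

display "Converted `n_converted' of `n_files' .sav file(s) to .dta"
```

The same loop pattern works for the other formats by swapping the file pattern and the import command, for example "*.csv" with import delimited.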