R: Read Stata binary files

read.dta {foreign}

R Documentation

Read Stata binary files

Description

Reads a file in Stata version 5-8 or 7/SE binary format into a data frame.

Usage

read.dta(file, convert.dates = TRUE, tz = "GMT",
         convert.factors = TRUE, missing.type = FALSE,
         convert.underscore = TRUE)

Arguments

`file`	a filename as a character string.
`convert.dates`	Convert Stata dates to `POSIXct` class?
`tz`	timezone for date conversion
`convert.factors`	Use Stata value labels to create factors? (version 6.0 or later)
`missing.type`	For version 8 only, store information about different types of missing data?
`convert.underscore`	Convert `"_"` in Stata variable names to `"."` in R names?

Details

The variables in the Stata data set become the columns of the data frame. Missing values are correctly handled. The data label, variable labels, and timestamp are stored as attributes of the data frame. Nothing is done with variable characteristics.
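
As a rough illustration of the paragraph above, the sketch below round-trips the built-in `swiss` data through a temporary Stata file and lists the attributes attached to the result. The exact attribute names are not spelled out on this page and may differ between releases of the foreign package, so the code only prints whatever is present rather than assuming particular names.

```r
library(foreign)

# Round-trip a built-in data set through a temporary Stata file.
swissfile <- tempfile(fileext = ".dta")
write.dta(swiss, swissfile)
sw <- read.dta(swissfile)

# The Stata variables arrive as columns of an ordinary data frame.
str(sw)

# List whatever metadata attributes this version of 'foreign' attached
# (data label, variable labels, timestamp, and so on).
names(attributes(sw))
```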

Optionally, Stata dates (%d formats) are converted to R's POSIXct class and variables with Stata value labels are converted to factors. In any case the value label and format information are stored as attributes on the returned data frame.
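
The effect of `convert.factors` can be sketched as follows. The data frame `df` and its column names are hypothetical, and the example assumes that `write.dta` from the same package stores factors as labelled integer codes, which is its default behaviour in the releases this sketch was written against.

```r
library(foreign)

# Hypothetical toy data: one integer column and one factor column.
df <- data.frame(id = 1:3, grp = factor(c("low", "high", "low")))
f <- tempfile(fileext = ".dta")
write.dta(df, f)  # factors are written with value labels by default

# Read the same file twice, with and without factor conversion.
as_factor <- read.dta(f, convert.factors = TRUE)
as_codes  <- read.dta(f, convert.factors = FALSE)

class(as_factor$grp)  # value labels used to build a factor
class(as_codes$grp)   # underlying codes left as stored
```

Reading with `convert.factors = FALSE` can be useful when the numeric codes themselves carry meaning that the labels would hide.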

Stata 8.0 has 27 different missing data values. If `missing.type` is `TRUE`, a separate list is created with the same variable names as the loaded data. For string variables the list value is `NULL`. For other variables the value is `NA` where the observation is not missing and 0-26 when the observation is missing. This is attached as the `"missing"` attribute of the returned value.
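
A minimal sketch of retrieving the missing-type information is given below. It assumes that `write.dta` accepts `version = 8L` to produce a version 8 file, and it uses a hypothetical toy data frame. Because `write.dta` only writes ordinary `NA` values, this cannot demonstrate the full range of 27 codes; a file saved from Stata itself would be needed for that.

```r
library(foreign)

# Hypothetical toy data with one missing numeric value and a string column.
df <- data.frame(x = c(1.5, NA, 3), s = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
f <- tempfile(fileext = ".dta")
write.dta(df, f, version = 8L)  # missing.type only applies to version 8 files

d <- read.dta(f, missing.type = TRUE)
miss <- attr(d, "missing")

# Inspect the per-variable list described above.
str(miss)
```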

The option to allow underscores in variable names may become the default in future versions now that R supports their use.
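
The two settings of `convert.underscore` can be compared with a short sketch. The variable name `my_var` is hypothetical, and the example assumes `write.dta` keeps the underscore when writing the file.

```r
library(foreign)

# Hypothetical data frame with an underscore in a variable name.
df <- data.frame(my_var = 1:3)
f <- tempfile(fileext = ".dta")
write.dta(df, f)

dotted <- read.dta(f, convert.underscore = TRUE)   # "_" replaced by "."
kept   <- read.dta(f, convert.underscore = FALSE)  # name left as in the file

names(dotted)
names(kept)
```

Passing the argument explicitly, rather than relying on the default, keeps scripts stable if the default changes.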

Value

a data frame

Author(s)

Thomas Lumley

References

The Stata Users Manual (versions 5 & 6), Programming manual (version 7), or online help (version 8) describe the format of the files.

Examples

library(foreign)
data(swiss)
## write the data to a temporary Stata file, then read it back
swissfile <- tempfile()
write.dta(swiss, swissfile)
read.dta(swissfile)
