R: Extract or Replace Parts of a Data Frame

Extract.data.frame {base}

R Documentation

Extract or Replace Parts of a Data Frame

Description

Extract or replace subsets of data frames.

Usage

x[i]
x[i] <- value
x[i, j, drop = TRUE]
x[i, j] <- value

x[[i]]
x[[i]] <- value
x[[i, j]]
x[[i, j]] <- value

x$name
x$name <- value

Arguments

`x`	data frame.
`i, j`	elements to extract or replace. `i, j` are `numeric` or `character` or, for `[` only, empty. Numeric values are coerced to integer as if by `as.integer`. For replacement by `[`, a logical matrix is allowed.
`drop`	logical. If `TRUE` the result is coerced to the lowest possible dimension: however, see the Warning below.
`value`	A suitable replacement value: it will be repeated a whole number of times if necessary and it may be coerced: see the Coercion section. If `NULL`, deletes the column if a single column is selected.
`name`	name or literal character string.

Details

Data frames can be indexed in several modes. When [ and [[ are used with a single index, they index the data frame as if it were a list. In this usage a drop argument is ignored, with a warning. Using $ is equivalent to using [[ with a single index.
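The list-style behaviour described above can be sketched with a small, hypothetical data frame `df` (the name and its contents are assumptions made for illustration only):

```r
# Hypothetical two-column data frame, for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

cols <- df[1]    # single index with [ : a one-column data frame
vec  <- df[[1]]  # single index with [[ : the column itself
same <- df$a     # $ extracts the same column by name

# Supplying 'drop' with a single index is ignored and signals a warning;
# the warning message is captured here rather than printed
msg <- tryCatch(df[1, drop = TRUE],
                warning = function(w) conditionMessage(w))
```

Here `cols` is still a data frame, whereas `vec` and `same` are the same plain integer vector.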

When [ and [[ are used with two indices they act like indexing a matrix: [[ can only be used to select one element.
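As a brief sketch of matrix-style indexing, again using a hypothetical data frame `df` that is not part of the original examples:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:3, b = c(10.5, 20.5, 30.5))

sub  <- df[2:3, c("a", "b")]  # rows 2-3 of both columns: a data frame
cell <- df[[2, "b"]]          # exactly one element: row 2 of column "b"
```

With two indices, `[` can select a rectangular block, while `[[` is limited to a single cell.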

If [ returns a data frame it will have unique (and non-missing) row names, if necessary transforming the row names using make.unique. Similarly, column names will be transformed (if columns are selected more than once).
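The renaming can be seen in a minimal sketch, assuming a hypothetical data frame `df` with row names "r1" and "r2":

```r
# Hypothetical one-column data frame with explicit row names
df <- data.frame(v = 1:2, row.names = c("r1", "r2"))

dup_rows <- df[c(1, 1, 2), , drop = FALSE]  # row 1 selected twice
dup_cols <- df[c(1, 1)]                     # column 1 selected twice

rownames(dup_rows)  # "r1" "r1.1" "r2"
names(dup_cols)     # "v" "v.1"
```

In both cases the repeated name receives a numeric suffix, in the style of make.unique.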

When drop = TRUE, this is applied to the subsetting of any matrices contained in the data frame as well as to the data frame itself.

The replacement methods can be used to add whole column(s) by specifying non-existent column(s), in which case the column(s) are added at the right-hand edge of the data frame and numerical indices must be contiguous to existing indices. On the other hand, rows can be added at any row after the current last row, and the columns will be in-filled with missing values.
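A minimal sketch of both kinds of growth, assuming a hypothetical two-row data frame `df`:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:2)

df["b"] <- c("x", "y")  # non-existent column: appended at the right-hand edge
df[5, "a"] <- 9L        # row 5 lies beyond the last row: rows 3 to 5 are created

df  # rows 3 and 4 are entirely NA; row 5 has a value only in column "a"
```

Column "b" has no value supplied for the new rows, so it is in-filled with missing values there.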

For [ the replacement value can be a list: each element of the list is used to replace (part of) one column, recycling the list as necessary. If the columns specified by number are created, the names (if any) of the corresponding list elements are used to name the columns. If the replacement is not selecting rows, list values can contain NULL elements which will cause the corresponding columns to be deleted.
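A short sketch of list replacement, assuming a hypothetical data frame `df`; the column names "c" and "d" are chosen purely for illustration:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:3, b = 4:6)

# Columns 3 and 4 do not exist yet, so they are created
# and take their names from the list elements
df[3:4] <- list(c = 7:9, d = c("x", "y", "z"))
after_add <- names(df)

# No rows are selected, so NULL elements delete the corresponding columns;
# the one-element list is recycled across both selected columns
df[c("b", "d")] <- list(NULL)
after_delete <- names(df)
```

After the first assignment the data frame has columns "a", "b", "c" and "d"; after the second only "a" and "c" remain.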

Matrix indexing using [ is not recommended, and barely supported. For extraction, x is first coerced to a matrix. For replacement a logical matrix (only) can be used to select the elements to be replaced in the same way as for a matrix. Missing values in the matrix are treated as false, unlike S which does not replace them but uses up the corresponding values in value.
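The treatment of missing values in a logical index matrix can be sketched as follows, assuming a hypothetical numeric data frame `df` and a length-one replacement value:

```r
# Hypothetical data frame containing missing values
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))

mask <- df > 3  # logical matrix; NA wherever df is NA
df[mask] <- 0   # NA entries of the mask are treated as FALSE

df  # only the two values greater than 3 are replaced; the NAs stay NA
```

The cells whose mask entry is NA are left untouched rather than replaced.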

Value

For [ a data frame, list or a single column (the latter two only when dimensions have been dropped). If matrix indexing is used for extraction a matrix results.
For [[ a column of the data frame (extraction with one index) or a length-one vector (extraction with two indices).
For [<-, [[<- and $<-, a data frame.

Coercion

The story over when replacement values are coerced is a complicated one, and one that has changed during R's development. This section is a guide only.

When [ and [[ are used to add or replace a whole column, no coercion takes place but value will be replicated (by calling the generic function rep) to the right length if an exact number of repeats can be used.

When [ is used with a logical matrix, each value is coerced to the type of the column in which it is to be placed.

When [ and [[ are used with two indices, the column will be coerced as necessary to accommodate the value.
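The difference between whole-column and two-index replacement can be sketched with a hypothetical data frame `df` (as the section notes, this is a guide only):

```r
# Hypothetical data frame, for illustration only
df <- data.frame(n = 1:3, s = c("a", "b", "c"))
before <- class(df$n)         # "integer"

df[2, "n"] <- 2.5             # two indices: column "n" is coerced to hold 2.5
after_cell <- class(df$n)     # "numeric"

df["s"] <- 1:3                # whole column: it simply takes the value's type
after_column <- class(df$s)   # "integer"

df[["n"]] <- 7L               # length-one value is replicated to all three rows
```

In the two-index case the existing column is widened to accommodate the new value; in the whole-column cases the old column type plays no part.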

Note that when the replacement value is an array (including a matrix) it is not treated as a series of columns (as data.frame and as.data.frame do) but inserted as a single column.
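A minimal sketch of a matrix being inserted as a single column, assuming a hypothetical data frame `df` and column name "m":

```r
# Hypothetical data frame, for illustration only
df <- data.frame(id = 1:3)

df[["m"]] <- matrix(1:6, nrow = 3)  # a 3 x 2 matrix stored as one column

ncol(df)       # 2: "id" and the matrix column "m"
dim(df[["m"]]) # 3 2
```

By contrast, data.frame(id = 1:3, m = matrix(1:6, nrow = 3)) would split the matrix into two separate columns.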

Warning

Although the default for drop is TRUE, the default behaviour when only one row is left is equivalent to specifying drop = FALSE. To drop from a data frame to a list, drop = TRUE has to be specified explicitly.
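The asymmetry between rows and columns can be sketched with a hypothetical data frame `df`:

```r
# Hypothetical data frame, for illustration only
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

one_col <- df[, 1]               # one column left: dropped to a vector by default
one_row <- df[1, ]               # one row left: still a data frame
as_list <- df[1, , drop = TRUE]  # explicit drop = TRUE: a plain list
```

Only the single-column case is dropped without being asked; a single row stays a data frame unless drop = TRUE is given.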

Examples

data(swiss)
sw <- swiss[1:5, 1:4]  # select a manageable subset

sw[1:3]      # select columns
sw[, 1:3]    # same
sw[4:5, 1:3] # select rows and columns
sw[1]        # a one-column data frame
sw[, 1, drop = FALSE]  # the same
sw[, 1]      # an (unnamed) vector
sw[[1]]      # the same

sw[1,]       # a one-row data frame
sw[1,, drop=TRUE]  # a list

swiss[ c(1, 1:2), ]   # duplicate row, unique row names are created

sw[sw <= 6] <- 6  # logical matrix indexing
sw

## adding a column
sw["new1"] <- LETTERS[1:5]   # adds a character column
sw[["new2"]] <- letters[1:5] # ditto
sw[, "new3"] <- LETTERS[1:5] # ditto
                             # but this got converted to a factor in 1.7.x
sw$new4 <- 1:5
sapply(sw, class)
sw$new4 <- NULL              # delete the column
sw
sw[6:8] <- list(letters[10:14], NULL, aa=1:5) # update col 6, delete col 7, append col 8 as "aa"
sw

## matrices in a data frame
A <- data.frame(x=1:3, y=I(matrix(4:6)), z=I(matrix(letters[1:9],3,3)))
A[1:3, "y"] # a matrix, was a vector prior to 1.8.0
A[1:3, "z"] # a matrix
A[, "y"]    # a matrix
