R: Data on ranges

RangedData-class {IRanges}

R Documentation

Data on ranges

Description

IMPORTANT NOTE: RangedData objects will be deprecated in BioC 3.7! The use of RangedData objects has been discouraged in favor of GRanges or GRangesList objects since BioC 2.12, that is, since 2014. The GRanges and GRangesList classes are defined in the GenomicRanges package. See ?GRanges and ?GenomicRanges (after loading the GenomicRanges package) for more information about these classes. PLEASE MIGRATE YOUR CODE TO USE GRanges OR GRangesList OBJECTS INSTEAD OF RangedData OBJECTS AS SOON AS POSSIBLE. Don't hesitate to ask on the bioc-devel mailing list (https://bioconductor.org/help/support/#bioc-devel) if you need help with this.

RangedData supports storing data, i.e. a set of variables, on a set of ranges spanning multiple spaces (e.g. chromosomes). Although the data is split across spaces, it can still be treated as one cohesive dataset when desired. RangedData extends DataTable.

Details

A RangedData object consists of two primary components: a RangesList holding the ranges over multiple spaces and a parallel SplitDataFrameList holding the split data. There is also a universe slot for denoting the source (e.g. the genome) of the ranges and/or data.

There are two different modes of interacting with a RangedData. The first mode treats the object as a contiguous "data frame" annotated with range information. The accessors start, end, and width get the corresponding fields in the ranges as atomic integer vectors, undoing the division over the spaces. The [[ and matrix-style [ extraction and subsetting functions unroll the data in the same way, and [[<- does the inverse. The number of rows is defined as the total number of ranges and the number of columns is the number of variables in the data. It is often convenient and natural to treat the data this way, at least when the data is small and there is no need to distinguish the ranges by their space.

The other mode is to treat the RangedData as a list, with an element (a virtual Ranges/DataFrame pair) for each space. The length of the object is defined as the number of spaces and the value returned by the names accessor gives the names of the spaces. The list-style [ subset function behaves analogously.
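As a rough illustration of the two modes, the sketch below builds a small object and queries it both ways. The ranges, the score values and the chr1/chr2 space names are made up for the example, and it assumes an IRanges version that still provides RangedData (such as 2.12.0).

```r
library(IRanges)  # assumes a version that still ships RangedData

# Four ranges split over two spaces via the 'space' argument
# (all values here are illustrative only)
rd <- RangedData(IRanges(start = c(1, 10, 20, 30), end = c(5, 15, 25, 35)),
                 score = c(1L, 2L, 3L, 4L),
                 space = c("chr1", "chr1", "chr2", "chr2"))

# Mode 1: one contiguous table, ignoring the spaces
nrow(rd)        # total number of ranges
ncol(rd)        # number of data variables
rd[["score"]]   # the variable, unlisted over the spaces

# Mode 2: a list with one element per space
length(rd)      # number of spaces
names(rd)       # names of the spaces
rd["chr2"]      # list-style subsetting keeps only that space
```

The same object answers both sets of queries; which view is more convenient depends on whether the division into spaces matters for the task.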

Accessor methods

In the code snippets below, x is a RangedData object.

The following accessors treat the data as a contiguous dataset, ignoring the division into spaces:

Array accessors:

: nrow(x): The number of ranges in x.
: ncol(x): The number of data variables in x.
: dim(x): An integer vector of length two, essentially c(nrow(x), ncol(x)).
: rownames(x), rownames(x) <- value: Gets or sets the names of the ranges in x.
: colnames(x), colnames(x) <- value: Gets or sets the names of the variables in x.
: dimnames(x): A list with two elements, essentially list(rownames(x), colnames(x)).
: dimnames(x) <- value: Sets the row and column names, where value is a list as described above.
: columnMetadata(x): Get the DataFrame of metadata along the value columns, i.e., where each column in x is represented by a row in the metadata. Note that calling mcols(x) returns the metadata on each space in x.
: columnMetadata(x) <- value: Set the DataFrame of metadata for the columns.
: within(data, expr, ...): Evaluates expr within data, a RangedData. Any values assigned in expr will be stored as value columns in data, unless they match one of the reserved names: ranges, start, end, width and space. Behavior is undefined if any of the range symbols are modified inconsistently. Modifications to space are ignored.

Range accessors. The type of the return value depends on the type of Ranges; for IRanges, it is an integer vector. Regardless, the number of elements is always equal to nrow(x).

: start(x), start(x) <- value: Get or set the starts of the ranges. When setting the starts, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
: end(x), end(x) <- value: Get or set the ends of the ranges. When setting the ends, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
: width(x), width(x) <- value: Get or set the widths of the ranges. When setting the widths, value can be an integer vector of length sum(elementNROWS(ranges(x))) or an IntegerList object of length length(ranges(x)) and names names(ranges(x)).
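A minimal sketch of the range accessors, using made-up ranges and space names. The replacement uses the integer-vector form described above, with one value per range; it assumes an IRanges version that still provides RangedData.

```r
library(IRanges)  # assumes a version that still ships RangedData

# Three illustrative ranges over two spaces
rd <- RangedData(IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
                 score = c(1L, 2L, 3L),
                 space = c("chr1", "chr2", "chr2"))

start(rd)   # one start per range, unrolled over the spaces
end(rd)
width(rd)

# Replace the ends with an integer vector of length nrow(rd)
end(rd) <- end(rd) + 10L
width(rd)   # widths now reflect the new ends
```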

These accessors make the object seem like a list along the spaces:

: length(x): The number of spaces (e.g. chromosomes) in x.
: names(x), names(x) <- value: Get or set the names of the spaces (e.g. "chr1"). NULL or a character vector of the same length as x.

Other accessors:

: universe(x), universe(x) <- value: Get or set the scalar string identifying the scope of the data in some way (e.g. genome, experimental platform, etc). The universe may be NULL.
: ranges(x), ranges(x) <- value: Gets or sets the ranges in x as a RangesList.
: space(x): Gets the spaces from ranges(x).
: values(x), values(x) <- value: Gets or sets the data values in x as a SplitDataFrameList.
: score(x), score(x) <- value: Gets or sets the column representing a "score" in x, as a vector. This is the column named score, or, if this does not exist, the first column, if it is numeric. The get method returns NULL if no suitable score column is found. The set method takes a numeric vector as its value.
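The remaining accessors can be tried on a small object as sketched below. The data and the "hg19" universe label are illustrative assumptions, not values taken from this page, and the code assumes an IRanges version that still provides RangedData.

```r
library(IRanges)  # assumes a version that still ships RangedData

rd <- RangedData(IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
                 score = c(0.5, 1.5, 2.5),
                 space = c("chr1", "chr2", "chr2"),
                 universe = "hg19")   # "hg19" is only an example label

universe(rd)   # the scalar string given above
ranges(rd)     # RangesList with one element per space
values(rd)     # SplitDataFrameList parallel to ranges(rd)
space(rd)      # the space of each range, one entry per row
score(rd)      # the column named "score"
```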

Constructor

RangedData(ranges = IRanges(), ..., space = NULL, universe = NULL): Creates a RangedData with the ranges in ranges and variables given by the arguments in .... See the constructor DataFrame for how the ... arguments are interpreted.

If ranges is a Ranges object, the space argument is used to split the data into spaces. If space is NULL, all of the ranges and values are placed into the same space, resulting in a single-space (length one) RangedData object. Otherwise, the ranges and values are split into spaces according to space, which is treated as a factor, like the f argument in split.

If ranges is a RangesList object, then the supplied space argument is ignored and its value is derived from ranges.

If ranges is not a Ranges or RangesList object, this function calls as(ranges, "RangedData") and returns the result if successful.

The universe may be specified as a scalar string by the universe argument.

Coercion

as.data.frame(x, row.names=NULL, optional=FALSE, ...): Copy the start, end, width of the ranges and all of the variables as columns in a data.frame. This is a bridge to existing functionality in R, but of course care must be taken if the data is large. Note that optional and ... are ignored.

as(from, "DataFrame"): Like as.data.frame above, except the result is a DataFrame and it probably involves less copying, especially if there is only a single space.

as(from, "RangedData"): Coerce from to a RangedData, according to the type of from:

Rle, RleList: Converts each run to a range and stores the run values in a column named "score".
RleViewsList: Creates a RangedData using the ranges given by the runs of subject(from) in each of the windows, with a value column score taken as the corresponding subject values.
Ranges: Creates a RangedData with only the ranges in from; no data columns.
RangesList: Creates a RangedData with the ranges in from. Also propagates the inner metadata columns of the RangesList (accessed with mcols(unlist(from))) to the data columns (aka values) of the RangedData. This makes it a lossless coercion and the exact reverse of the coercion from RangedData to RangesList.
data.frame or DataTable: Constructs a RangedData, using the “start”, “end”, and, optionally, “space” columns in from. The other columns become data columns in the result. Any “width” column is ignored.

as(from, "RangesList"): Creates a CompressedIRangesList (a subclass of RangesList) made of the ranges in from. Also propagates the data columns (aka values) of the RangedData to the inner metadata columns of the RangesList. This makes it a lossless coercion and the exact reverse of the coercion from RangesList to RangedData.

as.env(x, enclos = parent.frame()): Creates an environment with a symbol for each variable in the frame, as well as a ranges symbol for the ranges. This is efficient, as no copying is performed.
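The coercions described above might be exercised as in the following sketch. All ranges, scores and space names are invented for the example, and the code assumes an IRanges version that still provides RangedData.

```r
library(IRanges)  # assumes a version that still ships RangedData

rd <- RangedData(IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
                 score = c(1L, 2L, 3L),
                 space = c("chr1", "chr2", "chr2"))

# Flatten to an ordinary data.frame (this copies the data)
df <- as.data.frame(rd)

# Round trip through RangesList: the data columns travel as
# inner metadata columns and come back as values
rl  <- as(rd, "RangesList")
rd2 <- as(rl, "RangedData")

# Build a RangedData from a data.frame with start, end and space columns;
# the remaining column (score) becomes a data column
rd3 <- as(data.frame(start = c(1, 10), end = c(5, 15),
                     space = c("chr1", "chr2"), score = c(7L, 8L)),
          "RangedData")
```

As noted for as.data.frame, the flattening step is best reserved for data small enough to copy comfortably.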

Subsetting and Replacement

In the code snippets below, x is a RangedData object.

x[i]: Subsets x by indexing into its spaces, so the result is of the same class, with a different set of spaces. i can be numerical, logical, NULL or missing.

x[i,j]: Subsets x by indexing into its rows and columns. The result is of the same class, with a different set of rows and columns. The row index i can either treat x as a flat table by being a character, integer, or logical vector or treat x as a partitioned table by being a RangesList, LogicalList, or IntegerList of the same length as x.

x[[i]]: Extracts a variable from x, where i can be a character, numeric, or logical scalar that indexes into the columns. The variable is unlisted over the spaces.

For convenience, values of "space" and "ranges" are equivalent to space(x) and unlist(ranges(x)) respectively.

x$name: similar to above, where name is taken literally as a column name in the data.

x[[i]] <- value: Sets value as column i in x, where i can be a character, numeric, or logical scalar that indexes into the columns. The length of value should equal nrow(x). x[[i]] should be identical to value after this operation.

For convenience, i="ranges" is equivalent to ranges(x) <- value.

x$name <- value: similar to above, where name is taken literally as a column name in the data.

Splitting and Combining

In the code snippets below, x is a RangedData object.

: rbind(...): Matches the spaces from the RangedData objects in ... by name and combines them row-wise.
: c(x, ..., recursive = FALSE): Combines x with arguments specified in ..., which must all be RangedData objects. This combination acts as if x is a list of spaces, meaning that the result will contain the spaces of the first concatenated with the spaces of the second, and so on. This function is useful when creating RangedData objects on a space-by-space basis and then needing to combine them.
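The difference between the two combining functions can be sketched with three single-space objects. The names a, b and d and their contents are hypothetical, and the code assumes an IRanges version that still provides RangedData.

```r
library(IRanges)  # assumes a version that still ships RangedData

# Illustrative single-range objects
a <- RangedData(IRanges(1, 5),   score = 1L, space = "chr1")
b <- RangedData(IRanges(10, 20), score = 2L, space = "chr1")
d <- RangedData(IRanges(30, 40), score = 3L, space = "chr2")

by_row   <- rbind(a, b)  # same space name: rows are stacked within "chr1"
by_space <- c(a, d)      # list-style: space "chr1" followed by space "chr2"

length(by_row)     # number of spaces after rbind
nrow(by_row)       # number of ranges after rbind
names(by_space)    # spaces after c()
```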

Applying

An lapply method is provided to apply a function over the spaces of a RangedData:

: lapply(X, FUN, ...): Applies FUN to each space in X with extra parameters in ....
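A small sketch of per-space summaries with lapply, on made-up data. It relies on FUN receiving each space as a single-space RangedData, which is an assumption consistent with the Examples section, and it assumes an IRanges version that still provides RangedData.

```r
library(IRanges)  # assumes a version that still ships RangedData

rd <- RangedData(IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
                 score = c(1L, 2L, 3L),
                 space = c("chr1", "chr2", "chr2"))

# Number of ranges in each space
lapply(rd, nrow)

# Summed width of the ranges in each space (overlapping positions count twice)
lapply(rd, function(sp) sum(width(sp)))
```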

Author(s)

Michael Lawrence

Examples

  ranges <- IRanges(c(1,2,3),c(4,5,6))
  filter <- c(1L, 0L, 1L)
  score <- c(10L, 2L, NA)

  ## constructing RangedData instances

  ## no variables
  rd <- RangedData()
  rd <- RangedData(ranges)
  ranges(rd)
  ## one variable
  rd <- RangedData(ranges, score)
  rd[["score"]]
  ## multiple variables
  rd <- RangedData(ranges, filter, vals = score)
  rd[["vals"]] # same as rd[["score"]] above
  rd$vals
  rd[["filter"]]
  rd <- RangedData(ranges, score + score)
  rd[["score...score"]] # names made valid

  ## split some data over chromosomes

  range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
  both <- c(ranges, range2)
  score <- c(score, c(0L, 3L, NA, 22L))
  filter <- c(filter, c(0L, 1L, NA, 0L)) 
  chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")

  rd <- RangedData(both, score, filter, space = chrom)
  rd[["score"]] # identical to score
  rd[1][["score"]] # identical to score[1:3]
  
  ## subsetting

  ## list style: [i]

  rd[numeric()] # these three are all empty
  rd[logical()]
  rd[NULL]
  rd[] # missing, full instance returned
  rd[FALSE] # logical, supports recycling
  rd[c(FALSE, FALSE)] # same as above
  rd[TRUE] # like rd[]
  rd[c(TRUE, FALSE)]
  rd[1] # numeric index
  rd[c(1,2)]
  rd[-2]

  ## matrix style: [i,j]

  rd[,NULL] # no columns
  rd[NULL,] # no rows
  rd[,1]
  rd[,1:2]
  rd[,"filter"]
  rd[1,] # now by the rows
  rd[c(1,3),]
  rd[1:2, 1] # row and column
  rd[c(1:2,1,3),1] ## repeating rows

  ## dimnames

  colnames(rd)[2] <- "foo"
  colnames(rd)
  rownames(rd) <- head(letters, nrow(rd))
  rownames(rd)

  ## space names

  names(rd)
  names(rd)[1] <- "chr1"

  ## variable replacement

  count <- c(1L, 0L, 2L)
  rd <- RangedData(ranges, count, space = c(1, 2, 1))
  ## adding a variable
  score <- c(10L, 2L, NA)
  rd[["score"]] <- score
  rd[["score"]] # same as 'score'
  ## replacing a variable
  count2 <- c(1L, 1L, 0L)
  rd[["count"]] <- count2
  ## numeric index also supported
  rd[[2]] <- score
  rd[[2]] # gets 'score'
  ## removing a variable
  rd[[2]] <- NULL
  ncol(rd) # is only 1
  rd$score2 <- score
  
  ## combining

  rd <- RangedData(ranges, score, space = c(1, 2, 1))
  c(rd[1], rd[2]) # equal to 'rd'
  rd2 <- RangedData(ranges, score)

  ## applying

  lapply(rd, `[[`, 1) # get first column in each space

[Package IRanges version 2.12.0 Index]