| DataFrame-class {S4Vectors} | R Documentation |
DataFrame objects
Description
The DataFrame class extends the RectangularData virtual
class and supports the storage of any type of object (with length
and [ methods) as columns.
Details
On the whole, the DataFrame behaves very similarly to
data.frame, in terms of construction, subsetting, splitting,
combining, etc. The most notable exceptions have to do with handling
of the row names:
- The row names are optional. This means calling rownames(x) will return NULL if there are no row names. Of course, it could return seq_len(nrow(x)), but returning NULL informs, for example, combination functions that no row names are desired (they are often a luxury when dealing with large data).
- The row names are not required to be unique.
- Subsetting by row names does not use partial matching.
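The row-name rules above can be checked interactively. This is a small sketch assuming S4Vectors is installed; the column values and the row names "a" and "b" are arbitrary illustrations. The duplicated row names are assigned with rownames<- after construction.

```r
library(S4Vectors)

# No row.names given, so none are stored: rownames() returns NULL
df <- DataFrame(score = c(1L, 3L, NA))
print(rownames(df))

# Row names may contain duplicates
df2 <- DataFrame(score = c(1L, 3L, NA))
rownames(df2) <- c("a", "a", "b")
print(rownames(df2))

# Selecting a row by its (exact) name
print(df2["b", ])
```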
As DataFrame derives (indirectly) from Vector, it is possible to attach
arbitrary metadata to the object as a whole (see metadata()). Also, another
DataFrame, accessible with mcols(), can hold metadata on the columns.
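Object-level and column-level metadata can be set as in the sketch below, which uses the metadata() and mcols() accessors from S4Vectors. The "source" entry and the description strings are hypothetical values chosen for illustration. Note that mcols() must have one row per column of the DataFrame.

```r
library(S4Vectors)

df <- DataFrame(score = c(1L, 3L, NA), counts = c(10L, 2L, NA))

# Object-level metadata: an arbitrary list stored with the DataFrame
metadata(df)$source <- "toy example"

# Column metadata: a DataFrame with one row per column of 'df'
mcols(df) <- DataFrame(description = c("test score", "event count"))

print(metadata(df))
print(mcols(df))
```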
For a class to be supported as a column, it must have length
and [ methods, where [ supports subsetting only by
i and respects drop=FALSE. Optionally, a method may be
defined for the showAsCell generic, which should return a
vector of the same length as the subset of the column passed to
it. This vector is then placed into a data.frame and converted
to text with format. Thus, each element of the vector should be
some simple, usually character, representation of the corresponding
element in the column.
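A minimal sketch of a column class that meets these requirements. The class name Temperature, its celsius slot, and the "C" suffix produced by its showAsCell() method are hypothetical, chosen only for illustration. The column is added with [[<- after construction, which avoids relying on coercion of the new class through the DataFrame constructor.

```r
library(S4Vectors)

# Hypothetical S4 class: a thin wrapper around a numeric vector
setClass("Temperature", slots = c(celsius = "numeric"))

# Required: length() returns the number of elements
setMethod("length", "Temperature", function(x) length(x@celsius))

# Required: [ subsets by 'i' only and always returns a Temperature,
# so it never drops to a simpler type (drop is ignored)
setMethod("[", "Temperature", function(x, i, j, ..., drop = TRUE) {
    new("Temperature", celsius = x@celsius[i])
})

# Optional: a simple character representation used when displaying
setMethod("showAsCell", "Temperature", function(object) {
    paste0(format(object@celsius), "C")
})

temps <- new("Temperature", celsius = c(21.5, 18, 25))
df <- DataFrame(city = c("A", "B", "C"))
df[["temp"]] <- temps

print(df)
print(df[2:3, ])
```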
Constructor
DataFrame(..., row.names = NULL, check.names = TRUE, stringsAsFactors):
Constructs a DataFrame in similar fashion to data.frame. Each argument in ... is coerced to a DataFrame and combined column-wise. The row names should be given in row.names; otherwise, they are inherited from the arguments, as in data.frame. Explicitly passing NULL to row.names ensures that there are no row names. If check.names is TRUE, the column names will be checked for syntactic validity and made unique, if necessary.
To store an object of a class that does not support coercion to DataFrame, wrap it in I(). The class must still have methods for length and [.
The stringsAsFactors argument is ignored. The coercion of column arguments to DataFrame determines whether strings become factors.
make_zero_col_DFrame(nrow):
Constructs a zero-column DFrame object with nrow rows. Intended for developers to use in other packages and typically not needed by the end user.
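The constructor options can be exercised as in the following sketch, assuming S4Vectors is attached. The data values, row names, and column names are arbitrary; the I() case reuses the small array from the Examples section.

```r
library(S4Vectors)

score <- c(1L, 3L, NA)
named <- data.frame(score, row.names = c("one", "two", "three"))

# Row names are inherited from the arguments when row.names is not given ...
df_inherit <- DataFrame(named)
# ... and suppressed when row.names = NULL is passed explicitly
df_none <- DataFrame(named, row.names = NULL)

# check.names = TRUE (the default) makes duplicated column names unique
df_names <- DataFrame(x = 1:3, x = 4:6)
print(colnames(df_names))

# An array is stored as a single column by wrapping it in I()
ary <- array(1:4, c(2, 1, 2))
df_ary <- DataFrame(a = I(ary))

# A zero-column DFrame with 5 rows (mainly useful to package developers)
z <- make_zero_col_DFrame(5L)
print(dim(z))
```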
Accessors
In the following code snippets, x is a DataFrame.
dim(x):
Get the length-two integer vector giving the number of rows and the number of columns, in that order.
dimnames(x), dimnames(x) <- value:
Get and set the two-element list containing the row names (character vector of length nrow(x) or NULL) and the column names (character vector of length ncol(x)).
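A short sketch of these accessors, assuming S4Vectors is attached; the column and row names are illustrative only.

```r
library(S4Vectors)

df <- DataFrame(score = c(1L, 3L, NA), counts = c(10L, 2L, NA))
print(dim(df))        # number of rows, then number of columns
print(dimnames(df))   # row names are NULL because none were supplied

# Set row and column names in one call
dimnames(df) <- list(c("one", "two", "three"), c("Score", "Counts"))
print(rownames(df))
print(colnames(df))
```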
Coercion
as(from, "DataFrame"):
By default, constructs a new DataFrame with from as its only column. If from is a matrix or data.frame, all of its columns become columns in the new DataFrame. If from is a list, each element becomes a column, recycling as necessary. Note that for the DataFrame to behave correctly, each column object must support element-wise subsetting via the [ method and return the number of elements with length. It is recommended to use the DataFrame constructor, rather than this interface.
as.list(x):
Coerces x, a DataFrame, to a list.
as.data.frame(x, row.names=NULL, optional=FALSE, make.names=TRUE):
Coerces x, a DataFrame, to a data.frame. Each column is coerced to a data.frame and then column bound together. If row.names is NULL, the row names are propagated from x, if it has any; otherwise, they are inferred by the data.frame constructor.
Like the as.data.frame() method for class matrix, the method for class DataFrame supports the make.names argument. make.names can be set to TRUE or FALSE to indicate what should happen if the row names of x (or the row names supplied via the row.names argument) are invalid (e.g. contain duplicates). If they are invalid, and make.names is TRUE (the default), they get "fixed" by going through make.names(*, unique=TRUE). Otherwise (i.e. if make.names is FALSE), an error is raised. Note that unlike the method for class matrix, make.names=NA is not supported.
NOTE: Conversion of x to a data.frame is not supported if x contains any list, SimpleList, or CompressedList columns.
as(from, "data.frame"):
Coerces a DataFrame to a data.frame by calling as.data.frame(from).
as.matrix(x):
Coerces the DataFrame to a matrix, if possible.
as.env(x, enclos = parent.frame()):
Creates an environment from x with a symbol for each colnames(x). The values are not actually copied into the environment. Rather, they are dynamically bound using makeActiveBinding. This prevents unnecessary copying of the data from the external vectors into R vectors. The values are cached, so that the data is not copied every time the symbol is accessed.
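The coercions above are sketched below, assuming S4Vectors is attached. The data and row names are arbitrary. The duplicated row names "a", "a" are a made-up case to show how make.names controls repair versus error.

```r
library(S4Vectors)

df <- DataFrame(score = c(1L, 3L, NA), counts = c(10L, 2L, NA))
rownames(df) <- c("one", "two", "three")

lst <- as.list(df)         # named list of columns
dfr <- as.data.frame(df)   # row names propagated from 'df'
m <- as.matrix(df)         # a 3 x 2 matrix

# Duplicated row names are repaired with make.names(*, unique = TRUE) by
# default, and rejected with an error when make.names = FALSE
dup <- DataFrame(v = 1:2)
rownames(dup) <- c("a", "a")
fixed <- as.data.frame(dup)
print(rownames(fixed))
strict <- tryCatch(as.data.frame(dup, make.names = FALSE),
                   error = function(e) NULL)

# Columns are exposed as (actively bound) symbols in an environment
env <- as.env(df)
print(get("score", envir = env))
```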
Subsetting
In the following code snippets, x is a DataFrame.
x[i, j, drop]:
Behaves very similarly to the [.data.frame method, except i can be a logical Rle object and subsetting by matrix indices is not supported. Indices containing NAs are also not supported.
x[i, j] <- value:
Behaves very similarly to the [<-.data.frame method.
x[[i]]:
Behaves very similarly to the [[.data.frame method, except arguments j and exact are not supported. Column name matching is always exact. Subsetting by matrices is not supported.
x[[i]] <- value:
Behaves very similarly to the [[<-.data.frame method, except argument j is not supported.
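The main differences from data.frame subsetting can be seen in the sketch below, assuming S4Vectors is attached. The data values and the partial name "sco" are illustrative only.

```r
library(S4Vectors)

df <- DataFrame(score = c(1L, 3L, NA, 7L), counts = c(10L, 2L, NA, 4L))

# Row subsetting with a logical Rle (run-length encoded) index
keep <- Rle(c(TRUE, FALSE, TRUE, TRUE))
sub <- df[keep, ]
print(sub)

# [[ matches column names exactly: a partial name gives NULL
print(is.null(df[["sco"]]))

# Replace a single cell, then add a new column with [[<-
df[2, "counts"] <- 20L
df[["ratio"]] <- df$score / df$counts
print(df)
```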
Displaying
The show() method for DataFrame objects obeys global options
showHeadLines and showTailLines for controlling the number
of head and tail rows to display.
See ?get_showHeadLines for more information.
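A sketch of temporarily shortening the display through these global options, assuming S4Vectors is attached; the 100-row example object is arbitrary. options() returns the previous values, so they can be restored afterwards.

```r
library(S4Vectors)

df <- DataFrame(x = seq_len(100))

# Show only 2 head rows and 2 tail rows, then restore the old settings
old <- options(showHeadLines = 2, showTailLines = 2)
out <- capture.output(show(df))
options(old)

writeLines(out)
```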
Author(s)
Michael Lawrence
See Also
- DataFrame-combine for combining DataFrame objects.
- DataFrame-utils for other common operations on DataFrame objects.
- TransposedDataFrame objects.
- RectangularData and SimpleList which DataFrame extends directly.
- get_showHeadLines for controlling the number of DataFrame rows to display.
Examples
score <- c(1L, 3L, NA)
counts <- c(10L, 2L, NA)
row.names <- c("one", "two", "three")
df <- DataFrame(score) # single column
df[["score"]]
df <- DataFrame(score, row.names = row.names) # with row names
rownames(df)
df <- DataFrame(vals = score) # explicit naming
df[["vals"]]
# arrays
ary <- array(1:4, c(2,1,2))
sw <- DataFrame(I(ary))
# a data.frame
sw <- DataFrame(swiss, row.names = NULL) # drop the inherited row names
as.data.frame(sw) # swiss, without row names
# now with row names
sw <- DataFrame(swiss, row.names = rownames(swiss))
as.data.frame(sw) # swiss
# subsetting
sw[] # identity subset
sw[,] # same
sw[NULL] # no columns
sw[,NULL] # no columns
sw[NULL,] # no rows
## select columns
sw[1:3]
sw[,1:3] # same as above
sw[,"Fertility"]
sw[,c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)]
## select rows and columns
sw[4:5, 1:3]
sw[1] # one-column DataFrame
## the same
sw[, 1, drop = FALSE]
sw[, 1] # a (unnamed) vector
sw[[1]] # the same
sw[["Fertility"]]
sw[["Fert"]] # should return 'NULL'
sw[1,] # a one-row DataFrame
sw[1,, drop=TRUE] # a list
## duplicate row, unique row names are created
sw[c(1, 1:2),]
## indexing by row names
sw["Courtelary",]
subsw <- sw[1:5,1:4]
subsw["C",] # no partial match (unlike with data.frame)
## row and column names
cn <- paste("X", seq_len(ncol(swiss)), sep = ".")
colnames(sw) <- cn
colnames(sw)
rn <- seq(nrow(sw))
rownames(sw) <- rn
rownames(sw)
## column replacement
df[["counts"]] <- counts
df[["counts"]]
df[[3]] <- score
df[["X"]]
df[[3]] <- NULL # deletion