| nearest-methods {IRanges} | R Documentation |
Finding the nearest range/position neighbor
Description
The nearest(), precede(), follow(), distance()
and distanceToNearest() methods for IntegerRanges
derivatives (e.g. IRanges objects).
Usage
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
nearest(x, subject, select=c("arbitrary", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
precede(x, subject, select=c("first", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
follow(x, subject, select=c("last", "all"))
## S4 method for signature 'IntegerRanges,IntegerRanges'
distance(x, y)
## S4 method for signature 'Pairs,missing'
distance(x, y)
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
distanceToNearest(x, subject, select=c("arbitrary", "all"))
Arguments
x |
The query IntegerRanges derivative, or (for distance()) a Pairs object containing both the query (first) and subject (second).
|
subject |
The subject IntegerRanges object, within which
the nearest neighbors are found. Can be missing, in which case
x is also the subject.
|
select |
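Logic for handling ties. By default, all the methods
select a single interval (arbitrary for nearest(), the first by order in subject for precede(), and the last for follow()). To get all matches, as a Hits object, use "all". |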
y |
For the distance() method, an IntegerRanges derivative. Cannot be missing. |
... |
Additional arguments for methods |
Details
nearest(x, subject, select=c("arbitrary", "all")):-
The conventional nearest neighbor finder. Returns an integer vector containing the index of the nearest neighbor range in subject for each range in x. If there is no nearest neighbor (if subject is empty), NAs are returned.
Here is how it proceeds, for a range xi in x:
Find the ranges in subject that minimize the distance to xi. If a single range si in subject achieves the shortest distance to xi, si is returned as the nearest neighbor of xi. If multiple ranges in subject achieve the shortest distance to xi, one of them is chosen arbitrarily.
See distance below for how the distance between two ranges is defined.
precede(x, subject, select=c("first", "all")):-
For each range in x, precede returns the index of the interval in subject that is directly preceded by the query range. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.
follow(x, subject, select=c("last", "all")):-
The opposite of precede, this function returns the index of the range in subject that a query range in x directly follows. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.
distance(x, y):-
Returns the distance from each range in x to the corresponding range in y.
The distance method differs from the others documented on this page in that it is symmetric; y cannot be missing. If x and y are not the same length, the shortest will be recycled to match the length of the longest. The select argument is not available for distance because comparisons are made in a pair-wise fashion. The returned vector has the length of the longest of x and y.
The distance calculation changed in BioC 2.12 to accommodate zero-width ranges in a consistent and intuitive manner. The new distance can be explained by a block model where a range is represented by a series of blocks of size 1. Blocks are adjacent to each other and there is no gap between them. A visual representation of IRanges(4,7) would be
   +-----+-----+-----+-----+
      4     5     6     7
The distance between two consecutive blocks is 0L (prior to Bioconductor 2.12 it was 1L). The new distance calculation now returns the size of the gap between two ranges.
This change to distance affects the notion of overlaps in that we no longer say:
x and y overlap <=> distance(x, y) == 0
Instead we say
x and y overlap => distance(x, y) == 0
or
x and y overlap or are adjacent <=> distance(x, y) == 0
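As a rough illustration of the block model, the gap size can be sketched in base R. The helper name block_distance is hypothetical and is not part of IRanges. The formula is an assumption consistent with the rules above (overlapping or adjacent ranges give 0, otherwise the number of positions strictly between the two ranges); it is not necessarily how the package computes the value.

```r
# Hypothetical helper, not part of IRanges: ranges are passed as integer
# start/end vectors (both ends inclusive) and compared pair-wise.
block_distance <- function(start_x, end_x, start_y, end_y) {
  # Number of positions strictly between the two ranges;
  # zero when they are adjacent, negative when they overlap.
  gap <- pmax(start_x, start_y) - pmin(end_x, end_y) - 1L
  # Overlapping and adjacent ranges both get distance 0.
  pmax(gap, 0L)
}

block_distance(1L, 5L, 6L, 10L)  # adjacent: 0
block_distance(1L, 5L, 3L, 7L)   # overlap: 0
block_distance(1L, 5L, 8L, 10L)  # positions 6 and 7 lie in between: 2
```

Because adjacent and overlapping ranges both yield 0 here, a zero distance alone cannot tell the two cases apart, which is the point of the one-directional implication above.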
distanceToNearest(x, subject, select=c("arbitrary", "all")):-
Returns the distance for each range in x to its nearest neighbor in subject.
selectNearest(hits, x, subject):-
Selects the hits that have the minimum distance within those for each query range. Ties are possible and can be broken with breakTies.
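The neighbor rules described above for nearest(), precede() and follow() can be sketched in base R on plain start/end vectors. The sketch_* function names are hypothetical and the code is only an illustration of the documented behavior, not the IRanges implementation. It assumes ranges with inclusive ends and non-zero width, uses the first minimum as a stand-in for the "arbitrary" choice in nearest(), and assumes that ties are resolved by the first (precede) or last (follow) position in subject.

```r
# Hypothetical base-R sketches, not part of IRanges.

# nearest: index of the subject range with the smallest gap to each query.
sketch_nearest <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    if (length(s_start) == 0L) return(NA_integer_)  # empty subject
    gap <- pmax(pmax(q_start[i], s_start) - pmin(q_end[i], s_end) - 1, 0)
    which.min(gap)  # first minimum stands in for the "arbitrary" pick
  }, integer(1))
}

# precede: closest subject range that starts after the query ends.
sketch_precede <- function(q_end, s_start) {
  vapply(seq_along(q_end), function(i) {
    after <- which(s_start > q_end[i])  # overlapping ranges are excluded
    if (length(after) == 0L) return(NA_integer_)
    after[which.min(s_start[after])]    # first index on ties (assumed)
  }, integer(1))
}

# follow: closest subject range that ends before the query starts.
sketch_follow <- function(q_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    before <- which(s_end < q_start[i])  # overlapping ranges are excluded
    if (length(before) == 0L) return(NA_integer_)
    best <- before[s_end[before] == max(s_end[before])]
    best[length(best)]                   # last index on ties (assumed)
  }, integer(1))
}

# Same coordinates as the ranges used in the Examples section.
sketch_nearest(c(1, 3, 9), c(2, 7, 10), c(3, 5, 12), c(3, 6, 12))  # 1 1 3
sketch_precede(c(3, 7, 10), c(3, 2, 10))                           # 3 3 NA
sketch_follow(c(1, 3, 9), c(3, 13, 12))                            # NA NA 1
```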
Value
For nearest(), precede() and follow(), an integer
vector of indices in subject, or a Hits object
if select="all".
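A short sketch of the two return types, assuming the IRanges package is installed and reusing the ranges from the nearest() example; queryHits() and subjectHits() are the usual Hits accessors from S4Vectors.

```r
library(IRanges)

query   <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

# Default select: one subject index per query range (integer vector).
one_each <- nearest(query, subject)
one_each

# select="all": a Hits object that keeps every tied neighbor.
all_hits <- nearest(query, subject, select="all")
queryHits(all_hits)    # query index of each hit
subjectHits(all_hits)  # matching subject index of each hit
```

The integer vector is convenient for direct subsetting such as subject[one_each], while the Hits form is the one to use when ties matter.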
For distance(), an integer vector of distances between the ranges
in x and y.
For distanceToNearest(), a Hits object with
a metadata column reporting the distance between the pair.
Access the distance metadata column with the
mcols() accessor.
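A minimal sketch of reading the distances back out, assuming the IRanges package is installed and that the metadata column is named distance, as the description above suggests.

```r
library(IRanges)

query   <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

d2n <- distanceToNearest(query, subject)

# One row per query range that has a nearest neighbor.
nearest_dist <- mcols(d2n)$distance

# Flatten the Hits object into a plain data.frame for inspection.
data.frame(query    = queryHits(d2n),
           subject  = subjectHits(d2n),
           distance = nearest_dist)
```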
For selectNearest(), a Hits object, sorted by query.
Author(s)
M. Lawrence
See Also
-
Hits objects implemented in the S4Vectors package.
-
findOverlaps for finding just the overlapping ranges.
-
The IntegerRanges class.
-
nearest-methods in the GenomicRanges package for the
nearest(), precede(), follow(), distance(), and distanceToNearest() methods for GenomicRanges objects.
Examples
## ------------------------------------------
## precede() and follow()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(3, 7, 10))
subject <- IRanges(c(3, 2, 10), c(3, 13, 12))
precede(query, subject) # c(3L, 3L, NA)
precede(IRanges(), subject) # integer()
precede(query, IRanges()) # rep(NA_integer_, 3)
precede(query) # c(3L, 3L, NA)
follow(query, subject) # c(NA, NA, 1L)
follow(IRanges(), subject) # integer()
follow(query, IRanges()) # rep(NA_integer_, 3)
follow(query) # c(NA, NA, 2L)
## ------------------------------------------
## nearest()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))
nearest(query, subject) # c(1L, 1L, 3L)
nearest(query) # c(2L, 1L, 2L)
## ------------------------------------------
## distance()
## ------------------------------------------
## adjacent
distance(IRanges(1,5), IRanges(6,10)) # 0L
## overlap
distance(IRanges(1,5), IRanges(3,7)) # 0L
## zero-width
sapply(-3:3, function(i) distance(shift(IRanges(4,3), i), IRanges(4,3)))