I will describe the problem using GRanges, but it lies solely within IRanges. Ranges of width 0 are apparently allowed. You can obtain them, for instance, by calling `promoters(TxDb, upstream=0, downstream=0)`.
Specifically, these ranges cannot be found by `distanceToNearest` _if_ they are within the subject, but are easily found if they lie outside.
They also do not appear in the results of `subsetByOverlaps` and `findOverlaps` unless you change `minoverlap` from 1L (the default) to 0L.
IMO `minoverlap` should be set to 0L by default in the functions that use it, and `distanceToNearest` should be able to handle these ranges properly. Example code below.
Out of curiosity: how are IRanges implemented internally? Do the indices follow the Jim Kent "UCSC" convention, or are they handled internally the same way as in the user-facing functions?

    > a = GRanges(c("a:2-2", "a:4-3", "a:12-13", "a:15-14"))
    > b = GRanges("a:1-10")
    > a
    GRanges object with 4 ranges and 0 metadata columns:
          seqnames    ranges strand
             <Rle> <IRanges>  <Rle>
      [1]        a  [ 2,  2]      *
      [2]        a  [ 4,  3]      *
      [3]        a  [12, 13]      *
      [4]        a  [15, 14]      *
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    > width(a)
    [1] 1 0 2 0
    > b
    GRanges object with 1 range and 0 metadata columns:
          seqnames    ranges strand
             <Rle> <IRanges>  <Rle>
      [1]        a   [1, 10]      *
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    > distanceToNearest(a,b)
    Hits object with 3 hits and 1 metadata column:
          queryHits subjectHits |  distance
          <integer>   <integer> | <integer>
      [1]         1           1 |         0
      [2]         3           1 |         1
      [3]         4           1 |         4
      -------
      queryLength: 4 / subjectLength: 1
    > subsetByOverlaps(a,b)
    GRanges object with 1 range and 0 metadata columns:
          seqnames    ranges strand
             <Rle> <IRanges>  <Rle>
      [1]        a    [2, 2]      *
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    > subsetByOverlaps(a,b, minoverlap=0)
    GRanges object with 2 ranges and 0 metadata columns:
          seqnames    ranges strand
             <Rle> <IRanges>  <Rle>
      [1]        a    [2, 2]      *
      [2]        a    [4, 3]      *
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths

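Since the issue is said to lie within IRanges itself, the same comparison can probably be made without GRanges. The following is only a minimal sketch under that assumption: the variable names (`query_zero`, `subject`, `hits_default`, `hits_zero`) are made up for illustration, and no expected hit counts are given because the default of `minoverlap` and the treatment of zero-width ranges may differ between IRanges versions.

```r
library(IRanges)

# A zero-width range sitting between positions 3 and 4 (start = 4, end = 3),
# i.e. the same range as "a:4-3" in the example above.
query_zero <- IRanges(start = 4L, width = 0L)
subject    <- IRanges(start = 1L, end = 10L)

width(query_zero)  # 0
end(query_zero)    # 3, because end = start + width - 1

# Overlap search with whatever default minoverlap the installed version uses.
hits_default <- findOverlaps(query_zero, subject)

# Overlap search with minoverlap explicitly set to 0L, as done with
# subsetByOverlaps() in the example above.
hits_zero <- findOverlaps(query_zero, subject, minoverlap = 0L)

# Compare the number of hits returned by the two calls.
length(hits_default)
length(hits_zero)
```

If the two lengths differ, the zero-width range is being dropped by the default `minoverlap` rather than by anything specific to GRanges.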
Hi,
Thanks for the report. The current behavior of `distanceToNearest()` w.r.t. zero-width ranges seems wrong. I'll look into this.

FWIW an IRanges object uses 2 parallel slots to represent the ranges: one slot for the 1-based starts, and one slot for the widths.
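
To illustrate the start/width representation, here is a small sketch using the same four ranges as in the question. The object name `ir` and the `half_open_*` variables are made up for illustration. Accessing slots directly is shown only to make the point: the accessor functions are the supported interface, and the slot layout is an internal detail that could change.

```r
library(IRanges)

# The four ranges from the question, built from 1-based starts and widths.
ir <- IRanges(start = c(2L, 4L, 12L, 15L), width = c(1L, 0L, 2L, 0L))

start(ir)  # 2 4 12 15
width(ir)  # 1 0 2 0
end(ir)    # 2 3 13 14

# The end is not stored; it is derived as start + width - 1, which is why
# a zero-width range shows up with end = start - 1 (e.g. [4, 3]).
stopifnot(identical(end(ir), start(ir) + width(ir) - 1L))

# Peeking at the two parallel slots (internal detail, for illustration only).
slot(ir, "start")
slot(ir, "width")

# For comparison, the 0-based half-open ("UCSC"/BED-style) coordinates of the
# same ranges would be start - 1 and end; a zero-width range then has
# half-open start equal to half-open end.
half_open_start <- start(ir) - 1L
half_open_end   <- end(ir)
```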
H.