GRanges::distanceToNearest does not return distance to nearest
1
0
Entering edit mode
@daniel-cameron-10086
Last seen 2.7 years ago
Australia

distanceToNearest does not respect strand when x is * and subject is stranded. According to the documentation, matches are done in the following manner:

       x  |  subject  |  orientation 
     -----+-----------+----------------
a)     +  |  +        |  ---> 
b)     +  |  -        |  NA
c)     +  |  *        |  --->
d)     -  |  +        |  NA
e)     -  |  -        |  <---
f)     -  |  *        |  <---
g)     *  |  +        |  --->
h)     *  |  -        |  <---
i)     *  |  *        |  --->  (the only situation where * arbitrarily means +)

When I actually test this, I find that

distanceToNearest(GRanges(seqnames=c("chr1", "chr2"), ranges=IRanges(start=55000, width=1), strand="*"), GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1),strand=c("+", "-", "+", "-"), select="all"))

distanceToNearest(GRanges(seqnames=c("chr1", "chr2"), ranges=IRanges(start=55000, width=1), strand="*"), GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1),strand=c("-", "+", "+", "-"), select="all"))

Both return 44999. No matter the interpretation of the orientation column in the linked documentation (is it x w.r.t subject or subject w.r.t x?), at least one of these examples should return the distance to the subject that starts at 1000000 because both the 1000 and 10000 subject ranges point "away" from position 55000.

GenomicRanges Genomic • 1.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

You may be misinterpreting the help page. The part you reference is talking about precede and follow, not distanceToNearest. In addition, there isn't a 'select' argument for distanceToNearest; that argument is also specific to precede and follow.

Also, please note that you have mis-placed parentheses in your example (the select = "all" is being interpreted by R as an argument to the second GRanges call, instead of distanceToNearest). It's often easier to do each step separately so people can more easily interpret your code. Anyway, it seems OK to me:

> gr1 <- GRanges(seqnames=c("chr1", "chr2"), ranges=IRanges(start=55000, width=1), strand="*")
> gr2 <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1),strand=c("+", "-", "+", "-"))
> gr3 <- GRanges(seqnames="chr1", ranges=IRanges(start=c(1000, 10000, 1000000, 1000000), width=1),strand=c("-", "+", "+", "-"))

> gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1     55000      *
  [2]     chr2     55000      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      1000      +
  [2]     chr1     10000      -
  [3]     chr1   1000000      +
  [4]     chr1   1000000      -
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> gr3
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      1000      -
  [2]     chr1     10000      +
  [3]     chr1   1000000      +
  [4]     chr1   1000000      -
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

> precede(gr1, gr2, select = "all")
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  -------
  queryLength: 2 / subjectLength: 4
> precede(gr1, gr3, select = "all")
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 2 / subjectLength: 4

> follow(gr1, gr2, select = "all")
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 2 / subjectLength: 4
> follow(gr1, gr3, select = "all")
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  -------
  queryLength: 2 / subjectLength: 4
ADD COMMENT
0
Entering edit mode

They're all on the same man page, (nearest-methods {GenomicRanges}). My issue, isn't with follow/precede, it's with distanceToNearest.

Given both precede and follow give different answers for gr2 and gr3, why do they give the same answer for distanceToNearest? Does distanceToNearest completely ignore strand?

    Hits object with 1 hit and 1 metadata column:
      queryHits subjectHits |  distance
      <integer>   <integer> | <integer>
  [1]         1           2 |     44999
  -------
  queryLength: 2 / subjectLength: 4
> distanceToNearest(gr1, gr3)
Hits object with 1 hit and 1 metadata column:
      queryHits subjectHits |  distance
      <integer>   <integer> | <integer>
  [1]         1           2 |     44999
  -------
  queryLength: 2 / subjectLength: 4

Is this a case of me misunderstanding what distanceToNearest actually does? It appears it completely ignores strand. Is this supposed to be the case? My use case is finding the distance to the nearest TSS for a given genomic position. Is this an inappropriate usage for distanceToNearest?

ADD REPLY

Login before adding your answer.

Traffic: 518 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6