Nearest genomic ranges and distances in circular ranges
1
0
Entering edit mode
@adrian-geissler-16504
Last seen 17 months ago
Denmark

Hej, it is unclear to me how to correctly use the GenomicRanges package for circular genomes, and I failed to find an explanation in vignettes. Assume the following example, in which some intervals are due to eg. shift equal but have a different interval.

library(tidyverse)
library(GenomicRanges)
foo <- GRanges('foo', IRanges(c(-1, 1, 4, 9, 11), width = 5), strand='+')
seqinfo(foo) <- Seqinfo(seqnames = 'foo', isCircular = TRUE, seqlengths = 10)
foo$follow <- follow(foo)
foo$precede <- precede(foo)
findOverlaps(foo, foo, type = 'equal') %>%
  subset(queryHits != subjectHits)
foo

Here, the pairs have different follow/precede rsults, although findOverlaps identified them as identical.

> foo
GRanges object with 5 ranges and 2 metadata columns:
      seqnames    ranges strand |    follow   precede
         <Rle> <IRanges>  <Rle> | <integer> <integer>
  [1]      foo      -1-3      + |      <NA>         3
  [2]      foo       1-5      + |      <NA>         4
  [3]      foo       4-8      + |         1         4
  [4]      foo      9-13      + |         3      <NA>
  [5]      foo     11-15      + |         3      <NA>

Here, the pairs are (1, 4) and (2, 5). These interval are equal yet they have non-zero distances

> distanceToNearest(foo, foo, select='all') %>%
+   as.tibble %>%
+   spread(subjectHits, distance)
# A tibble: 5 x 6
  queryHits   `1`   `2`   `3`   `4`   `5`
      <int> <int> <int> <int> <int> <int>
1         1     0     0     0     5     7
2         2     0     0     0     3     5
3         3     0     0     0     0     2
4         4     5     3     0     0     0
5         5     7     5     2     0     0

What would be the correct approach for nearest neighbor/distances in bioconductor? Thank you in advance.

genomicranges • 1.0k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.2 years ago
United States

Hi Adrian,

Thanks for the clear example. It looks like precede(), follow() and the nearest() family don't support circular chromosomes. This is was probably my oversight when these were implemented and it will be fixed soon.

I'm editing my answer because the helpers I suggested won't work on all situations. We are working on a solution and I'll post back when it's implemented.

Valerie

ADD COMMENT
0
Entering edit mode

Hi,

We've discussed this and it's not entirely clear what precede and follow mean for circular chromosomes. Should chromosome length be interpreted as the stop point in this case? For example, the current precede, follow have a select=c("first", "all") argument. What would be returned when select=all?

distance, nearest etc. are well defined and will be implemented before the next release in October - sooner if possible.

Valerie

ADD REPLY
0
Entering edit mode

I've started a github issue for this:

https://github.com/Bioconductor/GenomicRanges/issues/14

ADD REPLY

Login before adding your answer.

Traffic: 781 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6