How to operate on circular sequences with GenomicRanges?
Entering edit mode
Last seen 4.2 years ago

I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.

I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.

I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?

Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.

chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
df = data.frame(chr="chrM", start=16560, end=10, strand="+")
gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed"
sum(gr@ranges@width) # should be 20


> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0  IRanges_2.12.0       S4Vectors_0.16.0    
[5] BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           knitr_1.18             bindr_0.1             
 [4] XVector_0.18.0         magrittr_1.5           zlibbioc_1.24.0       
 [7] R6_2.2.2               rlang_0.1.6            dplyr_0.7.4           
[10] tools_3.4.2            yaml_2.1.16            assertthat_0.2.0      
[13] tibble_1.3.4           bindrcpp_0.2           GenomeInfoDbData_1.0.0
[16] purrr_0.2.4            tidyr_0.7.2            bitops_1.0-6          
[19] RCurl_1.95-4.10        glue_1.2.0             compiler_3.4.2        
[22] pkgconfig_2.0.1


GenomicRanges mitochondria circular • 1.5k views
Entering edit mode
Last seen 2.4 years ago
United States


The ?GNCList man page in GenomicRanges has examples of overlaps with circular sequences. See the "With Circular Sequences" section at the bottom.

The range to overlap should extend past the end of the circular sequence, not start again at '1'. In your example above the range would start at 16560 and end at 16570.

If you still have questions after reading the man page please post back.


Entering edit mode

Thanks for you help.

It works now.

I was not aware of the trick of setting the range-end to more than the length of my sequence.



Entering edit mode

Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.

Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.

chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
fragmentA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16000, end=17000), seqinfo=chrMseqinfo)
enzymeA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16500, end=16504), seqinfo=chrMseqinfo)
GenomicRanges::setdiff(fragmentA, enzymeA)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
         <Rle>      <IRanges>  <Rle>
  [1]     chrM [16000, 16499]      *
  [2]     chrM [16505, 16569]      *
  seqinfo: 1 sequence (1 circular) from GRCH38 genome

The last range is cut short, I expect the end to be 17000.

For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.



Login before adding your answer.

Traffic: 346 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6