Question: How to operate on circular sequences with GenomicRanges?
gravatar for Vegard Nygaard
28 days ago by
Vegard Nygaard100 wrote:

I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.

I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.

I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?

Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.

chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
df = data.frame(chr="chrM", start=16560, end=10, strand="+")
gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed"
sum(gr@ranges@width) # should be 20


> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0  IRanges_2.12.0       S4Vectors_0.16.0    
[5] BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           knitr_1.18             bindr_0.1             
 [4] XVector_0.18.0         magrittr_1.5           zlibbioc_1.24.0       
 [7] R6_2.2.2               rlang_0.1.6            dplyr_0.7.4           
[10] tools_3.4.2            yaml_2.1.16            assertthat_0.2.0      
[13] tibble_1.3.4           bindrcpp_0.2           GenomeInfoDbData_1.0.0
[16] purrr_0.2.4            tidyr_0.7.2            bitops_1.0-6          
[19] RCurl_1.95-4.10        glue_1.2.0             compiler_3.4.2        
[22] pkgconfig_2.0.1


ADD COMMENTlink modified 27 days ago by Valerie Obenchain ♦♦ 6.4k • written 28 days ago by Vegard Nygaard100
gravatar for Valerie Obenchain
27 days ago by
Valerie Obenchain ♦♦ 6.4k
United States
Valerie Obenchain ♦♦ 6.4k wrote:


The ?GNCList man page in GenomicRanges has examples of overlaps with circular sequences. See the "With Circular Sequences" section at the bottom.

The range to overlap should extend past the end of the circular sequence, not start again at '1'. In your example above the range would start at 16560 and end at 16570.

If you still have questions after reading the man page please post back.


ADD COMMENTlink modified 27 days ago • written 27 days ago by Valerie Obenchain ♦♦ 6.4k

Thanks for you help.

It works now.

I was not aware of the trick of setting the range-end to more than the length of my sequence.



ADD REPLYlink written 27 days ago by Vegard Nygaard100

Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.

Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.

chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
fragmentA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16000, end=17000), seqinfo=chrMseqinfo)
enzymeA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16500, end=16504), seqinfo=chrMseqinfo)
GenomicRanges::setdiff(fragmentA, enzymeA)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
         <Rle>      <IRanges>  <Rle>
  [1]     chrM [16000, 16499]      *
  [2]     chrM [16505, 16569]      *
  seqinfo: 1 sequence (1 circular) from GRCH38 genome

The last range is cut short, I expect the end to be 17000.

For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.


ADD REPLYlink written 26 days ago by Vegard Nygaard100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 359 users visited in the last hour