Search
Question: How to operate on circular sequences with GenomicRanges?
0
gravatar for Vegard Nygaard
8 months ago by
Norway
Vegard Nygaard110 wrote:


I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.

I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.

I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?

Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.

library(GenomicRanges)
chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
df = data.frame(chr="chrM", start=16560, end=10, strand="+")
gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed"
sum(gr@ranges@width) # should be 20

 

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0  IRanges_2.12.0       S4Vectors_0.16.0    
[5] BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           knitr_1.18             bindr_0.1             
 [4] XVector_0.18.0         magrittr_1.5           zlibbioc_1.24.0       
 [7] R6_2.2.2               rlang_0.1.6            dplyr_0.7.4           
[10] tools_3.4.2            yaml_2.1.16            assertthat_0.2.0      
[13] tibble_1.3.4           bindrcpp_0.2           GenomeInfoDbData_1.0.0
[16] purrr_0.2.4            tidyr_0.7.2            bitops_1.0-6          
[19] RCurl_1.95-4.10        glue_1.2.0             compiler_3.4.2        
[22] pkgconfig_2.0.1

 

ADD COMMENTlink modified 8 months ago by Valerie Obenchain ♦♦ 6.6k • written 8 months ago by Vegard Nygaard110
1
gravatar for Valerie Obenchain
8 months ago by
Valerie Obenchain ♦♦ 6.6k
United States
Valerie Obenchain ♦♦ 6.6k wrote:

Hi,

The ?GNCList man page in GenomicRanges has examples of overlaps with circular sequences. See the "With Circular Sequences" section at the bottom.

The range to overlap should extend past the end of the circular sequence, not start again at '1'. In your example above the range would start at 16560 and end at 16570.

If you still have questions after reading the man page please post back.

Valerie

ADD COMMENTlink modified 8 months ago • written 8 months ago by Valerie Obenchain ♦♦ 6.6k

Thanks for you help.

It works now.

I was not aware of the trick of setting the range-end to more than the length of my sequence.

vegard.

 

ADD REPLYlink written 8 months ago by Vegard Nygaard110


Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.

Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.

library(GenomicRanges)
chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
fragmentA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16000, end=17000), seqinfo=chrMseqinfo)
enzymeA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16500, end=16504), seqinfo=chrMseqinfo)
GenomicRanges::setdiff(fragmentA, enzymeA)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
         <Rle>      <IRanges>  <Rle>
  [1]     chrM [16000, 16499]      *
  [2]     chrM [16505, 16569]      *
  -------
  seqinfo: 1 sequence (1 circular) from GRCH38 genome
  


The last range is cut short, I expect the end to be 17000.


For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.

vegard.

ADD REPLYlink written 8 months ago by Vegard Nygaard110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 245 users visited in the last hour