How to operate on circular sequences with GenomicRanges?
1
0
Entering edit mode
@vegard-nygaard-6284
Last seen 4.1 years ago
Norway


I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.

I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.

I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?

Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.

library(GenomicRanges)
chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
df = data.frame(chr="chrM", start=16560, end=10, strand="+")
gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed"
sum(gr@ranges@width) # should be 20

 

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0  IRanges_2.12.0       S4Vectors_0.16.0    
[5] BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15           knitr_1.18             bindr_0.1             
 [4] XVector_0.18.0         magrittr_1.5           zlibbioc_1.24.0       
 [7] R6_2.2.2               rlang_0.1.6            dplyr_0.7.4           
[10] tools_3.4.2            yaml_2.1.16            assertthat_0.2.0      
[13] tibble_1.3.4           bindrcpp_0.2           GenomeInfoDbData_1.0.0
[16] purrr_0.2.4            tidyr_0.7.2            bitops_1.0-6          
[19] RCurl_1.95-4.10        glue_1.2.0             compiler_3.4.2        
[22] pkgconfig_2.0.1

 

GenomicRanges mitochondria circular • 1.5k views
ADD COMMENT
1
Entering edit mode
@valerie-obenchain-4275
Last seen 2.2 years ago
United States

Hi,

The ?GNCList man page in GenomicRanges has examples of overlaps with circular sequences. See the "With Circular Sequences" section at the bottom.

The range to overlap should extend past the end of the circular sequence, not start again at '1'. In your example above the range would start at 16560 and end at 16570.

If you still have questions after reading the man page please post back.

Valerie

ADD COMMENT
0
Entering edit mode

Thanks for you help.

It works now.

I was not aware of the trick of setting the range-end to more than the length of my sequence.

vegard.

 

ADD REPLY
0
Entering edit mode


Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.

Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.

library(GenomicRanges)
chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
fragmentA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16000, end=17000), seqinfo=chrMseqinfo)
enzymeA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16500, end=16504), seqinfo=chrMseqinfo)
GenomicRanges::setdiff(fragmentA, enzymeA)

GRanges object with 2 ranges and 0 metadata columns:
      seqnames         ranges strand
         <Rle>      <IRanges>  <Rle>
  [1]     chrM [16000, 16499]      *
  [2]     chrM [16505, 16569]      *
  -------
  seqinfo: 1 sequence (1 circular) from GRCH38 genome
  


The last range is cut short, I expect the end to be 17000.


For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.

vegard.

ADD REPLY

Login before adding your answer.

Traffic: 879 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6