I need to do operations on ranges from a circular sequence (chrM). My ranges sometimes spans the "end-start" gap, but I still would like to do overlap, unions and other operations on them.
I hoped that GenomicRanges could do this since I found a promising "isCircular"-flag. However I am not able create a GenomicRanges-object that has a range covering the end-start boundary. It is interpreted as having a negative width.
I was not able to find any examples with GenomicRanges used in this way, so I wonder if this functionality is implemented, and if so what do I do wrong?
Here is a really simple example where I try to create a GenomicRanges with one range covering the gap.
library(GenomicRanges)
chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38")
df = data.frame(chr="chrM", start=16560, end=10, strand="+")
gr = makeGRangesFromDataFrame(df=df, seqinfo=chrMseqinfo) # error, "negative widths are not allowed"
sum(gr@ranges@width) # should be 20
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicRanges_1.30.1 GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0
[5] BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 knitr_1.18 bindr_0.1
[4] XVector_0.18.0 magrittr_1.5 zlibbioc_1.24.0
[7] R6_2.2.2 rlang_0.1.6 dplyr_0.7.4
[10] tools_3.4.2 yaml_2.1.16 assertthat_0.2.0
[13] tibble_1.3.4 bindrcpp_0.2 GenomeInfoDbData_1.0.0
[16] purrr_0.2.4 tidyr_0.7.2 bitops_1.0-6
[19] RCurl_1.95-4.10 glue_1.2.0 compiler_3.4.2
[22] pkgconfig_2.0.1

Thanks for you help.
It works now.
I was not aware of the trick of setting the range-end to more than the length of my sequence.
vegard.
Hi, I am stuck again.
I am using the setdiff method on a range crossing the start-end boundary and I am not able to produce correct results. It seems that setdiff discards any information on "the other" side.
Here is a small example, fragmentA crosses the boundary and I try to cut it in half with enzymeA. The part of fragmentA on the other side (more than the length of chrM) is not included.
library(GenomicRanges) chrMseqinfo = Seqinfo("chrM", 16569, isCircular = TRUE, genome="GRCH38") fragmentA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16000, end=17000), seqinfo=chrMseqinfo) enzymeA = makeGRangesFromDataFrame(df=data.frame(chr="chrM", start=16500, end=16504), seqinfo=chrMseqinfo) GenomicRanges::setdiff(fragmentA, enzymeA) GRanges object with 2 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chrM [16000, 16499] * [2] chrM [16505, 16569] * ------- seqinfo: 1 sequence (1 circular) from GRCH38 genomeThe last range is cut short, I expect the end to be 17000.
For these operations I did not use GNCList since my understanding is that it adds efficiency but no extra functionality. Please correct me if I am wrong.
vegard.