Does there exists a getSeq like function for ranges that wrap around end to start ?
2
0
Entering edit mode
@hauken_heyken-13992
Last seen 5 months ago
Bergen

Hey, in my package ORFik I give the possibility to find ORFs from circular genomes, but getting the sequences from those ranges are not possible as GRanges directly.

Here is the problem and I would like some input into the best way to solve it. I have a way to fix it as I show, but not sure it is the best way of doing it. Just copy the range, instead of downloading ORFik if you want to replicate this:

library(ORFik) # If you want to replicate, just copy the GRanges object, instead of downloading ORFik
# If you want to replicate with ORFik use github devel version: devtools::install_github("ORFik/master")
library(Biostrings)
require(BSgenome.Hsapiens.UCSC.hg38)

mtDNA <- DNAStringSet(BSgenome.Hsapiens.UCSC.hg38$chrM) names(mtDNA) <- "chrM" # Find all orf, allowing ORFs to wrap around from end to beginning (circular genome) orfs <- findORFsFasta(mtDNA, startCodon = "ATG", stopCodon = stopDefinition(2), #2 The Vertebrate Mitochondrial Code is.circular = TRUE) isCircular(orfs) chrM TRUE circular <- which(start(orfs) < 0 | end(orfs) > nchar(mtDNA)) # 1 ORF wraps around seqs <- getSeq(mtDNA, orfs[-circular]) # This works, as there are no wrapping ranges here seqlengths(mtDNA) chrM 16569 circular_orf <- orfs[circular] # the 1 wrapping ORF circular_orf GRanges object with 1 range and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chrM 16567-16578 + ------- seqinfo: 1 sequence (1 circular) from an unspecified genome; no seqlengths overstep <- end(circular_orf) - seqlengths(mtDNA) # It steps 9 outside chromosome end seqs <- getSeq(mtDNA, circular_orf) # Gives error, getSeq does not support circular, does not help to set seqlength Error in h(simpleError(msg, call)) : error in evaluating the argument 'value' in selecting a method for function 'unsplit': some ranges in 'at' are off-limits with respect to their corresponding sequence in 'x' # Now make it work overstep <- end(circular_orf) - seqlengths(mtDNA) new_range <- IRanges(c(start(circular_orf), 1), c(seqlengths(mtDNA), overstep)) circular_orf_that_works <- GRangesList(GRanges("chrM", new_range, "+")) circular_orf_that_works GRangesList object of length 1: [[1]] GRanges object with 2 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle> [1] chrM 16567-16569 + [2] chrM 1-9 + ------- seqinfo: 1 sequence from an unspecified genome; no seqlengths extractTranscriptSeqs(mtDNA, circular_orf_that_works) DNAStringSet object of length 1: width seq [1] 12 ATGGATCACAGG  So, any ideas ? Else I thought I could make a getSeq function with an argument circular.wrapping = TRUE, that just wraps it to GRangesList for the user if set to true. Seqlengths/isCircular for all circular chromosomes must of course be set for this to work. circulargenome ORFik getSeq • 975 views ADD COMMENT 0 Entering edit mode Hm, figured out getSeq already works for circular wrapping ranges if your DNA sequences is in BSGenome format, but not as DNAStringSet, which is strange..  library(GenomicRanges) library(BSgenome.Hsapiens.UCSC.hg19) # Use 38 if you have that instead gr <- GRanges("chrMT:16567-16578:+") isCircular(gr) <- TRUE seqlengths(gr) <- 16569 # Load sequence fa.loaded <- DNAStringSet(BSgenome.Hsapiens.UCSC.hg19$chrMT)

getSeq(fa.loaded, gr) # Does not work

isCircular(fa.loaded) # Does not update to TRUE for some reason

getSeq(fa.loaded, gr) # Still does not work


So the question then is, why does not DNAStringSet support to update the isCircular flag ?

1
Entering edit mode
@herve-pages-1542
Last seen 11 hours ago
Seattle, WA, United States

Hi,

DNAStringSet objects predate the introduction of GRanges and Seqinfo objects, so they didn't have any seqinfo capabilities originally (i.e. they didn't support seqinfo(), seqlengths(), isCircular() etc...). At some point, someone decided to add the seqinfo capabilities to these objects but the seqinfo<- and seqinfo methods they provide are inconsistent:

> selectMethod("seqinfo<-", "DNAStringSet")
Method Definition:

function (x, new2old = NULL, pruning.mode = c("error", "coarse",
"fine", "tidy"), value)
{
.local <- function (x, value)
{
metadata(x)$seqinfo <- value x } .local(x, value) } <bytecode: 0x5557c4a00928> <environment: namespace:GenomicRanges> Signatures: x target "DNAStringSet" defined "List" > selectMethod("seqinfo", "DNAStringSet") Method Definition: function (x) { x_names <- names(x) if (is.null(x_names)) x_names <- as.character(seq(length(x))) Seqinfo(x_names, width(x)) } <bytecode: 0x5557d7e80150> <environment: namespace:rtracklayer> Signatures: x target "DNAStringSet" defined "DNAStringSet"  As you can see, the setter stores the Seqinfo object in the metadata slot of the DNAStringSet object, but the getter ignores that and instead recreates the Seqinfo object on-the-fly each time it's called. So after you do: isCircular(fa.loaded) <- TRUE  the internal Seqinfo object gets modified: > metadata(fa.loaded)$seqinfo
Seqinfo object with 1 sequence (1 circular) from an unspecified genome:
seqnames seqlengths isCircular genome
chrMT         16569       TRUE   <NA>


but the seqinfo() getter ignores that.

Another problem is that the seqinfo() getter is defined in rtracklayer but the setter is defined in GenomicRanges. None of these places are good places for these methods. DNAStringSet objects are implemented in Biostrings and, unfortunately, after I load Biostrings, none of these methods are available. If they can't be defined in Biostrings (Biostrings would need to depend on GenomeInfoDb for that), at the very least they should be defined in the same package, maybe BSgenome where the getSeq() method for DNAStringSet objects is also defined.

Finally, I'm not sure that the getSeq() method for DNAStringSet objects is circularity aware. So even if you were able to effectively change the circularity flag of your DNAStringSet object, I don't think that getSeq() would be aware of it.

Would be great if you could open an issue here to report these problems.

Thanks!

0
Entering edit mode
@hauken_heyken-13992
Last seen 5 months ago
Bergen

Great, we have found the solution, so I will accept the answer and continue discussion on the github issues section.