Entering edit mode
Sometimes a GRanges
object has no seqlengths
because the information is not available. As a last resort solution, I set the seqlenghts
to the maximal end value for each seqlevel
.
if (all(is.na(seqlengths(gr))))
seqlengths(gr) <- tapply(end(gr), seqnames(gr), max) |> as.vector()
To make my code more readable, I was wondering if there is a core bioconductor function that does the same in a more idiomatic way, but I could not find…
Hi Charles,
Out of curiosity, what's the purpose of doing this?
Thanks,
H.
Hi Hervé,
I made a function that concatenates twe seqlevels (https://oist.github.io/GenomicBreaks/reference/mergeSeqLevels.html, link to the source near the top of the page); I use it to scaffold some contigs by hand in some exploratory analyses, and we also use it to make whole-genome plots.
I also made a function that "reverse-complements"
GRanges
objects by inverting the roles of the _plus_ and _minus_ strands and recalculating coordinates accordingly. This function also needs seqlengths, and as a last resort I would like to support the case where they are missing. (https://oist.github.io/GenomicBreaks/reference/reverse.html).I am aligning pairs of related genomes. Some have a BSgenome package and some don't. I transfer the seqinfo of the BSgenome package to the GRanges object representing the aligned regions. For the genomes that I study, I create local BSgenome packages by hand. But in other cases this is not practical, for example in the test case that I use in the package's documentation (https://oist.github.io/GenomicBreaks/articles/GenomicBreaks.html) where I load the alignment of _SacCer3_ (BSgenome available) to _Saccharomyces paradoxus_ version ASM207905v1 (no BSgenome). Approximating
sequlength
tomax(end)
is enough in that case, where the purpose is regression test and concept presentation.By the way, maybe you remember my question in 2020 about representing pairwise genome alignments using Bioc objects. In the end, the most doable for me was to use a "GRanges-in-GRanges" approach (instead of the
GRangePairs
approach inCNEr
), so that I can benefit from all theGRanges
functions for free on at least one of the genomes. The _GenomicBreaks_ package is the current state of my work on that. Although the pkgdown site looks pretty, I do not know if I will go as far as submitting it to Bioconductor unless it is clear the package would be useful for others. I plan to ask for comments in the future, but if you have some, please do not hesitate.