Handling missing seqlenghts
0
0
Entering edit mode
@charles-plessy-7857
Last seen 4 hours ago
Japan

Sometimes a GRanges object has no seqlengths because the information is not available. As a last resort solution, I set the seqlenghts to the maximal end value for each seqlevel.

if (all(is.na(seqlengths(gr))))
seqlengths(gr) <- tapply(end(gr), seqnames(gr), max) |> as.vector()


To make my code more readable, I was wondering if there is a core bioconductor function that does the same in a more idiomatic way, but I could not find…

GenomicRanges • 125 views
0
Entering edit mode

Hi Charles,

Out of curiosity, what's the purpose of doing this?

Thanks,

H.

0
Entering edit mode

Hi Hervé,

I made a function that concatenates twe seqlevels (https://oist.github.io/GenomicBreaks/reference/mergeSeqLevels.html, link to the source near the top of the page); I use it to scaffold some contigs by hand in some exploratory analyses, and we also use it to make whole-genome plots.

I also made a function that "reverse-complements" GRanges objects by inverting the roles of the _plus_ and _minus_ strands and recalculating coordinates accordingly. This function also needs seqlengths, and as a last resort I would like to support the case where they are missing. (https://oist.github.io/GenomicBreaks/reference/reverse.html).

I am aligning pairs of related genomes. Some have a BSgenome package and some don't. I transfer the seqinfo of the BSgenome package to the GRanges object representing the aligned regions. For the genomes that I study, I create local BSgenome packages by hand. But in other cases this is not practical, for example in the test case that I use in the package's documentation (https://oist.github.io/GenomicBreaks/articles/GenomicBreaks.html) where I load the alignment of _SacCer3_ (BSgenome available) to _Saccharomyces paradoxus_ version ASM207905v1 (no BSgenome). Approximating sequlength to max(end) is enough in that case, where the purpose is regression test and concept presentation.

By the way, maybe you remember my question in 2020 about representing pairwise genome alignments using Bioc objects. In the end, the most doable for me was to use a "GRanges-in-GRanges" approach (instead of the GRangePairs approach in CNEr), so that I can benefit from all the GRanges functions for free on at least one of the genomes. The _GenomicBreaks_ package is the current state of my work on that. Although the pkgdown site looks pretty, I do not know if I will go as far as submitting it to Bioconductor unless it is clear the package would be useful for others. I plan to ask for comments in the future, but if you have some, please do not hesitate.