When I need to know the total sizes of chromosomes, I typically use the `BSgenome` package, which provides both metadata (like seqnames(BSgenome)) and raw sequence data.
I'm now building a package that needs this metadata but never the raw sequence data. I'd rather not introduce a dependency on the huge BSgenome raw data packages, nor on the BSgenome package itself, which has many dependencies. Is there a more streamlined way to get *just the metadata*?
It would have to come from something other than BSgenome. How do you approach this?
Thanks, that could be exactly what I need... out of curiosity, is there a way to get assembly gaps out of GenomeInfoDb the way you can from a BSgenome object? I didn't see that so far...
This is actually a new question so it should be asked as such. This way people can find it when they search our support site.
The GenomeInfoDb package provides no specific tools for retrieving the assembly gaps. If your assembly is supported by the UCSC Genome Browser, then just import the "gap" table as a GRanges object. You can use getTable() from the rtracklayer package for this, or query the UCSC SQL server directly.
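Both routes can be sketched roughly as follows. This is a minimal sketch and not necessarily the code originally posted: the assembly name "hg19" is only an assumed example, the `ucscTableQuery(session, table = ...)` form assumes a reasonably recent rtracklayer, and the use of RMariaDB as the database driver is my assumption (the connection details are those of UCSC's public MySQL server). Both calls need network access.

```r
library(rtracklayer)
library(DBI)

genome_name <- "hg19"  # assumed example assembly; use your own

## Route 1: fetch the "gap" table through rtracklayer
session <- browserSession("UCSC")
genome(session) <- genome_name
query <- ucscTableQuery(session, table = "gap")
gaps <- getTable(query)  # returns a data.frame

## Route 2: query the public UCSC MySQL server directly
## (RMariaDB is an assumed driver choice; the server accepts
## the read-only user "genome" without a password)
con <- dbConnect(RMariaDB::MariaDB(),
                 host = "genome-mysql.soe.ucsc.edu",
                 user = "genome",
                 dbname = genome_name)
gaps2 <- dbGetQuery(con, "SELECT * FROM gap")
dbDisconnect(con)
```

The SQL route avoids loading rtracklayer, which may matter if the goal is to keep a package's dependency list short.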
The two data.frames, gaps and gaps2, should contain the same data (possibly in a different order). Either data.frame can then be turned into a GRanges object.
H.
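For the conversion step, one possible approach is makeGRangesFromDataFrame() from the GenomicRanges package. The sketch below uses a made-up toy table (`toy_gaps`, hypothetical values) with the same key columns as the UCSC "gap" table, so it runs without network access. The column names are passed explicitly, and `starts.in.df.are.0based = TRUE` reflects the fact that UCSC tables store 0-based start positions.

```r
library(GenomicRanges)

## Toy stand-in for the UCSC "gap" table (values are made up)
toy_gaps <- data.frame(
  chrom      = c("chr1", "chr1", "chr2"),
  chromStart = c(0L, 10000L, 500L),
  chromEnd   = c(100L, 20000L, 900L),
  type       = c("telomere", "contig", "clone"),
  stringsAsFactors = FALSE
)

## Convert to GRanges, shifting 0-based starts to 1-based
## and keeping the remaining columns as metadata
gaps_gr <- makeGRangesFromDataFrame(
  toy_gaps,
  seqnames.field = "chrom",
  start.field = "chromStart",
  end.field = "chromEnd",
  keep.extra.columns = TRUE,
  starts.in.df.are.0based = TRUE
)
gaps_gr
```

The same call should work on the real gaps or gaps2 data.frame in place of toy_gaps.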