GenomeInfoDb: where are the genome patches?

I'm trying to implement a function to convert Ensembl chromosome names to UCSC names for many potential input species (i.e. the intersection of species supported by both sources). I saw the `seqlevelsStyle` function in GenomeInfoDb, but it only maps the canonical chromosomes. Why is that? It's kind of funny, because the canonical ones can mostly be fixed with a `sub()` call. It's the patches that are really irregular and vexing.


Hi Matt,

I guess supporting the seqlevel mappings for the canonical chromosomes only was the easy thing to do, mainly because those mappings for a given species don't depend on a particular assembly. So the approach taken in GenomeInfoDb was to simply hardcode them in tabulated files (located in inst/extdata/dataFiles). It's a very straightforward approach but, unfortunately, one that can't easily be extended to support mappings of the patches or scaffolds, since those are specific to a given assembly.

FWIW, note that fetchExtendedChromInfoFromUCSC() in GenomeInfoDb is one way to get the mapping between NCBI and UCSC seqlevels for all the sequences in a given assembly. It supports only a few assemblies (see ?fetchExtendedChromInfoFromUCSC for the list). It's a work in progress, and maybe seqlevelsStyle() should use something like this behind the scenes.
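
To make the split between the two cases concrete, here is a rough base-R sketch, not anything GenomeInfoDb actually does internally. The helper name ensembl_to_ucsc() and the patch_map argument are made up for illustration. It assumes human-style canonical names (digits, X, Y, MT) and that you have already built a per-assembly lookup vector for everything else, for example from a table like the one fetchExtendedChromInfoFromUCSC() returns. The single entry in toy_map is only illustrative and should be checked against the real assembly.

```r
# Hypothetical helper (not part of GenomeInfoDb): translate Ensembl-style
# sequence names to UCSC-style names.
#   x         : character vector of Ensembl-style names
#   patch_map : named character vector for one assembly
#               (names = Ensembl-style, values = UCSC-style)
ensembl_to_ucsc <- function(x, patch_map = character(0)) {
  out <- rep(NA_character_, length(x))

  # Canonical chromosomes are assembly-independent: prefix with "chr".
  # Assumption: canonical names are digits, X or Y (human-like naming).
  canonical <- grepl("^([0-9]+|X|Y)$", x)
  out[canonical] <- paste0("chr", x[canonical])

  # The mitochondrial genome is MT in Ensembl and chrM at UCSC.
  is_mt <- x == "MT"
  out[is_mt] <- "chrM"

  # Patches and scaffolds are irregular, so they need an
  # assembly-specific lookup table.
  hit <- !canonical & !is_mt & x %in% names(patch_map)
  out[hit] <- unname(patch_map[x[hit]])

  # Names that could not be mapped stay NA.
  out
}

# Illustrative lookup entry only; verify against the actual assembly.
toy_map <- c("KI270728.1" = "chr16_KI270728v1_random")

ensembl_to_ucsc(c("1", "X", "MT", "KI270728.1", "not_a_seq"), toy_map)
```

Returning NA for anything unmapped (rather than passing the input through) makes it obvious which sequences still lack a mapping for the assembly at hand.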


