GenomeInfoDb: where are the genome patches?
@mattchambers42-10186

I'm trying to implement a function to convert Ensembl chromosome names to UCSC names for many potential input species (i.e. the intersection of species supported by both sources). I saw the `seqlevelsStyle` function in GenomeInfoDb, but it only maps the canonical chromosomes. Why is that? It's kind of funny, because the canonical ones can mostly be fixed with a `sub()` call. It's the patches that are really irregular and vexing.
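To make the `sub()` remark concrete, here is a minimal base R sketch. The function name `ensembl_to_ucsc_canonical` is hypothetical, the pattern assumes a human-like karyotype (numbered chromosomes plus X, Y and MT), and the patch name in the usage line is only illustrative. Anything that is not a canonical name is returned unchanged, which is exactly where the problem described above starts.

```r
# Hypothetical helper: prefix canonical Ensembl names with "chr".
# Assumes numbered chromosomes plus X/Y, and MT as the mitochondrion.
ensembl_to_ucsc_canonical <- function(x) {
  # "1" -> "chr1", "X" -> "chrX"; non-matching names pass through untouched
  out <- sub("^([0-9]+|[XY])$", "chr\\1", x)
  # The mitochondrial name differs by more than a prefix
  out[x == "MT"] <- "chrM"
  out
}

# The last element stands in for a patch name and is left as-is
ensembl_to_ucsc_canonical(c("1", "22", "X", "MT", "CHR_HSCHR1_1_CTG3"))
```

Other species would need a different pattern (for example roman numerals for yeast), so this is a per-species shortcut rather than a general solution.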

seqnames genomeinfodb • 974 views
@herve-pages-1542
Seattle, WA, United States

Hi Matt,

I guess supporting seqlevel mappings only for the canonical chromosomes was the easy thing to do, mainly because the mappings for a given species don't depend on a particular assembly. So the approach taken in GenomeInfoDb was simply to hardcode these mappings in tabulated files (located in inst/extdata/dataFiles). It's a very straightforward approach but, unfortunately, one that can't easily be extended to support mappings of the patches or scaffolds for a given assembly.

FWIW note that fetchExtendedChromInfoFromUCSC() in GenomeInfoDb is one way to get the mapping between NCBI and UCSC seqlevels for all the sequences in a given assembly. It supports only a few assemblies (see ?fetchExtendedChromInfoFromUCSC for the list). It's a work in progress, and maybe seqlevelsStyle() should use something like this behind the scenes.
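Once a per-assembly table like that is in hand, the renaming itself is just a lookup. The sketch below is base R only and uses a small hand-made data frame in place of a real download. The column names `NCBI_seqlevel` and `UCSC_seqlevel`, the helper `ncbi_to_ucsc`, and the rows themselves are assumptions for illustration and have not been checked against any assembly.

```r
# Stand-in for a per-assembly mapping table (illustrative rows only;
# the column names are assumed, not taken from a real query result)
chrominfo <- data.frame(
  NCBI_seqlevel = c("1", "MT", "HSCHR1_CTG1_UNLOCALIZED"),
  UCSC_seqlevel = c("chr1", "chrM", "chr1_KI270706v1_random"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate names via the table.
# Names missing from the table come back as NA rather than being guessed.
ncbi_to_ucsc <- function(x, table) {
  table$UCSC_seqlevel[match(x, table$NCBI_seqlevel)]
}

ncbi_to_ucsc(c("HSCHR1_CTG1_UNLOCALIZED", "1", "not_in_table"), chrominfo)
```

Returning NA for unknown names keeps unmapped patches visible instead of silently passing them through under the wrong naming style.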

H.
