Question: GenomeInfoDb: where are the genome patches?
matt.chambers4210 wrote (2.2 years ago):

I'm trying to implement a function that converts Ensembl chromosome names to UCSC names for many potential input species (i.e. the intersection of the species supported by both sources). I saw the seqlevelsStyle() function in GenomeInfoDb, but it maps only the canonical chromosomes. Why is that? It's kind of funny, because the canonical names can mostly be fixed with a single sub() call. It's the patches that are really irregular and vexing.
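To make the "one sub() call" point concrete, here is a minimal Python sketch (not GenomeInfoDb code) of such a rule for the canonical chromosomes. It assumes Ensembl-style bare names such as `1`, `X` and `MT`, and UCSC-style names that add a `chr` prefix and spell the mitochondrion `chrM`; the helper name `ensembl_to_ucsc_canonical` is hypothetical.

```python
import re


def ensembl_to_ucsc_canonical(name: str) -> str:
    """Convert a canonical Ensembl chromosome name to UCSC style.

    Hypothetical helper: assumes bare Ensembl names ("1", "X", "MT")
    and UCSC names with a "chr" prefix and "chrM" for the mitochondrion.
    """
    # The mitochondrion is the one canonical exception to plain prefixing.
    if name == "MT":
        return "chrM"
    # Counterpart of R's sub("^", "chr", name): prepend "chr".
    return re.sub(r"^", "chr", name)


print([ensembl_to_ucsc_canonical(n) for n in ["1", "X", "MT"]])
# ['chr1', 'chrX', 'chrM']
```

The same blind prefixing turns a scaffold accession such as `KI270706.1` into `chrKI270706.1`, whereas UCSC's hg38 calls that sequence `chr1_KI270706v1_random`, which is why a rule like this breaks down beyond the canonical chromosomes.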

Answer: GenomeInfoDb: where are the genome patches?
Hervé Pagès (United States) wrote (2.2 years ago):

Hi Matt,

I guess supporting seqlevel mappings for the canonical chromosomes only was the easy thing to do, mainly because those mappings depend on the species but not on a particular assembly. So the approach taken in GenomeInfoDb was simply to hardcode these mappings in tabulated files (located in inst/extdata/dataFiles). It's a very straightforward approach but, unfortunately, it doesn't easily extend to the patches or scaffolds, whose names are specific to a given assembly and would therefore need one table per assembly rather than one per species.
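A table-driven mapping of the kind described above can be sketched as follows. This is a minimal Python illustration, not GenomeInfoDb's implementation: the two-column layout, the `HOMO_SAPIENS_TSV` contents and the helper names are hypothetical, and the real files in inst/extdata/dataFiles use their own columns. Because the table is per species rather than per assembly, any name it does not list, such as an assembly-specific scaffold, simply comes back unmapped.

```python
import csv
import io
from typing import Optional

# Hypothetical two-column table for one species (illustrative only).
HOMO_SAPIENS_TSV = (
    "Ensembl\tUCSC\n"
    "1\tchr1\n"
    "2\tchr2\n"
    "X\tchrX\n"
    "Y\tchrY\n"
    "MT\tchrM\n"
)


def load_style_table(tsv_text: str) -> dict[str, str]:
    """Parse a tab-separated Ensembl/UCSC table into a lookup dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Ensembl"]: row["UCSC"] for row in reader}


def lookup_styles(names: list[str], table: dict[str, str]) -> list[Optional[str]]:
    """Map each name through the table; unlisted names map to None."""
    return [table.get(name) for name in names]


table = load_style_table(HOMO_SAPIENS_TSV)
print(lookup_styles(["1", "MT", "KI270706.1"], table))
# ['chr1', 'chrM', None]
```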

FWIW, note that fetchExtendedChromInfoFromUCSC() in GenomeInfoDb is one way to get the mapping between NCBI and UCSC seqlevels for all the sequences in a given assembly. It supports only a few assemblies (see ?fetchExtendedChromInfoFromUCSC for the list). It's a work in progress, and maybe seqlevelsStyle() should use something like this behind the scenes.
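One way to picture "something like this behind the scenes" is to consult an assembly-specific table first and fall back to the species-level table. The Python sketch below assumes both tables have already been loaded as dictionaries from source-style names to UCSC names; how the assembly table would be obtained (for instance from fetchExtendedChromInfoFromUCSC() output) is left out, and `resolve_seqlevel` and the example entries are hypothetical.

```python
from typing import Optional

# Hypothetical, deliberately partial hg38-style assembly table:
# unlike the species table, it can cover scaffolds too.
ASSEMBLY_MAP = {
    "1": "chr1",
    "KI270706.1": "chr1_KI270706v1_random",
}

# Hypothetical species-level table: canonical chromosomes only.
SPECIES_MAP = {
    "1": "chr1",
    "MT": "chrM",
}


def resolve_seqlevel(name: str,
                     assembly_map: dict[str, str],
                     species_map: dict[str, str]) -> Optional[str]:
    """Prefer the assembly-specific mapping; fall back to the species table."""
    if name in assembly_map:
        return assembly_map[name]
    return species_map.get(name)


for n in ["KI270706.1", "MT", "unknown"]:
    print(n, "->", resolve_seqlevel(n, ASSEMBLY_MAP, SPECIES_MAP))
```

The design choice here is simply precedence: the per-assembly table is authoritative where it exists, and the species table keeps the canonical chromosomes working for assemblies that have no such table.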

H.