Question: GenomeInfoDb's Seqinfo() error on hg38
gravatar for Michael Love
9 weeks ago by
Michael Love22k
United States
Michael Love22k wrote:

I noticed recently that Seqinfo gives an error on hg38:

> Seqinfo(genome="hg38")
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  :
  cannot map the following UCSC seqlevel(s) to an NCBI seqlevel: chr8_KZ208915v1_fix,
  chr15_KN538374v1_fix, chr15_KQ031389v1_alt, chr16_KV880768v1_fix, chr12_KZ208916v1_fix,
  chr14_KZ208920v1_fix, chr7_KZ208913v1_alt, chr5_KV575244v1_fix, chr7_KZ208912v1_fix,
  chr1_KV880763v1_alt, chr12_KN538369v1_fix, chr2_KQ983256v1_alt, chr2_KQ031384v1_fix,
  chr16_KZ559113v1_fix, chr7_KV880765v1_fix, chr1_KQ031383v1_fix, chr1_KN538360v1_fix,
  chr3_KN196475v1_fix, chr4_KV766193v1_alt, chr10_KN538367v1_fix, chr3_KN538364v1_fix,
  chr3_KV766192v1_fix, chr18_KQ090028v1_fix, chr19_KQ458386v1_fix, chr3_KQ031385v1_fix,
  chr19_KN196484v1_fix, chr2_KN538363v1_fix, chr5_KV575243v1_alt, chr13_KN538372v1_fix,
  chr1_KQ458383v1_alt, chr9_KN196479v1_fix, chr1_KZ208906v1_fix, chr6_KQ031387v1_fix,
  chr12_KQ759760v1_fix, chr3_KN196476v1_fix, chr1_KN538361v1_fix, chr11_KZ559108v1_fix,
  chr3_KZ559103v1_alt, chr11_KZ559110v1_alt, chr19_KV575249v1_alt, chr17_KV766196v1_fix,
  chr11_KZ559109v1_fix, chr1_
Calls: Seqinfo ... .fetch_sequence_info_for_UCSC_genome -> fetchExtendedChromInfoFromUCSC -> FUN

The output with hg19:

> Seqinfo(genome="hg19")
Seqinfo object with 93 sequences (1 circular) from hg19 genome:
  seqnames       seqlengths isCircular genome
  chr1            249250621      FALSE   hg19
  chr2            243199373      FALSE   hg19
  chr3            198022430      FALSE   hg19
  chr4            191154276      FALSE   hg19
  chr5            180915260      FALSE   hg19
  ...                   ...        ...    ...
  chrUn_gl000245      36651      FALSE   hg19
  chrUn_gl000246      38154      FALSE   hg19
  chrUn_gl000247      36422      FALSE   hg19
  chrUn_gl000248      39786      FALSE   hg19
  chrUn_gl000249      38502      FALSE   hg19

My session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] GenomeInfoDb_1.18.1 IRanges_2.16.0      S4Vectors_0.20.1    BiocGenerics_0.28.0
[5] testthat_2.0.1      rmarkdown_1.11      devtools_1.13.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0             knitr_1.20             magrittr_1.5           R6_2.3.0
 [5] rlang_0.3.1            stringr_1.4.0          tools_3.5.0            withr_2.1.2
 [9] htmltools_0.3.6        rprojroot_1.3-2        digest_0.6.18          GenomeInfoDbData_1.2.0
[13] BiocManager_1.30.4     bitops_1.0-6           RCurl_1.95-4.11        memoise_1.1.0
[17] evaluate_0.12          stringi_1.2.4          compiler_3.5.0         backports_1.1.3
genomeinfodb • 203 views
ADD COMMENTlink modified 9 weeks ago by Hervé Pagès ♦♦ 13k • written 9 weeks ago by Michael Love22k
Answer: GenomeInfoDb's Seqinfo() error on hg38
gravatar for Hervé Pagès
9 weeks ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Mike,

Sequences like chr15_KN538374v1_fix don't belong to GRCh38 but to recent patched versions of GRCh38 (e.g. GRCh38.p12, which is the latest). Unfortunately it looks like the UCSC folks updated the chromInfo table for hg38 a couple of days ago, and that the table now contains chrom info for GRCh38.p12 (or maybe GRCh38.p11). You can see a dump of this SQL table here:

The timestamp for the dump indicates that it was updated on Feb 10.

We have a lot of code around (e.g. in GenomeInfoDb, GenomicFeatures, and probably more) that assumes that hg38 is mapped to GRCh38, and not to a patched version of it. The fact that the UCSC folks are changing the reference genome of the hg38 tracks is totally new (I don't remember seeing this for any other organism they support in the last 10 years or so). This a bummer and a game changer.

Also the description they provide for the hg38 browser is misleading:

UCSC Genome Browser assembly ID: hg38
Sequencing/Assembly provider ID: Genome Reference Consortium Human GRCh38 (GCA_000001405.15)
Assembly date: Dec. 2013
Assembly accession: GCA_000001405.15
NCBI Genome ID: 51 (Homo sapiens (human))
NCBI Assembly ID: 883148 (GRCh38, GCA_000001405.15)

It means that from now on we'll have to guess what assembly the hg38 tracks are really based on :-/

I need to think of how to best deal with this in GenomeInfoDb and GenomicFeatures.


ADD COMMENTlink written 9 weeks ago by Hervé Pagès ♦♦ 13k

Thanks for the quick detective work. Yeah, this is a really useful function in GenomeInfoDb and figured that since I hadn’t updated versions something at the source was the issue.

ADD REPLYlink written 9 weeks ago by Michael Love22k

Should be fixed in GenomeInfoDb 1.19.2 (devel, see commit here) and 1.18.2 (release). These new versions won't show up on the build reports (devel and release) before Wednesday morning.


ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Hervé Pagès ♦♦ 13k

Thanks for the rapid fix, Hervé

ADD REPLYlink written 9 weeks ago by Michael Love22k

BTW this change was finally announced recently on the Genome Browser's blog:

and on their mailing list:!topic/genome-announce/oMMnhnPr6gg

The hg38 gateway page still doesn't say what assembly the hg38 tracks are really based on though:

ADD REPLYlink written 6 weeks ago by Hervé Pagès ♦♦ 13k

Better late than never :-)

ADD REPLYlink written 6 weeks ago by Michael Love22k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 136 users visited in the last hour