Question: GenomeInfoDb's Seqinfo() error on hg38
2
gravatar for Michael Love
10 months ago by
Michael Love26k
United States
Michael Love26k wrote:

I noticed recently that Seqinfo gives an error on hg38:

> Seqinfo(genome="hg38")
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  :
  cannot map the following UCSC seqlevel(s) to an NCBI seqlevel: chr8_KZ208915v1_fix,
  chr15_KN538374v1_fix, chr15_KQ031389v1_alt, chr16_KV880768v1_fix, chr12_KZ208916v1_fix,
  chr14_KZ208920v1_fix, chr7_KZ208913v1_alt, chr5_KV575244v1_fix, chr7_KZ208912v1_fix,
  chr1_KV880763v1_alt, chr12_KN538369v1_fix, chr2_KQ983256v1_alt, chr2_KQ031384v1_fix,
  chr16_KZ559113v1_fix, chr7_KV880765v1_fix, chr1_KQ031383v1_fix, chr1_KN538360v1_fix,
  chr3_KN196475v1_fix, chr4_KV766193v1_alt, chr10_KN538367v1_fix, chr3_KN538364v1_fix,
  chr3_KV766192v1_fix, chr18_KQ090028v1_fix, chr19_KQ458386v1_fix, chr3_KQ031385v1_fix,
  chr19_KN196484v1_fix, chr2_KN538363v1_fix, chr5_KV575243v1_alt, chr13_KN538372v1_fix,
  chr1_KQ458383v1_alt, chr9_KN196479v1_fix, chr1_KZ208906v1_fix, chr6_KQ031387v1_fix,
  chr12_KQ759760v1_fix, chr3_KN196476v1_fix, chr1_KN538361v1_fix, chr11_KZ559108v1_fix,
  chr3_KZ559103v1_alt, chr11_KZ559110v1_alt, chr19_KV575249v1_alt, chr17_KV766196v1_fix,
  chr11_KZ559109v1_fix, chr1_
Calls: Seqinfo ... .fetch_sequence_info_for_UCSC_genome -> fetchExtendedChromInfoFromUCSC -> FUN

The output with hg19:

> Seqinfo(genome="hg19")
Seqinfo object with 93 sequences (1 circular) from hg19 genome:
  seqnames       seqlengths isCircular genome
  chr1            249250621      FALSE   hg19
  chr2            243199373      FALSE   hg19
  chr3            198022430      FALSE   hg19
  chr4            191154276      FALSE   hg19
  chr5            180915260      FALSE   hg19
  ...                   ...        ...    ...
  chrUn_gl000245      36651      FALSE   hg19
  chrUn_gl000246      38154      FALSE   hg19
  chrUn_gl000247      36422      FALSE   hg19
  chrUn_gl000248      39786      FALSE   hg19
  chrUn_gl000249      38502      FALSE   hg19

My session info:

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] GenomeInfoDb_1.18.1 IRanges_2.16.0      S4Vectors_0.20.1    BiocGenerics_0.28.0
[5] testthat_2.0.1      rmarkdown_1.11      devtools_1.13.6

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0             knitr_1.20             magrittr_1.5           R6_2.3.0
 [5] rlang_0.3.1            stringr_1.4.0          tools_3.5.0            withr_2.1.2
 [9] htmltools_0.3.6        rprojroot_1.3-2        digest_0.6.18          GenomeInfoDbData_1.2.0
[13] BiocManager_1.30.4     bitops_1.0-6           RCurl_1.95-4.11        memoise_1.1.0
[17] evaluate_0.12          stringi_1.2.4          compiler_3.5.0         backports_1.1.3
genomeinfodb • 439 views
ADD COMMENTlink modified 10 months ago by Hervé Pagès ♦♦ 14k • written 10 months ago by Michael Love26k
Answer: GenomeInfoDb's Seqinfo() error on hg38
2
gravatar for Hervé Pagès
10 months ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Mike,

Sequences like chr15_KN538374v1_fix don't belong to GRCh38 but to recent patched versions of GRCh38 (e.g. GRCh38.p12, which is the latest). Unfortunately it looks like the UCSC folks updated the chromInfo table for hg38 a couple of days ago, and that the table now contains chrom info for GRCh38.p12 (or maybe GRCh38.p11). You can see a dump of this SQL table here:

http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/

The timestamp for the dump indicates that it was updated on Feb 10.

We have a lot of code around (e.g. in GenomeInfoDb, GenomicFeatures, and probably more) that assumes that hg38 is mapped to GRCh38, and not to a patched version of it. The fact that the UCSC folks are changing the reference genome of the hg38 tracks is totally new (I don't remember seeing this for any other organism they support in the last 10 years or so). This a bummer and a game changer.

Also the description they provide for the hg38 browser is misleading:

UCSC Genome Browser assembly ID: hg38
Sequencing/Assembly provider ID: Genome Reference Consortium Human GRCh38 (GCA_000001405.15)
Assembly date: Dec. 2013
Assembly accession: GCA_000001405.15
NCBI Genome ID: 51 (Homo sapiens (human))
NCBI Assembly ID: 883148 (GRCh38, GCA_000001405.15)

It means that from now on we'll have to guess what assembly the hg38 tracks are really based on :-/

I need to think of how to best deal with this in GenomeInfoDb and GenomicFeatures.

H.

ADD COMMENTlink written 10 months ago by Hervé Pagès ♦♦ 14k

Thanks for the quick detective work. Yeah, this is a really useful function in GenomeInfoDb and figured that since I hadn’t updated versions something at the source was the issue.

ADD REPLYlink written 10 months ago by Michael Love26k
1

Should be fixed in GenomeInfoDb 1.19.2 (devel, see commit here) and 1.18.2 (release). These new versions won't show up on the build reports (devel and release) before Wednesday morning.

H.

ADD REPLYlink modified 10 months ago • written 10 months ago by Hervé Pagès ♦♦ 14k

Thanks for the rapid fix, Hervé

ADD REPLYlink written 10 months ago by Michael Love26k

BTW this change was finally announced recently on the Genome Browser's blog:

http://genome.ucsc.edu/blog/patches/

and on their mailing list:

https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome-announce/oMMnhnPr6gg

The hg38 gateway page still doesn't say what assembly the hg38 tracks are really based on though:

http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg38

ADD REPLYlink written 9 months ago by Hervé Pagès ♦♦ 14k

Better late than never :-)

ADD REPLYlink written 9 months ago by Michael Love26k

I have a similar issue: https://support.bioconductor.org/p/127017/ Could you please help me?

Thanks, M

ADD REPLYlink written 3 days ago by MT0
1

I replied over on the other thread, update needed, then post back if the issue persists (after restarting R).

https://support.bioconductor.org/p/127017/#127019

ADD REPLYlink written 3 days ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 152 users visited in the last hour