Information on genomic sequences via Seqinfo failing
1
0
Entering edit mode
@benilton-carvalho-1375
Last seen 4.1 years ago
Brazil/Campinas/UNICAMP

Hi guys,

I've written some code which has been working for a while now... To my surprise, today I noticed it is failing....

library(GenomeInfoDb)
## GenomeInfoDb_1.22.0
## Rversion 3.6.2
Seqinfo(genome="hg19")
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  :
cannot map the following UCSC seqlevel(s) to an NCBI seqlevel:
chr1_jh636052_fix, chrX_jh806600_fix, chrX_jh806587_fix, ....

Any thoughts on how to proceed?

Thank you very much for your time on this,

benilton

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomeInfoDb_1.22.0 IRanges_2.20.2      S4Vectors_0.24.3
[4] BiocGenerics_0.32.0

loaded via a namespace (and not attached):
[1] BiocManager_1.30.10    compiler_3.6.2         tools_3.6.2
[4] GenomeInfoDbData_1.2.2 RCurl_1.98-1.1         bitops_1.0-6
GenomeInfoDb • 1.1k views
ADD COMMENT
0
Entering edit mode

i can reproduce it and i think the problem is upstream in fetchExtendedChromInfoFromUCSC():

fetchExtendedChromInfoFromUCSC("hg19")
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  : 
  cannot map the following UCSC seqlevel(s) to an NCBI seqlevel:
  chr1_jh636052_fix, chrX_jh806600_fix, chrX_jh806587_fix,
  chr7_jh159134_fix, chrX_jh159150_fix, chrX_jh806590_fix,
  chr10_jh591181_fix, chr1_jh636053_fix, chr5_gl339449_alt,
  chr14_kb021645_fix, chrX_jh720453_fix, chrX_jh806601_fix,
  chr7_gl582971_fix, chrX_jh806599_fix, chr19_gl949749_alt,
  chr19_gl949750_alt, chr19_gl949748_alt, chr19_kb021647_fix,
  chrX_jh806597_fix, chr10_ke332501_fix, chr19_gl949751_alt,
  chr19_gl949746_alt, chr19_gl949752_alt, chrX_jh806598_fix,
  chrX_jh720451_fix, chrX_jh806591_fix, chr11_jh806581_fix,
  chrX_jh806588_fix, chrX_jh806592_fix, chr19_gl949753_alt,
  chr1_jh636054_fix, chrX_jh720454_fix, chr19_gl949747_alt,
  chr7_jh636058_fix, chrX_jh806602_fix, chr17_gl383561_fix,
  chr8_gl949743_fix, chr2_kb663603_fix, chr19_gl582977_fix,
  chr19_ke332505_fix, chr11_jh159140_fix, chr5_ke332497_fix,
  chr17_gl383560_fix, chrX_jh720452_fix, chr4_ke332496_fix,
  chr6_kb663604_fix, chr
In addition: Warning message:
In FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  :
  NCBI seqlevel was set to NA for hg19 UCSC seqlevel(s) not in the NCBI
  assembly: chrM
ADD REPLY
2
Entering edit mode
@herve-pages-1542
Last seen 36 minutes ago
Seattle, WA, United States

Hi Benilton and Robert,

This is UCSC making massive changes to the hg19 genome (it went from 93 sequences to 298). See their announcement here. These changes break fetchExtendedChromInfoFromUCSC(), the workhorse behind Seqinfo(genome="hg19").

This is fixed in GenomeInfoDb 1.23.14 (devel). See https://github.com/Bioconductor/GenomeInfoDb/issues/9. Also BSgenome.Hsapiens.UCSC.hg19 and BSgenome.Hsapiens.UCSC.hg19.masked have been updated.

Not sure when I'll have time to port the fix to release. Not completely straightforward because the fix I came up with in devel is actually for getChromInfoFromUCSC(), the replacement for fetchExtendedChromInfoFromUCSC() that uses a completely different code base.

H.

ADD COMMENT
0
Entering edit mode

It's almost April anyway, so maybe not worth the trouble to fix something that will be replaced very soon.

ADD REPLY
0
Entering edit mode

Thx Jim. I didn't want to say "I won't fix it in release" but you read between the lines. Yes, I don't have much time to dedicate to repair things that break so close to release's EOL (especially when they break because of crazy changes at UCSC). I'd rather use that time to focus on the next release (the devel builds are seriously messed up right now).

H.

ADD REPLY
1
Entering edit mode

Never say never...

OK so this is also fixed in release (BioC 3.11):

hpages@spectre:~/GenomeInfoDb$ git log -n 1
commit a4109ce3ab5a98ba627f2346829a9dbf94b9f927
Author: Hervé Pagès <hpages@fredhutch.org>
Date:   Fri Mar 27 12:34:28 2020 -0700

  Fix Seqinfo(genome="hg19")

  Still returns the 93 sequences of the **original** hg19 (unlike
  in BioC 3.11 where it returns all the new sequences).

Seems like too many things are currently broken in release because of this.

The fix is in GenomeInfoDb 1.22.1.

Cheers,

H.

ADD REPLY

Login before adding your answer.

Traffic: 778 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6