error in makeTxDbFromUCSC
@alanchenslm-19586
Last seen 5.3 years ago

Hi GenomicFeatures support,

I am a PhD student at the University of Tokyo, using GenomicFeatures for ChIPseeker. After I ran

txdb=makeTxDbFromUCSC(genome="hg19",tablename="refGene")

I got this error.

Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .check_foreign_key(transcripts_tx_chrom, NA, "transcripts$tx_chrom",  : 
  all the values in 'transcripts$tx_chrom' must be present in 'chrominfo$chrom'

I think the problem is that the refGene table was updated last November, but GenomicFeatures has not yet been changed to match the new refGene release. The sequence names on each side are listed below.

GenomicFeatures:
"x"
"1" "chr1" "2" "chr1_gl000191_random" "3" "chr1_gl000192_random" "4" "chr10" "5" "chr11" "6" "chr11_gl000202_random"
"7" "chr12" "8" "chr13" "9" "chr14" "10" "chr15" "11" "chr16" "12" "chr17" "13" "chr17_ctg5_hap1"
"14" "chr17_gl000203_random" "15" "chr17_gl000204_random" "16" "chr17_gl000205_random" "17" "chr17_gl000206_random"
"18" "chr18" "19" "chr18_gl000207_random" "20" "chr19" "21" "chr19_gl000208_random" "22" "chr19_gl000209_random"
"23" "chr2" "24" "chr20" "25" "chr21" "26" "chr21_gl000210_random" "27" "chr22" "28" "chr3" "29" "chr4"
"30" "chr4_ctg9_hap1" "31" "chr4_gl000193_random" "32" "chr4_gl000194_random" "33" "chr5" "34" "chr6"
"35" "chr6_apd_hap1" "36" "chr6_cox_hap2" "37" "chr6_dbb_hap3" "38" "chr6_mann_hap4" "39" "chr6_mcf_hap5"
"40" "chr6_qbl_hap6" "41" "chr6_ssto_hap7" "42" "chr7" "43" "chr7_gl000195_random" "44" "chr8"
"45" "chr8_gl000196_random" "46" "chr8_gl000197_random" "47" "chr9" "48" "chr9_gl000198_random"
"49" "chr9_gl000199_random" "50" "chr9_gl000200_random" "51" "chr9_gl000201_random" "52" "chrM"
"53" "chrUn_gl000211" "54" "chrUn_gl000212" "55" "chrUn_gl000213" "56" "chrUn_gl000214" "57" "chrUn_gl000215"
"58" "chrUn_gl000216" "59" "chrUn_gl000217" "60" "chrUn_gl000218" "61" "chrUn_gl000219" "62" "chrUn_gl000220"
"63" "chrUn_gl000221" "64" "chrUn_gl000222" "65" "chrUn_gl000223" "66" "chrUn_gl000224" "67" "chrUn_gl000225"
"68" "chrUn_gl000226" "69" "chrUn_gl000227" "70" "chrUn_gl000228" "71" "chrUn_gl000229" "72" "chrUn_gl000230"
"73" "chrUn_gl000231" "74" "chrUn_gl000232" "75" "chrUn_gl000233" "76" "chrUn_gl000234" "77" "chrUn_gl000235"
"78" "chrUn_gl000236" "79" "chrUn_gl000237" "80" "chrUn_gl000238" "81" "chrUn_gl000239" "82" "chrUn_gl000240"
"83" "chrUn_gl000241" "84" "chrUn_gl000242" "85" "chrUn_gl000243" "86" "chrUn_gl000244" "87" "chrUn_gl000245"
"88" "chrUn_gl000246" "89" "chrUn_gl000247" "90" "chrUn_gl000248" "91" "chrUn_gl000249" "92" "chrX" "93" "chrY"

UCSC refGene:
"x"
"1" "chr1" "2" "chr1_gl000191_random" "3" "chr1_gl000192_random" "4" "chr1_gl383519_alt" "5" "chr1_gl949741_fix"
"6" "chr1_jh636052_fix" "7" "chr1_jh636054_fix" "8" "chr10" "9" "chr10_gl383543_fix" "10" "chr10_jh591181_fix"
"11" "chr10_jh636060_fix" "12" "chr11" "13" "chr11_gl949744_fix" "14" "chr11_jh159138_fix" "15" "chr11_jh159142_fix"
"16" "chr12" "17" "chr13" "18" "chr14" "19" "chr14_kb021645_fix" "20" "chr15" "21" "chr16" "22" "chr17"
"23" "chr17_ctg5_hap1" "24" "chr17_gl000205_random" "25" "chr17_gl383560_fix" "26" "chr17_gl582976_fix"
"27" "chr17_jh159145_fix" "28" "chr18" "29" "chr18_gl383571_alt" "30" "chr19" "31" "chr19_gl000209_random"
"32" "chr19_gl383575_alt" "33" "chr19_gl582977_fix" "34" "chr19_gl949746_alt" "35" "chr19_gl949747_alt"
"36" "chr19_gl949748_alt" "37" "chr19_gl949749_alt" "38" "chr19_gl949750_alt" "39" "chr19_gl949751_alt"
"40" "chr19_gl949752_alt" "41" "chr19_gl949753_alt" "42" "chr19_jh159149_fix" "43" "chr19_kb021647_fix"
"44" "chr2" "45" "chr2_kb663603_fix" "46" "chr20" "47" "chr20_gl582979_fix" "48" "chr21" "49" "chr21_ke332506_fix"
"50" "chr22" "51" "chr22_gl383582_alt" "52" "chr22_jh720449_fix" "53" "chr3" "54" "chr3_gl383523_fix"
"55" "chr3_jh159132_fix" "56" "chr4" "57" "chr4_ctg9_hap1" "58" "chr4_gl000193_random" "59" "chr4_gl000194_random"
"60" "chr4_gl877872_fix" "61" "chr4_ke332496_fix" "62" "chr5" "63" "chr5_gl339449_alt" "64" "chr5_jh159133_fix"
"65" "chr5_ke332497_fix" "66" "chr6" "67" "chr6_apd_hap1" "68" "chr6_cox_hap2" "69" "chr6_dbb_hap3"
"70" "chr6_jh636056_fix" "71" "chr6_kb663604_fix" "72" "chr6_mann_hap4" "73" "chr6_mcf_hap5" "74" "chr6_qbl_hap6"
"75" "chr6_ssto_hap7" "76" "chr7" "77" "chr7_gl000195_random" "78" "chr7_gl582971_fix" "79" "chr7_jh159134_fix"
"80" "chr8" "81" "chr8_gl383535_fix" "82" "chr8_gl383536_fix" "83" "chr9" "84" "chr9_gl339450_fix" "85" "chrM"
"86" "chrUn_gl000211" "87" "chrUn_gl000212" "88" "chrUn_gl000213" "89" "chrUn_gl000215" "90" "chrUn_gl000218"
"91" "chrUn_gl000219" "92" "chrUn_gl000220" "93" "chrUn_gl000222" "94" "chrUn_gl000223" "95" "chrUn_gl000224"
"96" "chrUn_gl000227" "97" "chrUn_gl000228" "98" "chrUn_gl000241" "99" "chrX" "100" "chrX_jh159150_fix"
"101" "chrX_jh806587_fix" "102" "chrX_jh806590_fix" "103" "chrX_jh806593_fix" "104" "chrX_jh806594_fix"
"105" "chrX_jh806595_fix" "106" "chrX_jh806597_fix" "107" "chrX_jh806599_fix" "108" "chrX_jh806600_fix"
"109" "chrX_jh806601_fix" "110" "chrX_kb021648_fix" "111" "chrY"

If anyone has any clues, please let me know. Your help is much appreciated. Thank you so much!

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_0.9.4          reshape_0.8.8          ggplot2_3.1.0          clusterProfiler_3.10.1 GenomicFeatures_1.34.1
 [6] GenomicRanges_1.34.0   GenomeInfoDb_1.18.1    org.Hs.eg.db_3.7.0     AnnotationDbi_1.44.0   IRanges_2.16.0        
[11] S4Vectors_0.20.1       Biobase_2.42.0         BiocGenerics_0.28.0    ChIPseeker_1.18.0
genomicfeatures R bioconductor maketxdbfromucsc • 3.7k views
Please resist the temptation to post in multiple locations (I think you got them all, here, bioc-devel, GitHub, and the maintainer email address)!

This seems to have been reported before: https://support.bioconductor.org/p/114901/ and https://support.bioconductor.org/p/107839/

We'll work on this over the next several days.

Hi Martin, sorry for posting in multiple locations. I will withdraw the posts in the other places. Let's keep the discussion here. Thank you for trying to help me out. If there is any progress, please let me know. Thank you again for your time and help.

Do NOT post in multiple places; all locations are monitored by the same people.

I have had a look at this and can confirm that the error also occurs with the devel branch. A similar error occurs when requesting refGene with hg38.

Browse[6]> where
where 1: .check_foreign_key(transcripts_tx_chrom, NA, "transcripts$tx_chrom", 
    chrominfo$chrom, NA, "chrominfo$chrom")
where 2: .makeTxDb_normarg_chrominfo(chrominfo, transcripts$tx_chrom, 
    splicings$exon_chrom)
where 3: makeTxDb(transcripts, splicings, genes = genes, chrominfo = chrominfo, 
    metadata = metadata, reassign.ids = TRUE)
where 4: .makeTxDbFromUCSCTxTable(ucsc_txtable, txname2geneid$genes, genome, 
    tablename, track, txname2geneid$gene_id_type, full_dataset = is.null(transcript_ids), 
    circ_seqs = circ_seqs, goldenPath_url = goldenPath_url, taxonomyId = taxonomyId, 
    miRBaseBuild = miRBaseBuild)
where 5: makeTxDbFromUCSC(genome = "hg19", tablename = "refGene")

Browse[6]> length(setdiff(referring_vals, referred_vals))
[1] 57
Browse[6]> setdiff(referring_vals, referred_vals)
 [1] "chr19_gl949749_alt" "chr19_gl949746_alt" "chr17_gl582976_fix"
 [4] "chr17_jh159145_fix" "chr11_jh159138_fix" "chr5_jh159133_fix" 
...

For hg19, .fetchUCSCtxtable returns a table with 111 unique values for chrom

BUT

Browse[4]> GenomeInfoDb:::fetch_ChromInfo_from_UCSC
function (genome, goldenPath_url = "http://hgdownload.cse.ucsc.edu/goldenPath") 
{
    url <- paste(goldenPath_url, genome, "database/chromInfo.txt.gz", 
        sep = "/")
    destfile <- tempfile()
    download.file(url, destfile, quiet = TRUE)

The chromInfo table that this function downloads has entries for only 93 'chromosomes' when genome == "hg19". So the real problem seems to be a synchronization issue upstream. However, it should be possible to devise a soft landing for this event.
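
For readers who want to see the mismatch in isolation, here is a minimal base-R sketch of the comparison done in the browser session above. The two vectors are tiny made-up stand-ins for transcripts$tx_chrom and chrominfo$chrom, not the real tables, and the variable names are only illustrative.

```r
# Toy stand-ins (hypothetical data) for transcripts$tx_chrom and chrominfo$chrom.
tx_chrom <- c("chr1", "chr1", "chr19_gl949749_alt", "chr17_gl582976_fix", "chrX")
chrominfo_chrom <- c("chr1", "chr17", "chr19", "chrX", "chrY")

# Referring values with no match among the referred values.
missing_chroms <- setdiff(tx_chrom, chrominfo_chrom)
print(missing_chroms)

# A foreign-key style check passes only when nothing is missing.
foreign_key_ok <- length(missing_chroms) == 0L
print(foreign_key_ok)
```

With the real tables, the same setdiff() call is what reports the 57 sequence names shown above.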

Hi Vincent, thank you for taking a look! I still have no idea what I can do. Actually, I am a medical student and don't know much about programming. If you have any suggestions for dealing with this problem, please let me know! Thanks so much for your time and help, much appreciated!

@herve-pages-1542
Last seen 9 hours ago
Seattle, WA, United States

Thanks for the report.

The refGene tables in UCSC databases hg19 and hg38 were last updated in Nov 2018 and now contain transcripts located on sequences that do NOT belong to the corresponding genomes (GRCh37 and GRCh38, respectively). More precisely, some transcripts in these tables now belong to patched versions of these genomes: GRCh37.p13 for hg19 and GRCh38.p11 for hg38. Note that this also causes errors on the Genome Browser itself: for example, if you go to https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19 , enter transcript NM_001910 in the search box, click on GO, then click on the NM_001910 at chr1_jh636054_fix:118-14749 link, you'll get the following error:

Sorry, couldn't locate chr1_jh636054_fix:118-14749 in Human Feb. 2009 (GRCh37/hg19)

I just committed a fix to GenomicFeatures. The fix is to drop these foreign transcripts with a warning. For example calling makeTxDbFromUCSC(genome="hg38", tablename="refGene") now displays the following warning message:

  113 transcripts were dropped because they are on unknown sequences
  (e.g. transcripts NM_024081, NM_001001437, NM_012101, NR_146066, ...)
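
To illustrate the idea of dropping foreign transcripts with a warning, here is a rough base-R sketch on a toy data frame. It is not the code committed to GenomicFeatures: drop_unknown_seqs() is a hypothetical helper, and the transcript names and sequence list are made up for the example.

```r
# Hypothetical helper, NOT the GenomicFeatures implementation: keep only the
# transcripts whose sequence is listed in known_chroms, warning about the rest.
drop_unknown_seqs <- function(transcripts, known_chroms) {
  keep <- transcripts$tx_chrom %in% known_chroms
  n_dropped <- sum(!keep)
  if (n_dropped > 0L) {
    examples <- head(transcripts$tx_name[!keep], 4L)
    warning(n_dropped, " transcripts were dropped because they are on ",
            "unknown sequences (e.g. transcripts ",
            paste(examples, collapse = ", "), ")", call. = FALSE)
  }
  transcripts[keep, , drop = FALSE]
}

# Made-up transcripts; one sits on a patch sequence absent from known_chroms.
transcripts <- data.frame(
  tx_name = c("tx_a", "tx_b", "tx_c"),
  tx_chrom = c("chr1", "chr1_jh636054_fix", "chrX"),
  stringsAsFactors = FALSE
)
known_chroms <- c("chr1", "chrX", "chrY")

# suppressWarnings() only keeps the example quiet; drop it to see the warning.
kept <- suppressWarnings(drop_unknown_seqs(transcripts, known_chroms))
print(kept)
```

Filtering before the foreign-key check means the remaining transcripts all sit on known sequences, so building the TxDb can proceed instead of stopping with an error.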

The fix is in GenomicFeatures 1.35.6 (master branch, see fix here) and GenomicFeatures 1.34.3 (RELEASE_3_8 branch).

These 2 new versions of GenomicFeatures should become available via BiocManager::install() in the next 36 hours or so.

Cheers, H.
