Question: error in makeTxDbFromUCSC
28 days ago, alanchenslm wrote:

Hi GenomicFeatures support,

I am a PhD student at the University of Tokyo, using GenomicFeatures for ChIPseeker. After I run

txdb=makeTxDbFromUCSC(genome="hg19",tablename="refGene")

I got this error.

Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in .check_foreign_key(transcripts_tx_chrom, NA, "transcripts$tx_chrom",  : 
  all the values in 'transcripts$tx_chrom' must be present in 'chrominfo$chrom'

I think the problem is that the refGene table was updated last November, but GenomicFeatures has not yet been changed to match the new refGene release. Below are the sequence names known to GenomicFeatures, followed by the sequence names used in the UCSC refGene table.

GenomicFeatures (93 sequence names):

chr1, chr1_gl000191_random, chr1_gl000192_random, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY

UCSC refGene (111 sequence names):

chr1, chr1_gl000191_random, chr1_gl000192_random, chr1_gl383519_alt, chr1_gl949741_fix, chr1_jh636052_fix, chr1_jh636054_fix, chr10, chr10_gl383543_fix, chr10_jh591181_fix, chr10_jh636060_fix, chr11, chr11_gl949744_fix, chr11_jh159138_fix, chr11_jh159142_fix, chr12, chr13, chr14, chr14_kb021645_fix, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000205_random, chr17_gl383560_fix, chr17_gl582976_fix, chr17_jh159145_fix, chr18, chr18_gl383571_alt, chr19, chr19_gl000209_random, chr19_gl383575_alt, chr19_gl582977_fix, chr19_gl949746_alt, chr19_gl949747_alt, chr19_gl949748_alt, chr19_gl949749_alt, chr19_gl949750_alt, chr19_gl949751_alt, chr19_gl949752_alt, chr19_gl949753_alt, chr19_jh159149_fix, chr19_kb021647_fix, chr2, chr2_kb663603_fix, chr20, chr20_gl582979_fix, chr21, chr21_ke332506_fix, chr22, chr22_gl383582_alt, chr22_jh720449_fix, chr3, chr3_gl383523_fix, chr3_jh159132_fix, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr4_gl877872_fix, chr4_ke332496_fix, chr5, chr5_gl339449_alt, chr5_jh159133_fix, chr5_ke332497_fix, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_jh636056_fix, chr6_kb663604_fix, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr7_gl582971_fix, chr7_jh159134_fix, chr8, chr8_gl383535_fix, chr8_gl383536_fix, chr9, chr9_gl339450_fix, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000215, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000227, chrUn_gl000228, chrUn_gl000241, chrX, chrX_jh159150_fix, chrX_jh806587_fix, chrX_jh806590_fix, chrX_jh806593_fix, chrX_jh806594_fix, chrX_jh806595_fix, chrX_jh806597_fix, chrX_jh806599_fix, chrX_jh806600_fix, chrX_jh806601_fix, chrX_kb021648_fix, chrY

If anyone has any clues, please let me know. Your help is much appreciated. Thank you so much!
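The two lists above can be compared directly in base R. The sketch below uses only a few names excerpted from each list, and the variable names are made up for illustration; with the full vectors, the same call would list every sequence name that refGene uses but chrominfo lacks.

```r
# Small excerpts of the two lists above, for illustration only.
chrominfo_chrom <- c("chr1", "chr1_gl000191_random", "chr19", "chrX", "chrY")
refgene_chrom   <- c("chr1", "chr1_jh636054_fix", "chr19",
                     "chr19_gl949749_alt", "chrX")

# Sequence names used by refGene transcripts that chrominfo does not contain.
missing_chrom <- setdiff(refgene_chrom, chrominfo_chrom)
print(missing_chrom)
```

Any non-empty result means at least one transcript sits on a sequence the chrominfo table does not describe, which is the condition the error message complains about.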

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_0.9.4          reshape_0.8.8          ggplot2_3.1.0          clusterProfiler_3.10.1 GenomicFeatures_1.34.1
 [6] GenomicRanges_1.34.0   GenomeInfoDb_1.18.1    org.Hs.eg.db_3.7.0     AnnotationDbi_1.44.0   IRanges_2.16.0        
[11] S4Vectors_0.20.1       Biobase_2.42.0         BiocGenerics_0.28.0    ChIPseeker_1.18.0

Please resist the temptation to post in multiple locations (I think you got them all, here, bioc-devel, GitHub, and the maintainer email address)!

This seems to have been reported before https://support.bioconductor.org/p/114901/ https://support.bioconductor.org/p/107839/ .

We'll work on this over the next several days.

written 26 days ago by Martin Morgan

Hi Martin, sorry for posting in multiple locations. I will withdraw my posts in the other places. Let's keep the discussion here. Thank you for trying to help me out. If there is any progress, please let me know. Thank you again for your time and help.

modified 25 days ago • written 25 days ago by chen.shihang

Do NOT post in multiple places; all locations are monitored by the same people.

written 24 days ago by Martin Morgan

I have had a look at this, and can confirm the error, even with the devel branch. A similar error occurs with the request for refGene with hg38.

Browse[6]> where
where 1: .check_foreign_key(transcripts_tx_chrom, NA, "transcripts$tx_chrom", 
    chrominfo$chrom, NA, "chrominfo$chrom")
where 2: .makeTxDb_normarg_chrominfo(chrominfo, transcripts$tx_chrom, 
    splicings$exon_chrom)
where 3: makeTxDb(transcripts, splicings, genes = genes, chrominfo = chrominfo, 
    metadata = metadata, reassign.ids = TRUE)
where 4: .makeTxDbFromUCSCTxTable(ucsc_txtable, txname2geneid$genes, genome, 
    tablename, track, txname2geneid$gene_id_type, full_dataset = is.null(transcript_ids), 
    circ_seqs = circ_seqs, goldenPath_url = goldenPath_url, taxonomyId = taxonomyId, 
    miRBaseBuild = miRBaseBuild)
where 5: makeTxDbFromUCSC(genome = "hg19", tablename = "refGene")

Browse[6]> length(setdiff(referring_vals, referred_vals))
[1] 57
Browse[6]> setdiff(referring_vals, referred_vals)
 [1] "chr19_gl949749_alt" "chr19_gl949746_alt" "chr17_gl582976_fix"
 [4] "chr17_jh159145_fix" "chr11_jh159138_fix" "chr5_jh159133_fix" 
...

For hg19, .fetchUCSCtxtable returns a table with 111 unique values for chrom

BUT

Browse[4]> GenomeInfoDb:::fetch_ChromInfo_from_UCSC
function (genome, goldenPath_url = "http://hgdownload.cse.ucsc.edu/goldenPath") 
{
    url <- paste(goldenPath_url, genome, "database/chromInfo.txt.gz", 
        sep = "/")
    destfile <- tempfile()
    download.file(url, destfile, quiet = TRUE)

The chromInfo.txt.gz file it downloads has entries for only 93 'chromosomes' when genome == "hg19". So the real problem seems to be a lack of synchronization upstream. However, it should be possible to devise a soft landing for this event?
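To make the failing condition concrete, here is a minimal base R sketch of a foreign-key style check. It is not the GenomicFeatures implementation: the function name check_foreign_key_sketch and the toy data frames are hypothetical, and only the error wording is copied from the message reported above.

```r
# Hypothetical stand-in for the internal check; not the package's own code.
check_foreign_key_sketch <- function(referring_vals, referred_vals,
                                     referring_name, referred_name) {
  orphans <- setdiff(referring_vals, referred_vals)
  if (length(orphans) > 0L)
    stop("all the values in '", referring_name,
         "' must be present in '", referred_name, "'", call. = FALSE)
  invisible(TRUE)
}

# Toy tables: one transcript sits on a patch sequence absent from chrominfo.
toy_transcripts <- data.frame(tx_name  = c("tx_a", "tx_b"),
                              tx_chrom = c("chr1", "chr1_jh636054_fix"),
                              stringsAsFactors = FALSE)
toy_chrominfo <- data.frame(chrom = c("chr1", "chr2"),
                            stringsAsFactors = FALSE)

# Capture the error message instead of aborting the script.
check_result <- tryCatch(
  check_foreign_key_sketch(toy_transcripts$tx_chrom, toy_chrominfo$chrom,
                           "transcripts$tx_chrom", "chrominfo$chrom"),
  error = function(e) conditionMessage(e))
print(check_result)
```

With 111 sequence names on the referring side and 93 on the referred side, a strict check of this kind has to fail until one of the two tables changes.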

modified 26 days ago • written 26 days ago by Vincent J. Carey, Jr.

Hi Vincent, thank you for taking a look! I still have no idea what I can do. Actually, I am a medical student and don't know much about programming. If you have any clues for dealing with this problem, please let me know! Thanks so much for your time and help, much appreciated!

written 25 days ago by chen.shihang
Answer: error in makeTxDbFromUCSC
25 days ago, Hervé Pagès wrote:

Thanks for the report.

The refGene tables in UCSC databases hg19 and hg38 were last updated in Nov 2018 and now contain transcripts located on sequences that do NOT belong to the corresponding genomes (GRCh37 and GRCh38, respectively). More precisely some transcripts in these tables now belong to patched versions of these genomes: GRCh37.p13 for hg19 and GRCh38.p11 for hg38. Note that this also causes errors on the Genome Browser itself e.g. if you go to https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19 , enter transcript NM_001910 in the search box, click on GO, then click on the NM_001910 at chr1_jh636054_fix:118-14749 link, you'll get the following error:

Sorry, couldn't locate chr1_jh636054_fix:118-14749 in Human Feb. 2009 (GRCh37/hg19)

I just committed a fix to GenomicFeatures. The fix is to drop these foreign transcripts with a warning. For example calling makeTxDbFromUCSC(genome="hg38", tablename="refGene") now displays the following warning message:

  113 transcripts were dropped because they are on unknown sequences
  (e.g. transcripts NM_024081, NM_001001437, NM_012101, NR_146066, ...)
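The general idea of dropping foreign transcripts with a warning can be sketched in base R as follows. This is only an illustration under assumed names (drop_foreign_transcripts and the toy tables are hypothetical), not the code that was committed to GenomicFeatures.

```r
# Hypothetical helper: keep only transcripts whose sequence is in chrominfo.
drop_foreign_transcripts <- function(transcripts, chrominfo) {
  keep <- transcripts$tx_chrom %in% chrominfo$chrom
  if (!all(keep))
    warning(sum(!keep), " transcripts were dropped because they are on ",
            "unknown sequences", call. = FALSE)
  transcripts[keep, , drop = FALSE]
}

# Toy data: two of the four transcripts are on sequences missing from chrominfo.
demo_transcripts <- data.frame(
  tx_name  = c("tx_a", "tx_b", "tx_c", "tx_d"),
  tx_chrom = c("chr1", "chr1_jh636054_fix", "chrX", "chr19_gl949749_alt"),
  stringsAsFactors = FALSE)
demo_chrominfo <- data.frame(chrom = c("chr1", "chrX"),
                             stringsAsFactors = FALSE)

# Record the warning text while still getting the filtered table back.
warn_msg <- NULL
kept <- withCallingHandlers(
  drop_foreign_transcripts(demo_transcripts, demo_chrominfo),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  })
print(kept$tx_name)
print(warn_msg)
```

Filtering before the consistency check keeps the strict check in place for everything else, while the warning tells the user that some transcripts were left out.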

The fix is in GenomicFeatures 1.35.6 (master branch, see fix here) and GenomicFeatures 1.34.3 (RELEASE_3_8 branch).

These 2 new versions of GenomicFeatures should become available via BiocManager::install() in the next 36 hours or so.
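Once an update has been installed, one way to see whether it already includes the fix is to compare version numbers. The helper below is a hypothetical sketch; it assumes that 1.35.x versions belong to the master branch and that everything older belongs to the 1.34.x release line or earlier.

```r
# Hypothetical helper: TRUE if a GenomicFeatures version string is at or
# beyond the fixed versions named above (1.34.3 release, 1.35.6 master).
has_refgene_fix <- function(installed) {
  v <- package_version(installed)
  if (v >= "1.35.0") v >= "1.35.6" else v >= "1.34.3"
}

print(has_refgene_fix("1.34.1"))  # the version shown in the sessionInfo() above
print(has_refgene_fix("1.34.3"))

# With the package installed, the real version could be checked like this:
# has_refgene_fix(as.character(packageVersion("GenomicFeatures")))
```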

Cheers, H.
