makeTxDbFromUCSC("mm10", "refGene") gives "not supported" error
erhoppe • 0
@erhoppe-20514

Similar to the question at https://support.bioconductor.org/p/107839/, "refGene" appears to have been removed from the table returned by supportedUCSCtables("mm10") in GenomicFeatures 1.40.1 and 1.42.1 (if not earlier), although the track is still present for mm39. We use mm10's refGene to build our lab's annotation.
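
One way to check this is to test whether the table name appears in the supported-tables data frame. The sketch below assumes that data frame has a `tablename` column (as the source excerpt in this post suggests). `has_ucsc_table()` is a hypothetical helper written for illustration, not part of GenomicFeatures, and the live call is left unevaluated because it needs the package and a connection to UCSC.

```r
# Hypothetical helper (not part of GenomicFeatures): TRUE if 'tablename'
# is listed in a supported-tables data frame.
has_ucsc_table <- function(supported, tablename) {
    # Assumes 'supported' has a 'tablename' column, as the data frame
    # returned by GenomicFeatures::supportedUCSCtables() appears to.
    tablename %in% as.character(supported$tablename)
}

# Not run: requires GenomicFeatures and network access to UCSC.
if (FALSE) {
    library(GenomicFeatures)
    supported <- supportedUCSCtables(genome = "mm10")
    has_ucsc_table(supported, "refGene")
}
```

If this returns FALSE for "refGene" on mm10, the table is not listed by the GenomicFeatures code loaded in that session.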

It may already have been fixed in the devel version: lines 253-269 of makeTxDbFromUCSC.R, added 8 months ago, seem to add it manually, but I'm not sure how this lines up with those two release versions of GenomicFeatures (https://github.com/Bioconductor/GenomicFeatures/blob/master/R/makeTxDbFromUCSC.R):

if (!(genome %in% c("hg38", "hg19"))) {
    ## Keep only existing tracks.
    ans <- ans[ans$track %in% names(genome_tracknames), , drop=FALSE]
    rownames(ans) <- NULL

    ## Associate subtrack "UCSC RefSeq" to table "refGene" for a few
    ## genome builds.
    if (genome %in% c("mm10", "rn6",
                      "bosTau9", "danRer10", "danRer11",
                      "ce11", "dm6", "galGal6", "panTro6",
                      "rheMac10", "sacCer3"))
    {
        ans_subtrack <- ans[ , "subtrack"]
        ans_subtrack[ans[ , "tablename"] == "refGene"] <- "UCSC RefSeq"
        ans[ , "subtrack"] <- ans_subtrack
    }
}

Is it possible to restore mm10 refGene to the release version or should we just update to the devel version now? Many thanks!

Session Info:
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggrepel_0.9.1 scales_1.1.1
[3] broom_0.7.5 readxl_1.3.1
[5] dplyr_1.0.4 plyr_1.8.6
[7] data.table_1.14.0 GenomicAlignments_1.26.0
[9] Rsamtools_2.6.0 BSgenome_1.58.0
[11] rtracklayer_1.50.0 AnnotationHub_2.22.0
[13] BiocFileCache_1.14.0 dbplyr_2.1.0
[15] VennDiagram_1.6.20 futile.logger_1.4.3
[17] RColorBrewer_1.1-2 survival_3.2-7
[19] TCGAbiolinks_2.18.0 RMariaDB_1.1.0
[21] biomaRt_2.46.3 goseq_1.42.0
[23] geneLenDataBase_1.26.0 BiasedUrn_1.07
[25] GO.db_3.12.1 beanplot_1.2
[27] IlluminaHumanMethylation450kmanifest_0.4.0 minfi_1.36.0
[29] bumphunter_1.32.0 locfit_1.5-9.4
[31] iterators_1.0.13 foreach_1.5.1
[33] Biostrings_2.58.0 XVector_0.30.0
[35] SummarizedExperiment_1.20.0 MatrixGenerics_1.2.1
[37] matrixStats_0.58.0 FDb.InfiniumMethylation.hg19_2.2.0
[39] org.Hs.eg.db_3.12.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[41] GenomicFeatures_1.42.1 AnnotationDbi_1.52.0
[43] Biobase_2.50.0 GenomicRanges_1.42.0
[45] GenomeInfoDb_1.26.2 IRanges_2.24.1
[47] S4Vectors_0.28.1 BiocGenerics_0.36.0
[49] seqLogo_1.56.0 gplots_3.1.1
[51] fastcluster_1.1.25 Rcpp_1.0.6
[53] mgcv_1.8-34 nlme_3.1-152
[55] forcats_0.5.1 stringr_1.4.0
[57] purrr_0.3.4 readr_1.4.0
[59] tidyr_1.1.2 tibble_3.0.6
[61] ggplot2_3.3.3 tidyverse_1.3.0
[63] BiocManager_1.30.10

@james-w-macdonald-5106

As you have noted, this was fixed last November:

> makeTxDbFromUCSC("mm10","refGene")
Download the refGene table ... OK
Download the hgFixed.refLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: UCSC
# Genome: mm10
# Organism: Mus musculus
# Taxonomy ID: 10090
# UCSC Table: refGene
# UCSC Track: NCBI RefSeq
# Resource URL: http://genome.ucsc.edu/
# Type of Gene ID: Entrez Gene ID
# Full dataset: yes
# miRBase build ID: NA
# Nb of transcripts: 47382
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-02-24 15:41:09 -0500 (Wed, 24 Feb 2021)
# GenomicFeatures version at creation time: 1.42.1
# RSQLite version at creation time: 2.2.3
# DBSCHEMAVERSION: 1.2
Warning message:
In .extract_cds_locs_from_UCSC_txtable(ucsc_txtable) :
  UCSC data anomaly in 119 transcript(s): the cds cumulative length is
  not a multiple of 3 for transcripts 'NM_011633' 'NM_198024'
  'NM_001160424' 'NM_009268' 'NM_001190454' 'NM_001290729' 'NM_025576'
  'NM_001177397' 'NM_001081960' 'NM_010974' 'NM_001128086'
  'NM_001142737' 'NM_001289428' 'NM_001267808' 'NM_001301307'
  'NM_001109684' 'NM_021466' 'NM_025988' 'NM_016901' 'NM_001347054'
  'NM_011261' 'NM_001142760' 'NM_011022' 'NM_008848' 'NM_024470'
  'NM_010707' 'NM_001346422' 'NM_001301034' 'NM_001301737' 'NM_010039'
  'NM_008264' 'NM_010646' 'NM_001347053' 'NM_001206926' 'NM_001177396'
  'NM_009046' 'NM_207683' 'NM_146484' 'NM_001277980' 'NM_001114347'
  'NM_001277958' 'NM_001130175' 'NM_001277959' 'NM_144531' 'NM_181398'
  'NM_001177416' 'NM_001033980' 'NM_001358490' 'NM_008653' 'NM_009485'
  'NM_011154' 'NM_010115' 'NM_001142742' 'NM_008710' 'NM_001159419'
  'NM_001286602' 'NM_001177398' 'NM_148413' 'NM_010846' 'NM_011413'
  'NM_001163415' 'NM_001142739' 'NM_001271586' 'NM_00127 [... truncated]
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicFeatures_1.42.1 AnnotationDbi_1.52.0   Biobase_2.50.0        
[4] GenomicRanges_1.42.0   GenomeInfoDb_1.26.2    IRanges_2.24.1        
[7] S4Vectors_0.28.1       BiocGenerics_0.36.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  lubridate_1.7.9.2          
 [3] lattice_0.20-41             prettyunits_1.1.1          
 [5] Rsamtools_2.6.0             Biostrings_2.58.0          
 [7] assertthat_0.2.1            utf8_1.1.4                 
 [9] BiocFileCache_1.14.0        R6_2.5.0                   
[11] RSQLite_2.2.3               httr_1.4.2                 
[13] pillar_1.5.0                zlibbioc_1.36.0            
[15] rlang_0.4.10                progress_1.2.2             
[17] curl_4.3                    rstudioapi_0.13            
[19] blob_1.2.1                  Matrix_1.2-18              
[21] BiocParallel_1.24.1         stringr_1.4.0              
[23] RCurl_1.98-1.2              bit_4.0.4                  
[25] biomaRt_2.46.3              DelayedArray_0.16.1        
[27] RMariaDB_1.1.0              compiler_4.0.0             
[29] rtracklayer_1.50.0          pkgconfig_2.0.3            
[31] askpass_1.1                 openssl_1.4.3              
[33] tidyselect_1.1.0            SummarizedExperiment_1.20.0
[35] tibble_3.0.6                GenomeInfoDbData_1.2.4     
[37] matrixStats_0.58.0          XML_3.99-0.5               
[39] fansi_0.4.2                 crayon_1.4.1               
[41] dplyr_1.0.4                 dbplyr_2.1.0               
[43] GenomicAlignments_1.26.0    bitops_1.0-6               
[45] rappdirs_0.3.3              grid_4.0.0                 
[47] lifecycle_1.0.0             DBI_1.1.1                  
[49] magrittr_2.0.1              stringi_1.5.3              
[51] cachem_1.0.4                XVector_0.30.0             
[53] xml2_1.3.2                  ellipsis_0.3.1             
[55] generics_0.1.0              vctrs_0.3.6                
[57] tools_4.0.0                 bit64_4.0.5                
[59] glue_1.4.2                  purrr_0.3.4                
[61] hms_1.0.0                   MatrixGenerics_1.2.1       
[63] fastmap_1.1.0               memoise_2.0.0              
>

Huh. In a new session makeTxDbFromUCSC("mm10","refGene") does work for me. One of the vagaries of R, I guess. Thanks for the quick response!
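
The thread does not establish why the earlier session failed. One possible explanation (an assumption, not something confirmed here) is that the session still had an older copy of the package loaded after it was updated on disk. A rough sketch of how to check for that, where `session_is_stale()` is a hypothetical helper:

```r
# Hypothetical helper: TRUE if the namespace loaded in this session has a
# different version from the copy that packageVersion() reports from disk.
session_is_stale <- function(pkg) {
    # A package that is not loaded cannot be stale in this session.
    if (!isNamespaceLoaded(pkg)) return(FALSE)
    loaded  <- package_version(as.character(getNamespaceVersion(pkg)))
    on_disk <- packageVersion(pkg)
    loaded != on_disk
}

# Not run: only meaningful where GenomicFeatures is installed.
if (FALSE) {
    library(GenomicFeatures)
    session_is_stale("GenomicFeatures")
}
```

If this returns TRUE, restarting R and reloading the package gives a session that matches the installed version.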
