Question: GenomicFeatures: makeTranscriptDbFromUCSC on "refGene" supported?
Erik van den Akker wrote:
Hi all,

I'm a PhD student in bioinformatics working at the Leiden University Medical Center and at the Delft University of Technology in the Netherlands. Currently I'm working on the visualization of genome-wide data sources, such as linkage, GWAS and expression data. To be able to quickly access information on gene locations (along with the UTRs, CDS, exons, etc.), I thought it would be a good idea to use the GenomicFeatures package. The package works perfectly and very quickly for the example provided in the vignette (good job!):

> library(GenomicFeatures)
> system.time(mm9KG <- makeTranscriptDbFromUCSC(genome = "mm9", tablename = "knownGene"))
   user  system elapsed
  49.50    0.69  100.05
> mm9KG
TranscriptDb object:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: mm9
| UCSC Table: knownGene
| Type of Gene ID: Entrez Gene ID
| Full dataset: yes
| transcript_nrow: 49409
| exon_nrow: 237551
| cds_nrow: 204831
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-07-14 14:07:54 +0200 (Wed, 14 Jul 2010)
| GenomicFeatures version at creation time: 1.0.3
| RSQLite version at creation time: 0.9-1

It even works perfectly for larger databases (human):

> system.time(hg19KG <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene"))
   user  system elapsed
  82.09    1.11  162.53
> hg19KG
TranscriptDb object:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: hg19
| UCSC Table: knownGene
| Type of Gene ID: Entrez Gene ID
| Full dataset: yes
| transcript_nrow: 77614
| exon_nrow: 281605
| cds_nrow: 236664
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-07-14 14:11:03 +0200 (Wed, 14 Jul 2010)
| GenomicFeatures version at creation time: 1.0.3
| RSQLite version at creation time: 0.9-1

However, with tablename = "refGene" I had to kill my R session after half an hour, for both genome = "mm9" and genome = "hg19":

> system.time(mm9RG <- makeTranscriptDbFromUCSC(genome = "mm9",
+     tablename = "refGene"))
> system.time(hg19RG <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene"))

Because this package relies on functionality from rtracklayer to download the table before the actual SQLite database is stored, I verified whether that step was working correctly:

> library(rtracklayer)
> session <- browserSession()
> genome(session) <- "hg19"
> query <- ucscTableQuery(session, "refGene")
> system.time(Table <- getTable(query))
   user  system elapsed
   7.70    0.39   61.73

Typing "head(Table)" gave the expected results, suggesting that something goes wrong while creating the SQLite database.

So, given that refGene shows up in supportedUCSCtables(), my questions are:
1) Did I do something wrong?
2) Should I just have more patience?
3) Can anyone confirm these problems?

And @PackageMaintainers: if this is a genuine bug, are you planning to fix it or speed things up? As I work with gene expression data, which are commonly annotated with either RefSeq IDs or Ensembl transcript IDs, I would prefer to work with TranscriptDbs based on these identifiers. Although I can think of many workarounds using "knownGene", I would prefer to use the package as originally intended, hence this post.

Thanks for the work already done on this great package!

Cheers,
Erik van den Akker

> sessionInfo()
R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252
LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
LC_TIME=Dutch_Netherlands.1252

attached base packages:
stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
rtracklayer_1.8.1     RCurl_1.4-2           bitops_1.0-4.1
GenomicFeatures_1.0.3 GenomicRanges_1.0.5   IRanges_1.6.8

loaded via a namespace (and not attached):
Biobase_2.8.0     biomaRt_2.4.0     Biostrings_2.16.7 BSgenome_1.16.5
DBI_0.2-5         RSQLite_0.9-1     tools_2.11.1      XML_3.1-0
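Since getTable(query) does return the refGene table quickly, any refGene-based workaround has to turn that flat table into per-exon records. A minimal base-R sketch of that step follows, assuming the returned data frame carries the standard UCSC genePred columns (name, chrom, strand, exonStarts, exonEnds), where exon coordinates are stored as comma-separated lists with 0-based starts. expand_exons() is a hypothetical helper written for illustration; it is not part of GenomicFeatures or rtracklayer, and it does not address the slowness of makeTranscriptDbFromUCSC() itself.

```r
# Hypothetical helper (not part of GenomicFeatures or rtracklayer): expand a
# UCSC genePred-style table, e.g. the refGene table from getTable(), into
# one row per exon.
expand_exons <- function(tbl) {
  # Coordinate lists are comma-separated with a trailing comma,
  # which strsplit() drops.
  starts <- strsplit(as.character(tbl$exonStarts), ",", fixed = TRUE)
  ends   <- strsplit(as.character(tbl$exonEnds), ",", fixed = TRUE)
  n_exons <- vapply(starts, length, integer(1))
  strand <- as.character(tbl$strand)

  # UCSC lists exons in ascending genomic order; on the minus strand the
  # first exon in transcription order is the last one listed.
  ranks <- mapply(function(n, s) if (s == "-") rev(seq_len(n)) else seq_len(n),
                  n_exons, strand, SIMPLIFY = FALSE)

  data.frame(
    name       = rep(as.character(tbl$name), n_exons),
    chrom      = rep(as.character(tbl$chrom), n_exons),
    strand     = rep(strand, n_exons),
    exon_rank  = unlist(ranks),
    # UCSC starts are 0-based (half-open intervals): add 1 to get
    # 1-based, closed coordinates as used by Bioconductor ranges.
    exon_start = as.integer(unlist(starts)) + 1L,
    exon_end   = as.integer(unlist(ends)),
    stringsAsFactors = FALSE
  )
}
```

With the Table object from the rtracklayer check, usage would be along the lines of refgene_exons <- expand_exons(Table) followed by head(refgene_exons). The output keeps RefSeq accessions in the name column, so it could serve as input to a hand-built transcript database; that further step is not shown here.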