GenomicFeatures: is makeTranscriptDbFromUCSC supported on refGene?
@erik-van-den-akker-4165
Hi all,

I'm a PhD student in bioinformatics at the Leiden University Medical Center and the Delft University of Technology in the Netherlands. I'm currently working on the visualization of genome-wide data sources, such as linkage, GWAS and expression data. I need quick access to gene locations (along with UTRs, CDS, exons, etc.), so I thought it would be a good idea to use the GenomicFeatures package. The package works perfectly, and very quickly, for the example provided in the vignette (good job!):

    > library(GenomicFeatures)
    > system.time(mm9KG <- makeTranscriptDbFromUCSC(genome = "mm9", tablename = "knownGene"))
       user  system elapsed
      49.50    0.69  100.05
    > mm9KG
    TranscriptDb object:
    | Db type: TranscriptDb
    | Data source: UCSC
    | Genome: mm9
    | UCSC Table: knownGene
    | Type of Gene ID: Entrez Gene ID
    | Full dataset: yes
    | transcript_nrow: 49409
    | exon_nrow: 237551
    | cds_nrow: 204831
    | Db created by: GenomicFeatures package from Bioconductor
    | Creation time: 2010-07-14 14:07:54 +0200 (Wed, 14 Jul 2010)
    | GenomicFeatures version at creation time: 1.0.3
    | RSQLite version at creation time: 0.9-1

It also works perfectly for an even bigger database:

    > system.time(hg19KG <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene"))
       user  system elapsed
      82.09    1.11  162.53
    > hg19KG
    TranscriptDb object:
    | Db type: TranscriptDb
    | Data source: UCSC
    | Genome: hg19
    | UCSC Table: knownGene
    | Type of Gene ID: Entrez Gene ID
    | Full dataset: yes
    | transcript_nrow: 77614
    | exon_nrow: 281605
    | cds_nrow: 236664
    | Db created by: GenomicFeatures package from Bioconductor
    | Creation time: 2010-07-14 14:11:03 +0200 (Wed, 14 Jul 2010)
    | GenomicFeatures version at creation time: 1.0.3
    | RSQLite version at creation time: 0.9-1

However, with tablename = "refGene" I had to shut down my R session after half an hour, for both genome = "mm9" and genome = "hg19":

    > system.time(mm9RG <- makeTranscriptDbFromUCSC(genome = "mm9", tablename = "refGene"))
    > system.time(hg19RG <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene"))

GenomicFeatures uses rtracklayer to download the UCSC table before the SQLite database is built. I therefore first checked whether that download step works on its own:

    > library(rtracklayer)
    > session <- browserSession()
    > genome(session) <- "hg19"
    > query <- ucscTableQuery(session, "refGene")
    > system.time(Table <- getTable(query))
       user  system elapsed
       7.70    0.39   61.73

head(Table) gave the expected results, so downloading the refGene table takes only about a minute. This suggests that the problem lies in building the SQLite database when tablename = "refGene" is passed to makeTranscriptDbFromUCSC().

So my question is this. Given that refGene is listed by supportedUCSCtables(), did I do something wrong, or should I just be more patient? Can anyone confirm these problems?

And @PackageMaintainers: if this is a genuine bug, are you planning to fix it or speed things up? I work with gene expression data, which are commonly annotated with either RefSeq IDs or Ensembl transcript IDs. I would therefore prefer to work with TranscriptDbs based on these identifiers. I can think of several workarounds based on "knownGene", but I would prefer to use the package as intended.

Thanks for the work already done on this package!

Cheers,
Erik van den Akker

    > sessionInfo()
    R version 2.11.1 (2010-05-31)
    i386-pc-mingw32

    locale:
    [1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C
    [5] LC_TIME=Dutch_Netherlands.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] rtracklayer_1.8.1     RCurl_1.4-2           bitops_1.0-4.1        GenomicFeatures_1.0.3 GenomicRanges_1.0.5   IRanges_1.6.8

    loaded via a namespace (and not attached):
    [1] Biobase_2.8.0     biomaRt_2.4.0     Biostrings_2.16.7 BSgenome_1.16.5   DBI_0.2-5         RSQLite_0.9-1     tools_2.11.1      XML_3.1-0
Visualization rtracklayer GenomicFeatures