Error using Homo.sapiens AnnotationDbi package with GenomicFeatures
Chris Whelan ▴ 60
@chris-whelan-4779
Last seen 10.2 years ago
Hi,

I'm having trouble using the AnnotationDbi package and was wondering if someone could tell me what I'm doing wrong. I'm trying to use GenomicFeatures to find promoter regions and then use AnnotationDbi to look up the Entrez Gene IDs for those transcripts, but getting an error. If I'm going about this all wrong let me know; I find it a little difficult to follow the thread of the documentation of the various feature/annotation packages. At the very least the error message that I'm getting seems like it might be a little friendlier?

Thanks!

Chris

Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
> library(GenomicFeatures)
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following object(s) are masked from 'package:stats':

    xtabs

The following object(s) are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, cbind,
    colnames, duplicated, eval, get, intersect, lapply, mapply, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int,
    rownames, sapply, setdiff, table, tapply, union, unique

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> library(Homo.sapiens)
Loading required package: OrganismDbi
Loading required package: GO.db
Loading required package: DBI

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
> hg19UCSCGenes <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene")
Download the knownGene table ... OK
Download the knownToLocusLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... metadata: OK
> k <- elementMetadata(head(promoters(hg19UCSCGenes)))[,"tx_name"]
Warning messages:
1: In `start<-`(`*tmp*`, value = c(9874, 9874, 9874, 67091, 319084, :
  trimmed start values to be positive
2: In `end<-`(`*tmp*`, value = c(12073, 12073, 12073, 69290, 321283, :
  trimmed end values to be <= seqlengths
> k
[1] "uc001aaa.3" "uc010nxq.1" "uc010nxr.1" "uc001aal.1" "uc001aaq.2"
[6] "uc001aar.2"
> head(keys(Homo.sapiens, keytype="TXNAME"))
[1] "uc001aaa.3" "uc010nxq.1" "uc010nxr.1" "uc001aal.1" "uc001aaq.2"
[6] "uc001aar.2"
> select(Homo.sapiens, keys=k, keytype="TXNAME", cols=c("TXNAME", "ENTREZID")
+ )
Error in if (nrow(res) > 0L) { : argument is of length zero
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] Homo.sapiens_1.0.0
 [2] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
 [3] org.Hs.eg.db_2.8.0
 [4] GO.db_2.8.0
 [5] RSQLite_0.11.2
 [6] DBI_0.2-5
 [7] OrganismDbi_1.0.0
 [8] GenomicFeatures_1.10.0
 [9] AnnotationDbi_1.20.2
[10] Biobase_2.18.0
[11] GenomicRanges_1.10.4
[12] IRanges_1.16.4
[13] BiocGenerics_0.4.0
[14] BiocInstaller_1.8.3

loaded via a namespace (and not attached):
 [1] BSgenome_1.26.1 Biostrings_2.26.2 RBGL_1.34.0 RCurl_1.95-3
 [5] Rsamtools_1.10.1 XML_3.95-0.1 biomaRt_2.14.0 bitops_1.0-4.2
 [9] graph_1.36.0 parallel_2.15.1 rtracklayer_1.18.0 stats4_2.15.1
[13] tools_2.15.1 zlibbioc_1.4.0
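For readers puzzled by the message "argument is of length zero": it is what base R reports when `if()` receives a condition with no elements. The sketch below reproduces that mechanism in plain R. It assumes, purely for illustration, that the internal `res` object ended up as `NULL`; the thread does not show what `res` actually held inside `select()`. The helper `has_rows()` is a hypothetical name, not part of any Bioconductor package.

```r
# Hypothetical stand-in for an internal lookup result that came back as NULL
# (an assumption; the real value inside select() is not shown in the thread).
res <- NULL

# nrow(NULL) is NULL, so the comparison yields a zero-length logical vector.
cmp <- nrow(res) > 0L
length(cmp)

# if() cannot decide on a zero-length condition and signals an error.
msg <- tryCatch(
  if (nrow(res) > 0L) "has rows" else "no rows",
  error = function(e) conditionMessage(e)
)
msg

# A defensive check (hypothetical helper) that tolerates NULL and empty tables.
has_rows <- function(x) !is.null(x) && NROW(x) > 0L
has_rows(res)
has_rows(data.frame(TXNAME = "uc001aaa.3"))
```

So the error points at a result object that was not a table at all, rather than at anything wrong with the keys themselves.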
GO AnnotationDbi • 1.8k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Chris,

If you load the Homo.sapiens package, you will see it load the TxDb.Hsapiens.UCSC.hg19.knownGene package for you as a dependency. So you don't need to call makeTranscriptDbFromUCSC(), at least not for the track you were going for, because that was already loaded via the TxDb.Hsapiens.UCSC.hg19.knownGene package. To get the promoter regions, you really only need to call promoters like this:

library(Homo.sapiens)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
proms <- promoters(txdb, upstream=2000, downstream=200) ## check the defaults in case you don't like them!
proms

## Once you have the promoters, you can look up the tx_names for these like this.
k <- proms$tx_name

## And then you can use select to retrieve the matching gene IDs
## In the case of Homo.sapiens, the gene IDs actually *are* entrez gene IDs (because that is what the knownGene track is using as a gene ID).
res <- select(Homo.sapiens, keys=k, cols=c("GENEID","TXNAME"), keytype="TXNAME")
head(res)

Marc

On 11/08/2012 10:54 AM, Chris Whelan wrote:
> Hi,
>
> I'm having trouble using the AnnotationDbi package and was wondering
> if someone could tell me what I'm doing wrong.
> [...]
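For anyone repeating Marc's lookup on a more recent Bioconductor release, here is a hedged sketch of the same steps. It assumes the Homo.sapiens package is installed, and it uses `columns=` because later AnnotationDbi releases renamed the `cols` argument shown in this thread. The `requireNamespace()` guard and the `head()` on the keys are additions for illustration, not something the thread prescribes.

```r
# Sketch only: assumes a current Bioconductor install with Homo.sapiens available.
if (requireNamespace("Homo.sapiens", quietly = TRUE)) {
  library(Homo.sapiens)

  # The TxDb object is attached as a dependency of Homo.sapiens.
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
  proms <- promoters(txdb, upstream = 2000, downstream = 200)

  # Keep the example small: only the first few transcript names.
  k <- head(proms$tx_name)

  # Confirm the key type is supported before querying.
  stopifnot("TXNAME" %in% keytypes(Homo.sapiens))

  # 'columns' replaces the older 'cols' argument used in this thread.
  res <- AnnotationDbi::select(
    Homo.sapiens,
    keys = k,
    columns = c("TXNAME", "ENTREZID"),
    keytype = "TXNAME"
  )
  print(res)
}
```

Calling `AnnotationDbi::select()` with its namespace avoids a clash if another attached package (dplyr, for instance) also exports a `select()` function.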
Hi Chris,

I also noticed that in your select query from before "ENTREZID" was not coming back properly. This has now been fixed. So (after a quick update) you can also do this for the last step:

res2 <- select(Homo.sapiens, keys=k, cols=c("ENTREZID","TXNAME"), keytype="TXNAME")
head(res2)

Marc

On 11/08/2012 04:44 PM, Marc Carlson wrote:
> Hi Chris,
>
> If you load the Homo.sapiens package, you will see it load the
> TxDb.Hsapiens.UCSC.hg19.knownGene package for you as a dependency.
> [...]
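Once `res2` is in hand, a natural next step is attaching the gene IDs back to the promoters. The rows returned by `select()` may not line up one-to-one with the keys, so matching on transcript name is safer than assuming the same order. The toy sketch below uses plain data frames; the transcript names come from the thread, while the gene labels (`GENE_A`, `GENE_B`) and the `proms_df` table are hypothetical placeholders, not real lookup results.

```r
# Stand-in for the promoter table: transcript names taken from the thread.
proms_df <- data.frame(
  tx_name = c("uc001aaa.3", "uc010nxq.1", "uc010nxr.1", "uc001aal.1"),
  stringsAsFactors = FALSE
)

# Stand-in for a select() result: different row order, one transcript missing.
# The gene labels are hypothetical placeholders, not real Entrez Gene IDs.
res2 <- data.frame(
  TXNAME   = c("uc001aal.1", "uc001aaa.3", "uc010nxq.1"),
  ENTREZID = c("GENE_B", "GENE_A", "GENE_A"),
  stringsAsFactors = FALSE
)

# For each promoter, find the row of res2 carrying the same transcript name.
idx <- match(proms_df$tx_name, res2$TXNAME)

# Transcripts without a match get NA instead of shifting the other rows.
proms_df$entrez_id <- res2$ENTREZID[idx]
proms_df
```

Note that `match()` keeps only the first hit per transcript, which is a simplification if a transcript maps to more than one gene ID.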