Error using Homo.sapiens AnnotationDbi package with GenomicFeatures

Chris Whelan ▴ 60

@chris-whelan-4779

Hi,

I'm having trouble using the AnnotationDbi package and was wondering if someone could tell me what I'm doing wrong. I'm trying to use GenomicFeatures to find promoter regions and then use AnnotationDbi to look up the Entrez Gene IDs for those transcripts, but getting an error. If I'm going about this all wrong let me know; I find it a little difficult to follow the thread of the documentation of the various feature/annotation packages. At the very least the error message that I'm getting seems like it might be a little friendlier?

Thanks!

Chris

Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
> library(GenomicFeatures)
Loading required package: BiocGenerics

Attaching package: 'BiocGenerics'

The following object(s) are masked from 'package:stats':

    xtabs

The following object(s) are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, cbind,
    colnames, duplicated, eval, get, intersect, lapply, mapply, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rbind, rep.int,
    rownames, sapply, setdiff, table, tapply, union, unique

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> library(Homo.sapiens)
Loading required package: OrganismDbi
Loading required package: GO.db
Loading required package: DBI
Loading required package: org.Hs.eg.db
Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
> hg19UCSCGenes <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "knownGene")
Download the knownGene table ... OK
Download the knownToLocusLink table ... OK
Extract the 'transcripts' data frame ... OK
Extract the 'splicings' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Prepare the 'metadata' data frame ... metadata: OK
> k <- elementMetadata(head(promoters(hg19UCSCGenes)))[,"tx_name"]
Warning messages:
1: In `start<-`(`*tmp*`, value = c(9874, 9874, 9874, 67091, 319084, :
  trimmed start values to be positive
2: In `end<-`(`*tmp*`, value = c(12073, 12073, 12073, 69290, 321283, :
  trimmed end values to be <= seqlengths
> k
[1] "uc001aaa.3" "uc010nxq.1" "uc010nxr.1" "uc001aal.1" "uc001aaq.2"
[6] "uc001aar.2"
> head(keys(Homo.sapiens, keytype="TXNAME"))
[1] "uc001aaa.3" "uc010nxq.1" "uc010nxr.1" "uc001aal.1" "uc001aaq.2"
[6] "uc001aar.2"
> select(Homo.sapiens, keys=k, keytype="TXNAME", cols=c("TXNAME", "ENTREZID")
+ )
Error in if (nrow(res) > 0L) { : argument is of length zero
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] Homo.sapiens_1.0.0
 [2] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
 [3] org.Hs.eg.db_2.8.0
 [4] GO.db_2.8.0
 [5] RSQLite_0.11.2
 [6] DBI_0.2-5
 [7] OrganismDbi_1.0.0
 [8] GenomicFeatures_1.10.0
 [9] AnnotationDbi_1.20.2
[10] Biobase_2.18.0
[11] GenomicRanges_1.10.4
[12] IRanges_1.16.4
[13] BiocGenerics_0.4.0
[14] BiocInstaller_1.8.3

loaded via a namespace (and not attached):
 [1] BSgenome_1.26.1    Biostrings_2.26.2  RBGL_1.34.0        RCurl_1.95-3
 [5] Rsamtools_1.10.1   XML_3.95-0.1       biomaRt_2.14.0     bitops_1.0-4.2
 [9] graph_1.36.0       parallel_2.15.1    rtracklayer_1.18.0 stats4_2.15.1
[13] tools_2.15.1       zlibbioc_1.4.0

GO AnnotationDbi • 2.1k views

updated 13.1 years ago by Marc Carlson ★ 7.2k • written 13.1 years ago by Chris Whelan ▴ 60

Marc Carlson ★ 7.2k

@marc-carlson-2264

United States

Hi Chris,

If you load the Homo.sapiens package, you will see it load the TxDb.Hsapiens.UCSC.hg19.knownGene package for you as a dependency. So you don't need to call makeTranscriptDbFromUCSC(), at least not for the track you were going for, because that was already loaded via the TxDb.Hsapiens.UCSC.hg19.knownGene package. To get the promoter regions, you really only need to call promoters like this:

library(Homo.sapiens)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
proms <- promoters(txdb, upstream=2000, downstream=200) ## check the defaults in case you don't like them!
proms

## Once you have the promoters, you can look up the tx_names for these like this.
k <- proms$tx_name

## And then you can use select to retrieve the matching gene IDs
## In the case of Homo.sapiens, the gene IDs actually *are* entrez gene IDs
## (because that is what the knownGene track is using as a gene ID).
res <- select(Homo.sapiens, keys=k, cols=c("GENEID","TXNAME"), keytype="TXNAME")
head(res)

Marc

On 11/08/2012 10:54 AM, Chris Whelan wrote:
> [...]
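For readers on a more recent Bioconductor release, the workflow Marc describes can be sketched roughly as follows. Two things here are assumptions relative to the thread: later AnnotationDbi releases name the `cols` argument of `select()` `columns`, and `missing_columns()` is a hypothetical helper written for this sketch, not a function from any package. The Bioconductor part only runs if Homo.sapiens is installed.

```r
# Hypothetical helper (not part of AnnotationDbi or OrganismDbi):
# return the requested columns that the annotation object does not offer.
missing_columns <- function(requested, available) {
  requested[!requested %in% available]
}

if (requireNamespace("Homo.sapiens", quietly = TRUE)) {
  suppressPackageStartupMessages(library(Homo.sapiens))

  # The TxDb object is attached as a dependency of Homo.sapiens,
  # so there is no need to rebuild it from UCSC.
  txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
  proms <- promoters(txdb, upstream = 2000, downstream = 200)
  k <- head(proms$tx_name)

  # Check that the columns we want exist before querying.
  wanted <- c("TXNAME", "GENEID")
  stopifnot(length(missing_columns(wanted, columns(Homo.sapiens))) == 0)

  # 'columns' is the newer name for the 'cols' argument used in the thread.
  res <- select(Homo.sapiens, keys = k, columns = wanted, keytype = "TXNAME")
  print(head(res))
}
```

Checking `columns(Homo.sapiens)` and `keytypes(Homo.sapiens)` first is a cheap way to rule out a misspelled column name before suspecting a bug in the package.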

written 13.1 years ago by Marc Carlson ★ 7.2k

Hi Chris,

I also noticed that in your select query from before "ENTREZID" was not coming back properly. This has now been fixed. So (after a quick update) you can also do this for the last step:

res2 <- select(Homo.sapiens, keys=k, cols=c("ENTREZID","TXNAME"), keytype="TXNAME")
head(res2)

Marc

On 11/08/2012 04:44 PM, Marc Carlson wrote:
> [...]
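On Chris's point that "argument is of length zero" says little about what went wrong: one way to get more context from a failing query is to catch the error and re-raise it with the query details attached. The sketch below uses a hypothetical wrapper, `with_context()`, which is not part of AnnotationDbi or OrganismDbi, and a stand-in function that reproduces the same base R error without needing any Bioconductor package.

```r
# Hypothetical wrapper: run a query function and, on error, re-raise
# with the keytype and requested columns included in the message.
with_context <- function(query, keytype, columns) {
  tryCatch(
    query(),
    error = function(e) {
      stop(sprintf("select() failed for keytype '%s', columns [%s]: %s",
                   keytype, paste(columns, collapse = ", "),
                   conditionMessage(e)),
           call. = FALSE)
    }
  )
}

# Stand-in for the failing call in the thread: nrow(NULL) is NULL,
# so the condition has length zero and if() raises an error.
failing_query <- function() {
  res <- NULL
  if (nrow(res) > 0L) res
}

msg <- tryCatch(
  with_context(failing_query, "TXNAME", c("TXNAME", "ENTREZID")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

With Homo.sapiens loaded, the `query` argument would be something like `function() select(Homo.sapiens, keys = k, columns = c("TXNAME", "ENTREZID"), keytype = "TXNAME")`.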

written 13.1 years ago by Marc Carlson ★ 7.2k
