biomaRt not returning expected results
@guest-user-4897
Dear Bioconductor users,

I am trying to annotate a series of genomic regions using the biomaRt package. The intention is to retrieve the gene(s) and associated cytoband(s) in these regions. This is working fine for most cases but for one region I am unable to retrieve any annotation.

The specific case is as follows:

    ensembl <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
    annotation <- getBM(attributes=c('hgnc_symbol', 'band'),
                        filters=c('chromosome_name', 'start', 'end'),
                        values=list(2, 2414662, 2457350), mart=ensembl)

These coordinates represent a region on chromosome 2 in cytoband 2p25.3. There are no genes in this region so I would expect this to return just the cytoband (p25.3). What I actually get back is an empty data frame. Even if I search for just the band I still get nothing. I reconstructed the same query through the biomart web interface and got the same (lack of) results.

For a given region of the genome I would expect to get back at least cytoband information at minimum and I don't understand why that isn't happening here.

All suggestions welcome.

Richard

-- output of sessionInfo():

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0 GenomicFeatures_1.10.2 org.Hs.eg.db_2.8.0 RSQLite_0.11.2
     [5] DBI_0.2-5 AnnotationDbi_1.20.7 Biobase_2.18.0 gdata_2.12.0
     [9] reshape_0.8.4 plyr_1.8 vcd_1.2-13 colorspace_1.2-1
    [13] MASS_7.3-23 stringr_0.6.2 reshape2_1.2.2 biomaRt_2.14.0
    [17] biovizBase_1.6.2 ggbio_1.6.6 gridExtra_0.9.1 scales_0.2.3
    [21] ggplot2_0.9.3.1 GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0
    [25] markdown_0.5.4 knitr_1.1

    loaded via a namespace (and not attached):
     [1] Biostrings_2.26.3 bitops_1.0-5 BSgenome_1.26.1 cluster_1.14.3 codetools_0.2-8 dichromat_2.0-0 digest_0.6.3
     [8] evaluate_0.4.3 formatR_0.7 gtable_0.1.2 gtools_2.7.1 Hmisc_3.10-1 labeling_0.1 lattice_0.20-14
    [15] munsell_0.4 parallel_2.15.2 proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 Rsamtools_1.10.2 rtracklayer_1.18.2
    [22] stats4_2.15.2 tools_2.15.2 VariantAnnotation_1.4.12 XML_3.96-0.2 zlibbioc_1.4.0

--
Sent via the guest posting facility at bioconductor.org.
Annotation annotate biomaRt
@steffen-durinck-4465
Hi Richard,

The Ensembl BioMart "gene" datasets (hsapiens_gene_ensembl, ...) are transcript/gene centric, so only annotation related to genes/transcripts can be retrieved from those datasets. Your region contains no genes, so the query has nothing to attach a band to and returns an empty data frame.

You can get the cytoband annotation for all genes as follows:

    ensembl <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
    annotation <- getBM(c('ensembl_gene_id', 'hgnc_symbol', 'band'),
                        filters="with_hgnc", values=TRUE, mart=ensembl)

Note that the query above only retrieves genes with an hgnc_symbol. If you also want the ones without symbols, do:

    annotation <- getBM(c('ensembl_gene_id', 'hgnc_symbol', 'band'), mart=ensembl)

Cheers,
Steffen

On Wed, May 1, 2013 at 3:13 AM, Richard Birnie [guest] <guest@bioconductor.org> wrote:
> I am trying to annotate a series of genomic regions using the biomaRt
> package. The intention is to retrieve the gene(s) and associated
> cytoband(s) in these regions. This is working fine for most cases but for
> one region I am unable to retrieve any annotation.
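Since the gene datasets cannot return a band for a gene-free region, one possible workaround is to look the region up directly against a table of cytoband coordinates. The following is only a minimal base R sketch of that idea, not a biomaRt feature: the `bands` data frame and the `bands_for_region` helper are hypothetical names introduced here, and the coordinates are illustrative placeholders that should be replaced with a real band table for your genome build.

```r
# Illustrative cytoband table. The coordinates are placeholders standing in
# for a real band table for your genome build; do not treat them as
# authoritative.
bands <- data.frame(
  chrom = c("2", "2", "2"),
  start = c(1, 4400001, 7100001),
  end   = c(4400000, 7100000, 12200000),
  band  = c("p25.3", "p25.2", "p25.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the names of all bands that overlap the
# closed interval [start, end] on the given chromosome.
bands_for_region <- function(bands, chrom, start, end) {
  hit <- bands$chrom == as.character(chrom) &
    bands$start <= end &
    bands$end >= start
  bands$band[hit]
}

# The region from the question, which contains no genes.
bands_for_region(bands, 2, 2414662, 2457350)
```

Because the lookup depends only on coordinates, it returns a band whether or not any gene overlaps the region, and a region spanning a band boundary returns every band it touches.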