biomaRt query
@rene-dreos-3880
Dear BioC mailing list,

I am trying to annotate Arabidopsis ATH1 genome array results using biomaRt, but it looks like some of the probesets are not annotated in the biomaRt database. Here is one example:

> library(biomaRt)
> AT.db <- useMart(biomart="plant_mart_6", dataset="athaliana_eg_gene")
> getBM(attributes = c("affy_ath1_121501","ensembl_gene_id","description"),
        filters = "affy_ath1_121501", values = "254998_at", mart = AT.db)
[1] affy_ath1_121501 ensembl_gene_id  description
<0 rows> (or 0-length row.names)

But if I use the ath1121501.db package to annotate the same probeset:

> library(annotate)
> library(ath1121501.db)
> mget("254998_at", env=ath1121501GENENAME)
$`254998_at`
[1] "encodes a choline synthase whose gene expression is induced by high salt and mannitol."

> mget("254998_at", env=ath1121501ACCNUM)
$`254998_at`
[1] "AT4G09760"

Why is this happening?

Thank you for any advice,
best regards
r

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] ath1121501.db_2.4.1       org.At.tair.db_2.4.3
 [3] RSQLite_0.9-2             annotate_1.26.1
 [5] ath1121501cdf_2.6.0       biomaRt_2.4.0
 [7] genefilter_1.30.0         marray_1.26.0
 [9] gplots_2.8.0              caTools_1.10
[11] bitops_1.0-4.1            gdata_2.7.2
[13] gtools_2.6.2              bradiar1b520742cdf_1.24.0
[15] arrayQualityMetrics_2.6.0 affyPLM_1.24.1
[17] gcrma_2.20.0              preprocessCore_1.10.0
[19] matchprobes_1.20.0        Biostrings_2.16.9
[21] IRanges_1.6.15            AnnotationDbi_1.10.2
[23] affxparser_1.20.0         makecdfenv_1.26.0
[25] lattice_0.18-8            RMySQL_0.7-5
[27] DBI_0.2-5                 affy_1.26.1
[29] Biobase_2.8.0             limma_3.4.4

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.0-2  RCurl_1.4-2         XML_3.1-1
 [4] affyio_1.16.0       beadarray_1.16.0    hwriter_1.2
 [7] latticeExtra_0.6-14 simpleaffy_2.24.0   splines_2.11.1
[10] stats4_2.11.1       survival_2.35-8     tools_2.11.1
[13] vsn_3.16.0          xtable_1.5-6
@kasper-daniel-hansen-2979
United States
When you use biomaRt you are querying Ensembl, and Ensembl remaps all probesets independently of Affymetrix. The *.db package reflects the annotation that was available from Affymetrix when the package was built. So for some reason Ensembl has decided that this particular probeset does not map to a gene. To understand why, you would need to track down how Ensembl does the probeset->gene mapping (which is not trivial), but my guess is that they are in some sense stricter than Affymetrix.

While this is not specific to Ensembl, you might want to read this paper describing some of the problems with probe->probeset->gene mappings:
http://nar.oxfordjournals.org/cgi/content/full/33/20/e175?ijkey=zaJMV7qU1XANIci&keytype=ref

Kasper
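To see how often the two sources disagree across a whole probeset list, rather than one probeset at a time, one option is to pull both mappings into data frames and join them. The sketch below uses base R only and assumes you have already built two data frames with columns probe and gene (for example from a getBM() result and from the ath1121501ACCNUM map); the helper compare_mappings() and those column names are hypothetical, not part of biomaRt or AnnotationDbi. The example inputs contain only the single probeset discussed in this thread.

```r
# Hypothetical helper (not part of biomaRt or AnnotationDbi): compare two
# probeset -> gene mapping tables and label how each probeset is treated.
# Both inputs are assumed to be data frames with character columns
# "probe" and "gene".
compare_mappings <- function(ensembl, affy) {
  # Outer join on the probeset ID so probesets missing from either source
  # are kept (with NA in the missing gene column) instead of dropped
  m <- merge(ensembl, affy, by = "probe", all = TRUE,
             suffixes = c(".ensembl", ".affy"))
  m$status <- ifelse(is.na(m$gene.ensembl) & is.na(m$gene.affy), "unmapped",
              ifelse(is.na(m$gene.ensembl), "affy_only",
              ifelse(is.na(m$gene.affy), "ensembl_only",
              ifelse(m$gene.ensembl == m$gene.affy, "agree", "disagree"))))
  m
}

# The getBM() query in the question returned 0 rows for 254998_at
ensembl_map <- data.frame(probe = character(0), gene = character(0),
                          stringsAsFactors = FALSE)
# ath1121501ACCNUM maps 254998_at to AT4G09760
affy_map <- data.frame(probe = "254998_at", gene = "AT4G09760",
                       stringsAsFactors = FALSE)

cmp <- compare_mappings(ensembl_map, affy_map)
print(cmp)
```

The outer join (all = TRUE) is the key design choice: an inner join would silently discard 254998_at, which is exactly the kind of probeset you want flagged. stringsAsFactors = FALSE keeps the gene columns as character, so comparing IDs from the two sources does not trip over mismatched factor levels.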
@james-w-macdonald-5106
United States
Hi Rene,

On 10/4/2010 4:44 AM, René Dreos wrote:
> Why is this happening?

Because you are querying two different data sources and have found an instance in which they are not consistent. This is a pretty common occurrence, given how fluid gene definitions are (and likely will be for some time).

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics