using BioMart to query UniProt identifiers
@wolfgang-raffelsberger-2876
France
Dear list,

Context: I'd like to calculate GO enrichments for a list of UniProt identifiers (note that they are "ID" or "Entry name" and NOT "AC" or "Accession"). So I tried to use BioMart to extract the GO IDs for my list of UniProt identifiers; see the code below. After calling getBM(), R does not return to the command line, even after more than 5 minutes. I tested this on Linux and Windows with the same result, so I suppose either I am doing something wrong or something isn't working right.

Any hints?

Thanks in advance,
Wolfgang Raffelsberger

## the code ..
require(annotate)
require(biomaRt)

IDs <- c("MTMR1_HUMAN","MTMR2_HUMAN","MTMR3_HUMAN","MTMR4_HUMAN")  ## existing UniProt IDs

uniProt <- useMart("unimart")
listAttributes(useDataset("uniprot",mart=uniProt))  ## contains "name" and "go_id"
GO_IDs <- getBM(attributes=c("name","go_id"), values=IDs, mart=useDataset("uniprot",mart=uniProt))
## after >5 minutes the command-line is still not returned ...

## for completeness :
sessionInfo()

R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils
[8] methods   base

other attached packages:
[1] biomaRt_2.6.0        annotate_1.28.0      AnnotationDbi_1.12.0
[4] Biobase_2.10.0       svSocket_0.9-51      TinnR_1.0.3
[7] R2HTML_2.2           Hmisc_3.8-3          survival_2.36-5

loaded via a namespace (and not attached):
 [1] cluster_1.13.3  DBI_0.2-5       grid_2.12.2     lattice_0.19-17
 [5] RCurl_1.4-2.1   RSQLite_0.9-4   svMisc_0.9-61   tools_2.12.2
 [9] XML_3.1-0.1     xtable_1.5-6

Wolfgang Raffelsberger, PhD
IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300  Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr
GO biomaRt • 3.6k views
@steffen-durinck-4465
Hi Wolfgang,

There are a few issues:

1) You're missing a filter in your getBM query. Without one, you are querying for the GO IDs of everything in UniProt, which is probably why it takes so long. With the following commands it should be fast:

uniProt <- useMart("unimart", dataset="uniprot")
IDs <- c("MTMR1_HUMAN","MTMR2_HUMAN","MTMR3_HUMAN","MTMR4_HUMAN")
GO_IDs <- getBM(attributes=c("name","go_id"), filters="accession", values=IDs, mart=uniProt)

2) You'll notice that you don't get anything back, because these values are entry names rather than accessions. You'll either need to give it an accession number (for MTMR1 this is Q13613) and use the accession filter, or give it a gene name, e.g. MTMR1, and use the gene_name filter. For example:

getBM(attributes=c("name","go_id"), filters="gene_name", values="MTMR1", mart=uniProt)

or

getBM(attributes=c("name","go_id"), filters="accession", values="Q13613", mart=uniProt)

Cheers,
Steffen

On Wed, Apr 6, 2011 at 8:50 AM, Wolfgang RAFFELSBERGER <wraff at igbmc.fr> wrote:
> Dear list,
> [...]
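Since the choice of filter depends on which kind of identifier you have, it can help to check the values before sending the query. The following is only a sketch in base R, not part of biomaRt: the helper name classify_uniprot_id and the entry-name pattern are my own assumptions, and the accession pattern is the one UniProt documents for its accession numbers. Stripping the species suffix to get a gene name is a heuristic that often works for reviewed entries such as MTMR1_HUMAN but is not guaranteed.

```r
# UniProt accession pattern (6 or 10 characters), anchored to match the whole string.
accession_re <- "^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"

# Assumed shape of an entry name: mnemonic, underscore, species code (e.g. MTMR1_HUMAN).
entry_name_re <- "^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$"

# Hypothetical helper: label each identifier so the matching filter can be chosen.
classify_uniprot_id <- function(ids) {
  ifelse(grepl(accession_re, ids), "accession",
         ifelse(grepl(entry_name_re, ids), "entry_name", "unknown"))
}

ids <- c("MTMR1_HUMAN", "Q13613", "MTMR1")
kinds <- classify_uniprot_id(ids)

# Heuristic only: take the part before the underscore as a candidate gene name.
candidate_genes <- sub("_.*$", "", ids[kinds == "entry_name"])
```

Values labelled "accession" would go with the accession filter; for entry names, the candidate gene names could be tried with the gene_name filter, checking the returned "name" column against the original entry names.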