Biomart query in Web interface Vs. biomaRt package?
@jjplebreclumcnl-2413
Hi,

Using the web-based BioMart tool ( http://www.ensembl.org/biomart/martview/ ) with database = Ensembl 46 and dataset = Homo sapiens Genes (NCBI 36), I manually extracted the 'External Gene ID' of all unique genes, using the GO term GO:0006996 as a filter. I obtained 1141 unique genes.

I tried to automate the process using the biomaRt package with the query below, which yielded only 9 unique genes!

> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> getBM(attributes = "external_gene_id", filters = "go", values = "GO:0006996", mart = human)
   external_gene_id
1             KIF3A
2              HPS3
3              HPS3
4            DTNBP1
5            DTNBP1
6             KIF5C
7             KIF4A
8              HPS1
9              HPS6
10             HPS6
11             HPS6
12            KIF25
13             HPS4

> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32

locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

other attached packages:
 biomaRt    RCurl      XML
"1.10.1"  "0.8-0"  "1.9-0"

I thought the two queries were equivalent. Could you please tell me what I am doing wrong here?

Many thanks in advance,
Jeremie
GO PROcess biomaRt • 1.5k views
Steffen ▴ 500
@steffen-2351
Hi Jeremie,

Many thanks for reporting this. Yes, the BioMart web interface and biomaRt should return identical results when the same query is used.

The query you sent via the web interface was (you can see this by clicking the XML button):

<query virtualschemaname="default" header="0" uniquerows="0" count="" datasetconfigversion="0.6">
  <dataset name="hsapiens_gene_ensembl" interface="default">
    <filter name="biol_process" value="GO:0006996"/>
    <attribute name="ensembl_gene_id"/>
    <attribute name="ensembl_transcript_id"/>
    <attribute name="hgnc_symbol"/>
    <attribute name="external_gene_id"/>
  </dataset>
</query>

The query sent via biomaRt was (you can see this by setting verbose = TRUE):

> getBM(attributes = "external_gene_id", filters = "go", values = "GO:0006996", mart = human, verbose = TRUE)
<query virtualschemaname="default" uniquerows="1" count="0" datasetconfigversion="0.6" requestid="biomaRt">
  <dataset name="hsapiens_gene_ensembl">
    <attribute name="external_gene_id"/>
    <filter name="go" value="GO:0006996"/>
  </dataset>
</query>

These queries use different filter names ('biol_process' versus 'go') and do indeed give different results, but I'm not sure whether this is intended. We should contact the Ensembl helpdesk to report the inconsistency so we can figure out what's going on.

Cheers,
Steffen
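One way to check which filter names a dataset exposes before calling getBM is to search the table that listFilters returns. The sketch below assumes that table is a data frame with name and description columns; find_filters is a hypothetical helper, and the small toy table (with illustrative descriptions) stands in for a live listFilters(human) call so that the example runs offline.

```r
# Hypothetical helper: search a filter table (shaped like the output of
# biomaRt::listFilters) for a pattern in the name or the description.
find_filters <- function(filters, pattern) {
  hit <- grepl(pattern, filters$name, ignore.case = TRUE) |
    grepl(pattern, filters$description, ignore.case = TRUE)
  filters[hit, , drop = FALSE]
}

# Toy stand-in for listFilters(human); the descriptions are illustrative only.
toy_filters <- data.frame(
  name = c("chromosome_name", "go", "hgnc_symbol"),
  description = c("Chromosome name", "GO ID(s)", "HGNC symbol(s)"),
  stringsAsFactors = FALSE
)

go_filters <- find_filters(toy_filters, "go")
print(go_filters$name)

# With a live connection (not run here; filter and attribute names are the
# ones used in this thread and may differ in other Ensembl releases):
# library(biomaRt)
# human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# find_filters(listFilters(human), "go")
# getBM(attributes = "external_gene_id", filters = "go",
#       values = "GO:0006996", mart = human, verbose = TRUE)
```

Searching the filter table first makes a name mismatch between the web interface and the package visible before any query is sent.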
Steffen ▴ 500
@steffen-2351
Hi Jeremie,

Below is the answer from the Ensembl helpdesk. In short, the 'go' filter retrieves all genes associated with a particular GO identifier, whereas the 'biol_process' filter retrieves all genes associated with a particular GO identifier and all of its children. This explains why one gets more genes when using 'biol_process' rather than 'go' as the filter. (The Ensembl BioMart web interface uses 'biol_process', and you used 'go' in your biomaRt query.)

Cheers,
Steffen

-----

When you query BioMart filtering a specific GO term (GO:0006996, or a list) you can retrieve all those entries associated to that/those GO term(s)...

But if you filter using a 'Biological process' and then add an ID, in this case you get all the entries matching that ID and all the children...

organelle organization and biogenesis [GO:0006996]
autophagic vacuole formation [GO:0000045]
chromosome organization and biogenesis [GO:0051276]
chromosome condensation [GO:0030261]
chromosome decondensation [GO:0051312]
chromosome organization and biogenesis (sensu Bacteria) [GO:0051277]
chromosome organization and biogenesis (sensu Eukaryota) [GO:0007001]
chromosome breakage [GO:0031052]
establishment and/or maintenance of chromatin architecture [GO:0006325]
karyosome formation [GO:0030717]
....

As seen here:
http://www.ensembl.org/Homo_sapiens/goview?depth=2;query=organelle+organization+and+biogenesis

I hope this explains,

--
Xose M Fernandez (Ensembl User Support)
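The difference between filtering on an exact GO term and filtering on a term plus its children can be illustrated offline with a toy ontology. In the sketch below, the GO identifiers come from the helpdesk excerpt, but the nesting between them is assumed for illustration and the gene names are invented; offspring() and genes_for() are hypothetical helpers, not part of biomaRt.

```r
# Toy ontology: parent term -> direct children (nesting assumed for illustration).
children <- list(
  "GO:0006996" = c("GO:0000045", "GO:0051276"),
  "GO:0051276" = c("GO:0030261"),
  "GO:0000045" = character(0),
  "GO:0030261" = character(0)
)

# Toy annotation table: the gene names are invented.
annot <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  go   = c("GO:0006996", "GO:0000045", "GO:0030261", "GO:0030261"),
  stringsAsFactors = FALSE
)

# All descendants of a term, found by walking the child map breadth-first.
offspring <- function(term, children) {
  found <- character(0)
  queue <- children[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, children[[current]])
    }
  }
  found
}

# Genes annotated to a term, optionally including the term's descendants.
genes_for <- function(term, annot, children, with_offspring = FALSE) {
  terms <- if (with_offspring) c(term, offspring(term, children)) else term
  unique(annot$gene[annot$go %in% terms])
}

# Exact-term lookup, analogous to the 'go' filter.
exact <- genes_for("GO:0006996", annot, children)
# Term plus descendants, analogous to the 'biol_process' filter.
expanded <- genes_for("GO:0006996", annot, children, with_offspring = TRUE)
print(exact)
print(expanded)
```

In this toy data only one gene is annotated directly to the root term, while the expanded lookup also picks up the genes attached to its descendants, which mirrors the gap between 9 and roughly 1140 genes reported in the thread.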
Hi Steffen,

There does not seem to be a 'biol_process' filter in the dataset 'hsapiens_gene_ensembl' (see below).

> human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
Checking attributes and filters ... ok
> getBM(attributes = "external_gene_id", filters = "biol_process", values = "GO:0006996", mart = human)
Error in getBM(attributes = "external_gene_id", filters = "biol_process",  :
  Invalid filters(s): biol_process
Please use the function 'listFilters' to get valid filter names

So I have tried to generate the same gene list as in the web query (which yields exactly 1140 unique genes), using the following code to get all biological-process offspring of GO:0006996:

> library(GO)
> dim(unique(getBM(attributes = c("external_gene_id"), filters = "go", values = c("GO:0006996", GOBPOFFSPRING$"GO:0006996"), mart = human)))
[1] 1143    1

The two gene lists have 1137 genes in common, and I cannot explain the remaining discrepancy.

Thanks again for enquiring about this,
Jérémie

-----Original Message-----
From: Steffen [mailto:sdurinck@lbl.gov]
Sent: Monday, 8 October 2007 18:14
To: Lebrec, J.J.P. (MSTAT)
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Biomart query in Web interface Vs. biomaRt package?
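A discrepancy like the one reported in this reply (1137 shared genes out of 1140 and 1143) can be narrowed down by listing the identifiers that appear in only one of the two results. The following is a minimal sketch in base R; compare_gene_sets is a hypothetical helper, and the two short vectors are placeholders for the web export and the getBM result. The gene names are borrowed from the output earlier in the thread, but how they are split between the two lists is invented.

```r
# Hypothetical helper: summarise the overlap between two gene lists.
compare_gene_sets <- function(web, bm) {
  web <- unique(web)
  bm  <- unique(bm)
  list(
    n_web    = length(web),             # unique genes from the web query
    n_bm     = length(bm),              # unique genes from biomaRt
    n_common = length(intersect(web, bm)),
    only_web = setdiff(web, bm),        # present only in the web result
    only_bm  = setdiff(bm, web)         # present only in the biomaRt result
  )
}

# Placeholder inputs; in practice these would be the exported web list and
# the external_gene_id column returned by getBM.
web_genes <- c("KIF3A", "HPS3", "HPS3", "DTNBP1", "KIF5C")
bm_genes  <- c("KIF3A", "HPS3", "DTNBP1", "HPS1", "HPS6", "HPS6")

cmp <- compare_gene_sets(web_genes, bm_genes)
str(cmp)
```

With the counts reported above, only_web would hold 3 names (1140 - 1137) and only_bm would hold 6 (1143 - 1137), which is a short enough list to inspect by hand.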