Question: biomaRt query : inconsistent results for different Attribute set
Md.Mamunur Rashid wrote, 8.8 years ago:
Dear List,

I am trying to download some annotation information using the biomaRt package. I am using ENSG gene IDs as identifiers and trying to download more annotation for those IDs from an archived version (Ensembl 50).

I download two data frames with different attribute sets. For some reason they contain different numbers of unique ENSG gene IDs, even though I pass the same ENSG IDs in the *values* argument of getBM().

Here is the code I am running:

    library(biomaRt)
    listMarts(host="jul2008.archive.ensembl.org", path="/biomart/martservice", archive=FALSE)
    mart_50 = useMart("ENSEMBL_MART_ENSEMBL",
                      dataset="hsapiens_gene_ensembl",
                      host="jul2008.archive.ensembl.org",
                      path="/biomart/martservice",
                      archive=FALSE)

    ####
    # Case 1: with "entrezgene" and "refseq_dna" in the attribute list.
    ####
    annotation_obj_1 <- getBM(attributes=c("ensembl_gene_id","hgnc_symbol","description",
                                           "chromosome_name","strand","band",
                                           "start_position","end_position",
                                           "entrezgene","refseq_dna"),
                              values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
    dim(annotation_obj_1)
    [1] 42391    10

    # Check how many unique ENSG IDs are here:
    length(unique(annotation_obj_20101020_all[,1]))
    [1] 21785

In the first case I have 21785 unique ENSG IDs.

    ####
    # Case 2: without "entrezgene" and "refseq_dna" in the attribute list.
    ####
    annotation_obj_2 <- getBM(attributes=c("ensembl_gene_id","hgnc_symbol","description",
                                           "chromosome_name","strand","band",
                                           "start_position","end_position"),
                              values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
    dim(annotation_obj_2)
    [1] 36777     8
    length(unique(annotation_obj_2[,1]))
    [1] 36396

In the second case I have 36396 unique ENSG IDs.

Question 1: Can anybody please explain why the results are inconsistent between the two attribute sets, even though the IDs passed in the values argument are the same?

Thanks in advance.

Regards,
Mamun

Here is the R session info.

    R version 2.11.1 (2010-05-31)
    x86_64-pc-mingw32

    locale:
    [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
        LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
        LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] biomaRt_2.4.0        lumi_1.14.0          MASS_7.3-6
         RSQLite_0.9-1        DBI_0.2-5            preprocessCore_1.10.0
         mgcv_1.6-2           affy_1.26.1          annotate_1.26.0
    [10] AnnotationDbi_1.10.1 Biobase_2.8.0

    loaded via a namespace (and not attached):
    [1] affyio_1.16.0      grid_2.11.1   lattice_0.18-8
        Matrix_0.999375-39 nlme_3.1-96   RCurl_1.4-3.1
        tools_2.11.1       XML_3.1-1.1   xtable_1.5-6
Answer: biomaRt query : inconsistent results for different Attribute set
Steffen wrote, 8.8 years ago:
Dear Mamun,

Thank you for reporting this inconsistent behavior. It looks like a bug on the http://www.biomart.org side, not in the biomaRt package. I'll contact the developers to see what is going on.

Cheers,
Steffen

On Thu, Oct 21, 2010 at 6:57 AM, Md.Mamunur Rashid <mamunur.rashid@kcl.ac.uk> wrote:
> [...]
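While the server-side question is open, it may help to compare the two ID sets directly rather than only their counts. The sketch below is not part of the original exchange and uses small made-up data frames in place of the real getBM() results; the names `res_with_xrefs`, `res_without_xrefs` and `query_ids` are hypothetical stand-ins for `annotation_obj_1`, `annotation_obj_2` and `Ensm_ids`. It counts unique IDs on each returned object and lists the IDs that appear in one result but not the other. Note also that the Case 1 count in the post is taken on `annotation_obj_20101020_all` rather than `annotation_obj_1`, so it may be worth repeating the count on the object returned by that query.

```r
# Toy stand-ins for the two getBM() results (hypothetical data, not real Ensembl output).
res_with_xrefs <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B"),  # one gene, two Entrez rows
  entrezgene      = c(101, 102, 201),
  stringsAsFactors = FALSE
)
res_without_xrefs <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc_symbol     = c("GA", "GB", "GC"),
  stringsAsFactors = FALSE
)
query_ids <- c("ENSG_A", "ENSG_B", "ENSG_C")  # stands in for Ensm_ids

# Unique IDs, taken from the objects actually returned by each query.
ids_with    <- unique(res_with_xrefs$ensembl_gene_id)
ids_without <- unique(res_without_xrefs$ensembl_gene_id)
n_with    <- length(ids_with)
n_without <- length(ids_without)

# IDs present in the smaller attribute set's result but absent from the larger one.
missing_from_with <- setdiff(ids_without, ids_with)

# IDs that came back without having been requested.
unrequested <- setdiff(ids_without, query_ids)
```

Inspecting a few entries of `missing_from_with` on the real data should show whether the missing genes share something in common, for example having no Entrez or RefSeq cross-reference. A non-empty `unrequested` would suggest that the supplied IDs are not restricting the query at all.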