biomaRt query: inconsistent results for different attribute sets
@mdmamunur-rashid-3595
Dear List,

I am trying to download annotation information using the biomaRt package. I am using ENSG gene IDs as identifiers and trying to download more annotation for those IDs from an archived version (Ensembl 50).

I download two data frames with different attribute sets. For some reason they contain different numbers of unique ENSG gene IDs, even though I pass the same ENSG IDs in the *values* field of the getBM() function.

Here is the code that I am trying:

    library(biomaRt)
    listMarts(host="jul2008.archive.ensembl.org", path="/biomart/martservice", archive=FALSE)
    mart_50 = useMart("ENSEMBL_MART_ENSEMBL",
                      dataset="hsapiens_gene_ensembl",
                      host="jul2008.archive.ensembl.org",
                      path="/biomart/martservice",
                      archive=FALSE)

    ####
    # Case 1: with "entrezgene" and "refseq_dna" in the attribute list.
    ####
    annotation_obj_1 <- getBM(attributes=c("ensembl_gene_id","hgnc_symbol","description",
                                           "chromosome_name","strand","band","start_position",
                                           "end_position","entrezgene","refseq_dna"),
                              values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
    dim(annotation_obj_1)
    [1] 42391 10

    # Check how many unique ENSG IDs are here:
    length(unique(annotation_obj_20101020_all[,1]))
    [1] 21785

*** In the first case I have 21785 unique ENSG IDs.

    ####
    # Case 2: without "entrezgene" and "refseq_dna" in the attribute list.
    ####
    annotation_obj_2 <- getBM(attributes=c("ensembl_gene_id","hgnc_symbol","description",
                                           "chromosome_name","strand","band","start_position",
                                           "end_position"),
                              values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
    dim(annotation_obj_2)
    [1] 36777 8
    length(unique(annotation_obj_2[,1]))
    [1] 36396

*** In the second case I have 36396 unique ENSG IDs.

Question 1: Can anybody please explain why the results are inconsistent for different attribute sets, even though the IDs passed in the values field are the same?

Thanks in advance.

Regards,
Mamun

Here is the R session info.
------------------------------------------------------------------------

    R version 2.11.1 (2010-05-31)
    x86_64-pc-mingw32

    locale:
    [1] LC_COLLATE=English_United Kingdom.1252
        LC_CTYPE=English_United Kingdom.1252
        LC_MONETARY=English_United Kingdom.1252
        LC_NUMERIC=C
        LC_TIME=English_United Kingdom.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] biomaRt_2.4.0         lumi_1.14.0           MASS_7.3-6
     [4] RSQLite_0.9-1         DBI_0.2-5             preprocessCore_1.10.0
     [7] mgcv_1.6-2            affy_1.26.1           annotate_1.26.0
    [10] AnnotationDbi_1.10.1  Biobase_2.8.0

    loaded via a namespace (and not attached):
    [1] affyio_1.16.0      grid_2.11.1        lattice_0.18-8
    [4] Matrix_0.999375-39 nlme_3.1-96        RCurl_1.4-3.1
    [7] tools_2.11.1       XML_3.1-1.1        xtable_1.5-6
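To see which ENSG IDs are present in one result but absent from the other, the two ID columns can be compared directly. The following is only a minimal sketch on toy data: the two small data frames stand in for the getBM() results, the IDs in them are made up, and `compare_ids` is a hypothetical helper name, not part of biomaRt.

```r
# Toy stand-ins for the two getBM() results (made-up IDs and values).
with_xrefs <- data.frame(
  ensembl_gene_id = c("ENSG_toy_1", "ENSG_toy_1", "ENSG_toy_2"),
  entrezgene      = c(101L, 102L, 201L),
  stringsAsFactors = FALSE
)
without_xrefs <- data.frame(
  ensembl_gene_id = c("ENSG_toy_1", "ENSG_toy_2", "ENSG_toy_3"),
  hgnc_symbol     = c("GENE1", "GENE2", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compare the unique IDs found in two result data frames.
compare_ids <- function(a, b, id_col = "ensembl_gene_id") {
  ids_a <- unique(a[[id_col]])
  ids_b <- unique(b[[id_col]])
  list(
    n_a       = length(ids_a),         # unique IDs in the first result
    n_b       = length(ids_b),         # unique IDs in the second result
    only_in_a = setdiff(ids_a, ids_b), # IDs missing from the second result
    only_in_b = setdiff(ids_b, ids_a)  # IDs missing from the first result
  )
}

cmp <- compare_ids(with_xrefs, without_xrefs)
print(cmp$only_in_b)
```

Applied to the real objects, the `only_in_b` element would list the IDs that drop out when the extra attributes are requested, which may help when reporting the problem.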
Steffen ▴ 500
@steffen-2351
Dear Mamun,

Thank you for reporting this inconsistent behavior. It looks like a bug on the http://www.biomart.org side, not in the biomaRt package. I'll contact the developers to see what is going on.

Cheers,
Steffen

(In reply to the message from Md. Mamunur Rashid of Thu, Oct 21, 2010, 6:57 AM.)
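Separately from any server-side issue, note that the getBM() calls in the question pass `values` without naming a `filters` argument, whereas in biomaRt the values are normally paired with the filter they apply to. The sketch below shows the same kind of query with the filter stated explicitly. It is an illustration only: `get_annotation` is a hypothetical wrapper, the filter name "ensembl_gene_id" is assumed to be available in the archived mart, and the function is defined but not run here because it needs a live BioMart connection. Whether this changes the counts reported in the question is not something the sketch can confirm.

```r
# Hypothetical wrapper: a getBM() query with the filter named explicitly.
# `ids`  : character vector of ENSG gene IDs.
# `mart` : a Mart object as returned by useMart().
get_annotation <- function(ids, mart,
                           attributes = c("ensembl_gene_id", "hgnc_symbol",
                                          "entrezgene", "refseq_dna")) {
  biomaRt::getBM(
    attributes = attributes,
    filters    = "ensembl_gene_id",  # assumed filter name; the field `values` applies to
    values     = ids,
    mart       = mart,
    uniqueRows = TRUE
  )
}
```

The available filter names for a given mart can be checked with listFilters(mart) before relying on the assumed name.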