Md.Mamunur Rashid
Dear List,
I am trying to download some annotation information using the biomaRt
package.
I am using some ENSG gene IDs as identifiers and trying to download
more annotation information about those genes from an archived version
of Ensembl (release 50).
What I do is download two data frames with different attribute sets.
For some reason they contain different numbers of unique ENSG gene IDs,
even though I am passing the same ENSG IDs in the *values* argument of
the getBM() function.
Here is the code I am using:
library(biomaRt)
listMarts(host="jul2008.archive.ensembl.org", path="/biomart/martservice", archive=FALSE)
mart_50 = useMart("ENSEMBL_MART_ENSEMBL",
dataset="hsapiens_gene_ensembl",
host="jul2008.archive.ensembl.org",
path="/biomart/martservice",
archive=FALSE)
####
# case 1: with the "entrezgene" and "refseq_dna" fields in the attributes.
####
annotation_obj_1 <- getBM(attributes=c("ensembl_gene_id", "hgnc_symbol",
    "description", "chromosome_name", "strand", "band", "start_position",
    "end_position", "entrezgene", "refseq_dna"),
    values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
dim(annotation_obj_1)
[1] 42391 10
*** Check how many unique ENSG IDs are here:
length(unique(annotation_obj_1[,1]))
[1] 21785
*** In the first case I have 21785 unique ENSG IDs.
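Part of the gap between dim() and the unique-ID count is expected: getBM() returns one row per unique combination of the requested attribute values, so a gene with several Entrez Gene or RefSeq IDs occupies several rows. A minimal base R sketch of how to count rows per gene, using a small hypothetical data frame `toy_1` (all gene and RefSeq IDs are made up) in place of annotation_obj_1:

```r
# Hypothetical stand-in for annotation_obj_1: one row per unique
# (gene, RefSeq) combination, as getBM() returns them
toy_1 <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_C", "ENSG_C"),
  refseq_dna      = c("NM_1", "NM_2", "NM_3", "NM_4", "NM_5", "NM_6"),
  stringsAsFactors = FALSE
)

nrow(toy_1)                            # 6 rows in total
length(unique(toy_1$ensembl_gene_id))  # but only 3 unique genes

# How many rows each gene contributes
rows_per_gene <- table(toy_1$ensembl_gene_id)
multi_row_genes <- rows_per_gene[rows_per_gene > 1]  # ENSG_A (2) and ENSG_C (3)
print(multi_row_genes)
```

On the real data, the same table() call on annotation_obj_1[,1] would show which genes are spread over several rows.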
####
# case 2: without the "entrezgene" and "refseq_dna" fields in the attributes.
####
annotation_obj_2 <- getBM(attributes=c("ensembl_gene_id", "hgnc_symbol",
    "description", "chromosome_name", "strand", "band", "start_position",
    "end_position"),
    values=Ensm_ids, mart=mart_50, uniqueRows=TRUE)
dim(annotation_obj_2)
[1] 36777 8
length(unique(annotation_obj_2[,1]))
[1] 36396
*** In the second case I have 36396 unique ENSG IDs.
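To see which genes account for the difference, the two ID sets can be compared directly. Note also that neither getBM() call above passes a `filters` argument; in biomaRt, `values` are applied only to the filters named in `filters`, so as written Ensm_ids is probably not restricting either query. A minimal base R sketch, with hypothetical stand-ins `res_1`, `res_2` and `requested` for annotation_obj_1, annotation_obj_2 and Ensm_ids (all IDs made up):

```r
# Hypothetical stand-ins for annotation_obj_1, annotation_obj_2 and Ensm_ids
res_1 <- data.frame(ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B"),
                    stringsAsFactors = FALSE)
res_2 <- data.frame(ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                    stringsAsFactors = FALSE)
requested <- c("ENSG_A", "ENSG_B")

ids_1 <- unique(res_1$ensembl_gene_id)
ids_2 <- unique(res_2$ensembl_gene_id)

# Genes present in one result but not the other
only_in_2 <- setdiff(ids_2, ids_1)
only_in_1 <- setdiff(ids_1, ids_2)

# Genes returned although they were never requested
not_requested_2 <- setdiff(ids_2, requested)

print(only_in_2)          # "ENSG_C" "ENSG_D"
print(length(only_in_1))  # 0
print(not_requested_2)    # "ENSG_C" "ENSG_D"
```

If the equivalent of `not_requested_2` is non-empty on the real data, the query is returning genes that were never asked for, which would point at the missing `filters` argument (for example `filters = "ensembl_gene_id"`).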
Question 1:
Can anybody please explain why the results are inconsistent between the
two attribute sets, even though the IDs passed in the values field are
the same?
Thanks in advance.
regards,
Mamun
Here is the R session info.
------------------------------------------------------------------------------------------------
R version 2.11.1 (2010-05-31)
x86_64-pc-mingw32
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
LC_MONETARY=English_United Kingdom.1252  LC_NUMERIC=C
LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.4.0 lumi_1.14.0 MASS_7.3-6
RSQLite_0.9-1 DBI_0.2-5 preprocessCore_1.10.0
mgcv_1.6-2 affy_1.26.1 annotate_1.26.0
[10] AnnotationDbi_1.10.1 Biobase_2.8.0
loaded via a namespace (and not attached):
[1] affyio_1.16.0 grid_2.11.1 lattice_0.18-8
Matrix_0.999375-39 nlme_3.1-96 RCurl_1.4-3.1
tools_2.11.1 XML_3.1-1.1 xtable_1.5-6