After my premature posting yesterday, I am a bit hesitant to ask, but I am
puzzled by what I am getting from biomaRt. (To avoid clutter, I added
the sessionInfo at the end of the message.)
I used ReadAffy() to read in a rat dataset and called it CELdata.
>CELdata
AffyBatch object
size of arrays=834x834 features (19 kb)
cdf=Rat230_2 (31099 affyids)
number of samples=8
number of genes=31099
annotation=rat2302
notes=
>features=featureNames(CELdata)
>length(features)
[1] 31099
>sum(is.na(features))
[1] 0
I use features to query biomaRt for the Entrez-ids. I got back only
18882 probesets (but actually fewer, because some probesets are
matched to 2 Entrez-ids). On the other hand, some of the Affy-ids
that were returned did not match any Entrez-id, so I am not sure why
they were returned.
>matchFeature=getBM(attributes=c('affy_rat230_2','entrezgene'),
filters='affy_rat230_2', values=features, mart=ensembl)
>dim(matchFeature)
[1] 18882 2
>sum(!is.na(matchFeature$affy_rat230_2))
[1] 18882
>sum(!is.na(matchFeature$entrezgene))
[1] 17814
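To break the 18882 rows down, it may help to count rows, distinct probesets, and probesets with at least one Entrez-id separately. The following is only a sketch on a made-up data frame with the same two column names as matchFeature; the IDs and the object name toy are invented for illustration and are not real biomaRt output.

```r
# Toy stand-in for a getBM() result; all IDs are invented for illustration.
toy <- data.frame(
  affy_rat230_2 = c("p1_at", "p1_at", "p2_at", "p3_at", "p4_at"),
  entrezgene    = c(101L, 102L, 103L, NA, 103L),
  stringsAsFactors = FALSE
)

n_rows      <- nrow(toy)                           # rows returned
n_probesets <- length(unique(toy$affy_rat230_2))   # distinct probesets returned
has_entrez  <- !is.na(toy$entrezgene)              # rows carrying an Entrez-id
# distinct probesets with at least one Entrez-id
n_annotated <- length(unique(toy$affy_rat230_2[has_entrez]))
# probesets matched to more than one Entrez-id
n_multi     <- sum(table(toy$affy_rat230_2[has_entrez]) > 1)
```

Applied to the real result, the same expressions with matchFeature in place of toy would separate duplicated rows from rows with a missing Entrez-id.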
I then use the non-missing Entrez-ids to query biomaRt for the Affy-ids.
I got back only 18249 Entrez-ids (presumably because some
Entrez-ids are matched to 2 probesets). Nothing is missing.
>matchEntrez=getBM(attributes=c('affy_rat230_2','entrezgene'),
filters='entrezgene', values=matchFeature[!is.na(matchFeature[,2]),2],
mart=ensembl)
>dim(matchEntrez)
[1] 18249 2
>sum(!is.na(matchEntrez[,1]))
[1] 18249
>sum(!is.na(matchEntrez[,2]))
[1] 18249
I am pretty sure that the discrepancies in the counts have to do with
how getBM is handling multiple matches.
>length(unique(matchFeature[,1]))
[1] 16851
>length(unique(matchEntrez[,1]))
[1] 16143
>length(unique(matchFeature[,2]))
[1] 13738
>length(unique(matchEntrez[,2]))
[1] 13737
>length(unique(matchFeature[!is.na(matchFeature[,2]),1]))
[1] 16142
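One detail that may account for the 13738 versus 13737 difference: unique() treats NA as a distinct value, so a column containing missing Entrez-ids counts one more "unique" entry than the same column with the NAs removed. A small sketch with an invented vector (x is hypothetical, not taken from the data above):

```r
# Invented Entrez-id vector with repeated values and NAs.
x <- c(101L, 102L, NA, 102L, NA)

n_with_na    <- length(unique(x))             # NA is counted as one distinct value
n_without_na <- length(unique(x[!is.na(x)]))  # distinct non-missing values only
```

Here n_with_na is 3 and n_without_na is 2, so the two counts differ by exactly one whenever the vector contains any NA.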
In any case, I seem to be missing about 13000 probesets. Surely there
cannot be that many probesets on the array with no Entrez-id?
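To see exactly which probesets were never returned, the queried IDs can be compared against the returned ones with setdiff(). From the counts above, 31099 - 16851 = 14248 probesets would be absent from matchFeature altogether, assuming every returned Affy-id is one of the queried features. The sketch below uses invented vectors (features_toy and returned_toy are hypothetical names) rather than the real objects.

```r
# Invented IDs standing in for featureNames(CELdata) and the returned column.
features_toy <- c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at", "p6_at")
returned_toy <- c("p1_at", "p1_at", "p2_at", "p3_at", "p4_at")

# Probesets that were queried but do not appear in the result at all.
not_returned <- setdiff(features_toy, returned_toy)
n_missing    <- length(not_returned)

# With the real objects this would read:
# not_returned <- setdiff(features, matchFeature$affy_rat230_2)
```

Inspecting a few of the IDs in not_returned (for example, whether many are control probesets) might indicate why they have no mapping.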
Thanks for any help you can provide.
Naomi Altman
>sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] rat2302cdf_2.13.0 hgu95av2cdf_2.13.0 AnnotationDbi_1.24.0 biomaRt_2.18.0
[5] edgeR_3.4.2 limma_3.18.13 affy_1.40.0 Biobase_2.22.0
[9] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] affyio_1.30.0 BiocInstaller_1.12.0 DBI_0.2-7 IRanges_1.20.7
[5] preprocessCore_1.24.0 RCurl_1.95-4.1 RSQLite_0.11.4 stats4_3.0.2
[9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0