Entering edit mode
Dick Beyer
★
1.4k
@dick-beyer-26
Last seen 10.2 years ago
Hello,
I have a puzzling problem with a disagreement between the numbers of
EntrezGene IDs I get from biomaRt and what I get from topGO for the GO
category GO:0006836, neurotransmitter transport. I would assume the
mismatch would be due to different GO builds, but what I find seems
too extreme for that.
Let me give a little background to how I get my list of Entrez Gene
IDs.
I start with the file
ftp://ftp.ebi.ac.uk/pub/databases/IPI/current/ipi.HUMAN.dat.gz
from which I extract IPI, EG, Symbol, description, UniGene
This eventually gives me a list of 22963 unique Entrez Gene IDs. I
process this list through biomaRt as follows:
library(biomaRt)
mart <- useMart( "ensembl", dataset="hsapiens_gene_ensembl")
get.go.biomart <-
getGO(id=ipi.LL.sym.descrip.ug.unique_eg[,2],type="entrezgene",mart=ma
rt)
>From this data.frame I can determine that there are 32 unique Entrez
Gene IDs in the Gene Ontology Biological Process category GO:0006836,
neurotransmitter transport. Everything is fine so far.
Now I set up things for a topGO (non-affy) calculation:
gene2GO <- as.list(get.go.biomart$go)
names(gene2GO) <- get.go.biomart$entrezgene
geneList <- list()
geneList[1] <-
list(factor(as.integer(ipi.LL.sym.descrip.ug.unique_eg[,2] %in%
LL1027[,2])))
names(geneList[[1]]) <- ipi.LL.sym.descrip.ug.unique_eg[,2]
This geneList[[1]] has, among its 22963 entries, 32 entries (that is,
32 of the "names" are Entrez Gene IDs) that are in category
GO:0006836, and 4 of entries I have marked as significant by setting
their values to 1.
Next, I create a topGOdata Biological Process object:
BP.GOdata <- list()
BP.GOdata[1i] <- list(new("topGOdata", ontology = "BP", allGenes =
geneList[[1]],
annot = annFUN.gene2GO, gene2GO = gene2GO))
and submit this topGOdata object to the following:
test.stat.BP <- list()
result.BP <- list()
test.stat.BP[1] <- list(new("classicCount", testStatistic =
GOFisherTest, name = "Fisher test"))
result.BP[1] <- list(getSigGroups(BP.GOdata,
test.stat.BP[[1]]))
test.stat.BP[2] <- list(new("elimCount", testStatistic =
GOFisherTest, name = "Fisher test", cutOff = 0.01))
result.BP[2] <- list(getSigGroups(BP.GOdata,
test.stat.BP[[2]]))
test.stat.BP[3] <- list(new("weightCount", testStatistic =
GOFisherTest, name = "Fisher test", sigRatio = "ratio"))
result.BP[3] <- list(getSigGroups(BP.GOdata,
test.stat.BP[[3]]))
l <- list(classic = score(result.BP[[1]]), elim =
score(result.BP[[2]]), weight = score(result.BP[[3]]))
allRes.BP <- genTable(BP.GOdata, l, orderBy = "classic", ranksOf
= "classic", top = length(score(result.BP[[1]])))
My resulting data.frame, allRes.BP, shows that for category
GO:0006836, there are only 11 annotated Entrez Gene IDs, not 32. The
call to genTable() does correctly find that 4 of these are
significant.
This latest released version of topGO must be using the Mar 14
11:46:06 2007 build of GO, and I assume biomaRt is more current as I
did the query just today. If category GO:0006836, neurotransmitter
transport, has grown from 11 EGs to 32 EGs between now and last March
for Human, then everything is as it should be. But this seems
unlikely to me. So I wonder if there is a bug in the topGO
calculations.
Here is my session:
sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils"
"datasets" "methods" "base"
other attached packages:
goTools hgu133plus2 Rgraphviz geneplotter lattice
topGO SparseM annotate GO graph Biobase
biomaRt RCurl XML
"1.8.0" "1.16.0" "1.14.1" "1.14.0" "0.15-11"
"1.2.1" "0.73" "1.14.1" "1.16.0" "1.14.2" "1.14.1"
"1.10.1" "0.8-0" "1.9-0"
Thanks very much,
Dick
**********************************************************************
*********
Richard P. Beyer, Ph.D. University of Washington
Tel.:(206) 616 7378 Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696 4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer