Mike Dewar
I've been trying to work out how to use topGO in situations where
all you have is a list of interesting probesets, along with all the
probesets on a microarray. My problems, which have been discussed on
the Biostar website, seem to have centred on the specification of
geneSel and allGenes. I have ended up not specifying geneSel at all
(it would normally be some test based on a p-value or other score)
and specifying allGenes as a named factor, where a probeset is 1 if
I consider it interesting and 0 otherwise. In the code snippet below,
`exprset` is an ExpressionSet object and `interesting_genes` is the
list of probesets I find interesting.
library(topGO)
library(mogene10sttranscriptcluster.db)

# all probesets on the array
all_genes <- rownames(exprs(exprset))
# then make a factor that is 1 if the probeset is "interesting" and 0 otherwise
geneList <- factor(as.integer(all_genes %in% interesting_genes))
# name the factor with the probeset names
names(geneList) <- all_genes
# form the GOdata object
GOdata <- new("topGOdata",
              ontology = "BP",
              allGenes = geneList,
              nodeSize = 5,
              # annot tells topGO how to map from GO terms to "genes"
              annot = annFUN.GO2genes,
              # annot in turn needs the GO-to-probeset mapping, GO2genes,
              # which here comes from the mogene... library
              GO2genes = as.list(mogene10sttranscriptclusterGO2PROBE))
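
To make the idea concrete, here is a toy sketch in base R of what the 0/1 named factor looks like and of the kind of per-term test it feeds. The probeset names and the GO term membership are invented for illustration, and the single 2x2 Fisher test is only my understanding of what a "classic" enrichment test does for one term, not a description of topGO's internals.

```r
# Toy data: hypothetical probeset IDs, not taken from any real array
all_genes <- paste0("probe", 1:10)
interesting_genes <- c("probe2", "probe5", "probe9")

# Named factor: 1 = interesting, 0 = not interesting.
# Fixing the levels guarantees both classes exist even in edge cases.
geneList <- factor(as.integer(all_genes %in% interesting_genes),
                   levels = c(0, 1))
names(geneList) <- all_genes

# Hypothetical GO term that annotates four of the probesets
in_term <- all_genes %in% c("probe2", "probe5", "probe6", "probe7")

# 2x2 contingency table: interesting status vs. membership of the term
tab <- table(interesting = geneList,
             annotated = factor(in_term, levels = c(FALSE, TRUE)))

# One-sided Fisher's exact test for over-representation of the term
res <- fisher.test(tab, alternative = "greater")
print(tab)
print(res$p.value)
```

If the factor is built correctly, the row totals of `tab` should match the number of uninteresting and interesting probesets, which is a quick sanity check before handing `geneList` to topGO.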
My questions for the list are:
1) Is this OK? Will the results I get from a Fisher test be valid?
They /seem/ fine.
2) If this is valid, would it be worth making it clearer in the topGO
documentation? At present it specifies that allGenes should be a
vector of strings or a named numeric vector.
Mike Dewar
- - -
Dr Michael Dewar
Postdoctoral Research Scientist
Applied Mathematics
Columbia University