Mike Dewar
Hi,
I've been trying to figure out how to use topGO when all you have is a
list of interesting probesets, along with all the probesets on a
microarray. My problems, which have been discussed on the biostar
webpage, seem to have been around the specification of geneSel and
allGenes. I have ended up not specifying geneSel at all (it would
normally be some test on a p-value or other score) and specifying
allGenes as a named factor, where the value for a probeset is 1 if I
consider it interesting and 0 otherwise. In the code snippet below,
`exprset` is an ExpressionSet object and `interesting_genes` is the
list of probesets I find interesting.
library(topGO)
library(mogene10sttranscriptcluster.db)
load('exprset')
load('interesting_genes')
# all probesets on the array form the gene universe
all_genes <- rownames(exprs(exprset))
# make a factor that is 1 if the probeset is "interesting" and 0 otherwise
geneList <- factor(as.integer(all_genes %in% interesting_genes))
# name the factor with the probeset names
names(geneList) <- all_genes
# form the GOdata object
GOdata <- new("topGOdata",
  ontology = "BP",
  allGenes = geneList,
  nodeSize = 5,
  # annot tells topGO to map from GO terms to "genes"
  annot = annFUN.GO2genes,
  # annot uses the GO2genes list to perform this mapping;
  # here it comes from the mogene... annotation library
  GO2genes = as.list(mogene10sttranscriptclusterGO2PROBE)
)
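As a sanity check on what a classic Fisher test does with a 0/1 factor like this, here is a minimal base-R sketch for a single GO term. The probeset IDs and the term's membership are made up for illustration, and this is not topGO's own code: it only assumes that the test is the usual one-sided Fisher's exact test on a 2x2 table of interesting vs. annotated probesets.

```r
# Toy universe of probesets (hypothetical IDs, for illustration only)
all_genes <- paste0("probe", 1:20)
interesting_genes <- c("probe1", "probe2", "probe3", "probe4", "probe15")

# 0/1 factor named by probeset, built the same way as geneList above
geneList <- factor(as.integer(all_genes %in% interesting_genes),
                   levels = c(0, 1))
names(geneList) <- all_genes

# Hypothetical set of probesets annotated to one GO term
term_genes <- c("probe1", "probe2", "probe3", "probe10")

# 2x2 contingency table: interesting or not vs. annotated to the term or not
in_term <- factor(all_genes %in% term_genes, levels = c(FALSE, TRUE))
contingency <- table(interesting = geneList, in_term = in_term)

# One-sided Fisher's exact test for over-representation of the term
fisher_result <- fisher.test(contingency, alternative = "greater")

print(contingency)
print(fisher_result$p.value)
```

If the results from topGO look very different from a hand calculation like this on a few terms, that would suggest the factor is not being interpreted as intended.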
My questions for the list are:
1) Is this OK? Will the results I get from a Fisher test be valid?
They /seem/ fine.
2) If this is valid, would it be worth making it clearer in the topGO
documentation? It specifies that allGenes should be a vector of
strings or a named numeric vector, whereas here it is a named factor.
Cheers,
Mike Dewar
- - -
Dr Michael Dewar
Postdoctoral Research Scientist
Applied Mathematics
Columbia University
http://www.columbia.edu/~md2954/