Question

Gene Set Enrichment Analysis with topGO

jacorvar ▴ 40

Dear BioC community,

I have a boolean vector called pval where 1 means differentially expressed and 0 non-DE, whose element names are the Entrez IDs.

To run an enrichment analysis on the GO Molecular Function (MF) ontology, I do the following:

library(topGO)

# pval: named 0/1 vector (names = Entrez IDs); 1 = differentially expressed
GOobj <- new("topGOdata", description = "Simple session",
    ontology = "MF", allGenes = pval,
    geneSel = function(x) x == 1,  # select the genes flagged as DE
    nodeSize = 10,
    annot = annFUN.org, mapping = annotation(transcript), ID = "entrez")
allGO <- usedGO(object = GOobj)
# Classic Fisher's exact test: every GO term is tested independently
resultFisher <- runTest(GOobj, algorithm = "classic", statistic = "fisher")
allRes <- GenTable(GOobj, classicFisher = resultFisher,
                   orderBy = "classicFisher", ranksOf = "classicFisher", topNodes = length(allGO))

My main problem is that the lowest p-values always correspond to very general terms, such as binding, which subsume many other (probably more interesting) terms. Is there any way to avoid this with the topGO package, or is there another package that solves this issue?

topGO gene set analysis • 1.5k views

updated 8.4 years ago by James W. MacDonald 65k • written 8.4 years ago by jacorvar ▴ 40

score 0 · Answer 1 · 2015-11-20

The topGO package is based on methods that Adrian Alexa developed to do exactly what you are talking about. Paradoxically, most of the examples in the vignette and the help pages use the classic Fisher's exact test instead. Anyway, you probably want the elim, weight or weight01 method (weight01 being a combination of the other two). weight01 is the default algorithm, so if you just did

resultFisher <- runTest(GOobj, statistic = "fisher")

you would probably get what you wanted all along. If you care to know about the different methods, see the package vignette, which has references.
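To see how much the choice of algorithm changes the ranking, it can help to put both results side by side. The following is a minimal sketch, not a definitive recipe: it assumes the sample topGOdata object that ships with topGO (loaded via data(GOdata)) so that it runs on its own, and the names resultClassic, resultWeight01 and comparison are made up for illustration. With your own data you would substitute GOobj for GOdata.

```r
suppressPackageStartupMessages(library(topGO))

# Sample topGOdata object bundled with topGO (assumption: used here only so
# the sketch is self-contained; replace GOdata with your own GOobj)
data(GOdata)

# Classic: each GO term is tested independently, ignoring the GO graph
resultClassic <- runTest(GOdata, algorithm = "classic", statistic = "fisher")

# weight01: takes the GO graph structure into account when scoring terms
resultWeight01 <- runTest(GOdata, algorithm = "weight01", statistic = "fisher")

# Top 10 terms ordered by the weight01 p-value, with the rank each term
# had under the classic test shown alongside for comparison
comparison <- GenTable(GOdata,
                       classicFisher  = resultClassic,
                       weight01Fisher = resultWeight01,
                       orderBy = "weight01Fisher",
                       ranksOf = "classicFisher",
                       topNodes = 10)
print(comparison)
```

The "Rank in classicFisher" column of the table shows where each weight01-ranked term sat under the classic test, which makes it easy to check whether broad terms such as binding have moved down the list.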