I have been using
goana() from edgeR to perform GO enrichment analysis on my list of DE genes and I'm puzzled by the results. My code to get the output is as follows:
design <- model.matrix(~condition) disp <- estimateDisp(y, design, robust=TRUE) fit <- glmFit(disp, design, robust=TRUE) lrt <- glmLRT(fit, coef = 2) go <- goana(lrt, FDR=0.1) topgobp <- topGO(go, ont="BP", n=50)
And a snippet of the result from the final command is as follows:
Term Ont N Up Down P.Up P.Down
GO:0006955 immune response BP 1052 124 113 1.16395565456951e-16 0.00962505873802103
GO:0019221 cytokine-mediated signaling pathway BP 351 61 37 8.54320669089232e-16 0.129568753575334
GO:0006952 defense response BP 934 112 100 1.49961996610012e-15 0.0158422706753765
GO:0071345 cellular response to cytokine stimulus BP 447 70 42 1.50658420177947e-15 0.326662132519315
It can be seen that even from the first GO term "immune response", the numbers of upregulated genes and downregulated genes are very similar (124 to 113) and yet the corresponding p.values are vastly different (1.16e-16 to 0.009). Am I misunderstanding how
goana() works? I thought it is an analysis involving Fisher's exact test to determine overrepresentation of terms based on list of genes provided. I would appreciate if somebody has an explanation for this strange (to me) output. Thanks!
Please show the number of DE genes, i.e., the output from:
Turns out I got the numbers wrong for the comments I made below. The output is:
So the upregulated genes are indeed more than the downregulated genes. I suppose this makes the output I got above sense? Apologies for the confusion.