Strange output from goana()
1
0
Entering edit mode
cronanz • 0
@cronanz-12047
Last seen 5.0 years ago

Dear everyone,

I have been using goana() from edgeR to perform GO enrichment analysis on my list of DE genes and I'm puzzled by the results. My code to get the output is as follows:

design <- model.matrix(~condition)

disp <- estimateDisp(y, design, robust=TRUE)

fit <- glmFit(disp, design, robust=TRUE)

lrt <- glmLRT(fit, coef = 2)

go <- goana(lrt, FDR=0.1)

topgobp <- topGO(go, ont="BP", n=50)

And a snippet of the result from the final command is as follows:

 

                        Term                                                       Ont    N         Up    Down P.Up                                             P.Down
GO:0006955    immune response                                  BP    1052    124   113    1.16395565456951e-16   0.00962505873802103
GO:0019221    cytokine-mediated signaling pathway    BP    351      61     37      8.54320669089232e-16 0.129568753575334
GO:0006952    defense response                                  BP    934      112   100    1.49961996610012e-15 0.0158422706753765
GO:0071345    cellular response to cytokine stimulus   BP    447      70     42      1.50658420177947e-15 0.326662132519315

 

It can be seen that even from the first GO term "immune response", the numbers of upregulated genes and downregulated genes are very similar (124 to 113) and yet the corresponding p.values are vastly different (1.16e-16 to 0.009). Am I misunderstanding how goana() works? I thought it is an analysis involving Fisher's exact test to determine overrepresentation of terms based on list of genes provided. I would appreciate if somebody has an explanation for this strange (to me) output. Thanks!

edger goana • 2.3k views
ADD COMMENT
0
Entering edit mode

Please show the number of DE genes, i.e., the output from:

summary(decideTestsDGE(lrt, p=0.1))
ADD REPLY
0
Entering edit mode

Turns out I got the numbers wrong for the comments I made below. The output is:

-1   990
0  10586
1    677

So the upregulated genes are indeed more than the downregulated genes. I suppose this makes the output I got above sense? Apologies for the confusion.

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 9 hours ago
The city by the bay

The extra piece of information that determines the p-value calculation is the total number of up- or down-regulated genes. The numbers reported in Up and Down are the number of genes that are up- or down-regulated, respectively, and in the gene set corresponding to the row. Even if these numbers are similar, you can get very different p-values, e.g., if the total number of up-regulated genes is (much) lower than the total number of down-regulated genes (i.e., the proportion of up-regulated genes in the set is higher than that of the down-regulated genes). Another contributor is the fact that the hypergeometric distribution has very low variance, so modest differences in the up/down proportions can result in fairly dramatic differences in the p-values.

ADD COMMENT
0
Entering edit mode

Thank you Aaron for your answer.

Well if that's the case, that's a big difference in p.values I'm seeing here. I know that it would be prudent to gauge the significance with p.value alone, but given that the above comparison had 70 genes difference between up- and down-regulated genes (more up-regulated genes), I'm still surprised that the difference is big. Is that the entirety of how p.values are computed? I suppose the FDR ranking is not taken into account.

ADD REPLY
1
Entering edit mode

Based on your response to Gordon, the proportion of up-regulated genes in the first set is 124/677 = 0.18. The proportion of down-regulated genes is 113/990 = 0.11. So it's actually a fairly big difference in the proportions, which explains the difference in the p-values corresponding to each direction of DE.

ADD REPLY
0
Entering edit mode

That makes sense. Thank you always for your detailed response

ADD REPLY

Login before adding your answer.

Traffic: 1108 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6