I extracted differentially expressed genes from a publicly available dataset. I wanted to run an over-representation analysis (ORA) and, since this was an old microarray with fewer than 10K genes tested, I felt compelled to run the ORA against the actual microarray background (called the reference list in PANTHER).
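For reference, the core of a standard ORA is a one-sided hypergeometric (Fisher-type) test in which the background list defines the universe of genes. The sketch below is a minimal, hypothetical Python version using only the standard library. The function name `ora_pvalue` and its argument names are illustrative, and individual tools may use variants of this test, so it is not claimed to reproduce exactly what PANTHER, DAVID or ConsensusPathDB compute.

```python
from math import comb


def ora_pvalue(n_background: int, n_pathway: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background (e.g. all genes tested on the array)
    n_pathway:    pathway genes that are also present in the background
    n_selected:   differentially expressed genes (a subset of the background)
    n_overlap:    differentially expressed genes that fall in the pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    # Sum the probabilities of drawing n_overlap or more pathway genes by chance.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

The pathway in the example below (8 genes in the background, 6 of them differentially expressed) would correspond to `ora_pvalue(N, 8, n, 6)`, where `N` is the background size and `n` the number of differentially expressed genes; neither number is given here. Switching to the generic rat background changes both `n_background` and `n_pathway` (22 rather than 8 for that pathway), which is why the two backgrounds can give quite different p-values.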
I tried the PANTHER, DAVID and ConsensusPathDB online tools and, although the pathways they report differ somewhat, the same discouraging trend appears in all of them: there are significantly enriched pathways before multiple testing correction, but the correction (Benjamini for DAVID, Bonferroni for PANTHER, and I can't remember which method for ConsensusPathDB) wipes out significance in every case. Some results are puzzling. For example, ConsensusPathDB (which shows useful details in its summary table) reports one pathway with 22 genes overall, 8 of which are in my background, and 6 of those 8 are in my list of differentially expressed genes; the p-value is 0.00157, but the q-value is 0.766 (incidentally, the q-value is 0.766 for every pathway shown). It is certainly humbling to realize how little intuition I have about statistics, but on the practical side I do not know what to make of it. By the way, if I use the generic rat background, the results look decent.
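Assuming ConsensusPathDB reports Benjamini-Hochberg-style FDR q-values (the method it uses is not stated above), identical q-values across many pathways are a normal side effect of that procedure. Each sorted p-value is scaled by m / rank, where m is the number of pathways tested, and a running minimum is then taken from the largest p-value downwards, so neighbouring pathways often end up sharing the same q-value. The scaling by m also means that when hundreds of pathways are tested, a raw p-value of 0.00157 can be pushed far above 0.05. A minimal sketch in plain Python (the function name is hypothetical):

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = min(1.0, pvalues[i] * m / rank)
        # Enforce monotonicity: a smaller p-value never gets a larger q-value.
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

In this toy input the pathways with p = 0.04 and p = 0.03 receive exactly the same q-value, because the running minimum carries the smaller scaled value down to the lower rank.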
Should I consider all of these pathways insignificant? Should I take the results with a grain of salt? Any advice would be appreciated.
That was the important part, but if you are answering and have another minute to spare: is multiple testing correction needed because genes can belong to several pathways? I would expect such a correction when deciding whether a given gene is significant (dividing by the number of pathways, for example), but not the other way around.
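In an ORA, the hypothesis being tested is "this pathway is over-represented", and one such test is run per pathway, so the correction counts pathways rather than genes. The toy simulation below illustrates the idea under stated assumptions: no pathway is truly enriched, and null p-values are uniform (an idealisation, since hypergeometric p-values are discrete and tend to be conservative). About alpha × m pathways still pass the raw threshold, whereas a Bonferroni threshold of alpha / m keeps the chance of any false hit at or below alpha. The function name and parameters are hypothetical.

```python
import random


def count_hits_under_null(n_pathways: int, alpha: float = 0.05, seed: int = 0) -> tuple[int, int]:
    """Simulate n_pathways tests in which no pathway is truly enriched.

    Returns (hits at the raw alpha, hits at the Bonferroni threshold alpha / n_pathways).
    """
    rng = random.Random(seed)
    # Idealised null: each p-value is drawn uniformly from [0, 1).
    pvalues = [rng.random() for _ in range(n_pathways)]
    raw_hits = sum(p < alpha for p in pvalues)
    bonferroni_hits = sum(p < alpha / n_pathways for p in pvalues)
    return raw_hits, bonferroni_hits


raw, bonf = count_hits_under_null(500)
print(f"raw p < 0.05: {raw} false hits; Bonferroni: {bonf} false hits")
```

With 500 pathways the raw count is expected to be around 25 (0.05 × 500) even though nothing is enriched, which is the situation multiple testing correction guards against.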