Suppose I have 1000 probe IDs (output of limma) with significant p-values. Should all of them be selected for GSEA, or is there a rational way of choosing the number of DEGs for GSEA?
The nice thing about some GSEA methods (like camera and roast) is that you don't need to choose a cutoff. Instead, they work over the entire spectrum of changes for a comparison specified in your linear model.
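To illustrate what "no cutoff" means, here is a minimal Python sketch of a simple cutoff-free, rank-based competitive test: it asks whether the per-gene statistics (for example, moderated t-statistics from limma) of genes in a set tend to rank higher or lower than those of all other genes. This is an assumption-laden illustration, not camera or roast: camera additionally adjusts for inter-gene correlation, and roast is a rotation-based test. The function names `rank_average` and `rank_set_test` are hypothetical.

```python
import math


def rank_average(values):
    """Return 1-based ranks of values, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_set_test(stats, in_set):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie
    correction) comparing genes in the set against all other genes.

    stats:  per-gene statistics for ALL genes (no DE cutoff applied)
    in_set: booleans, True if the gene belongs to the gene set
    Returns (z, p); z > 0 means set genes tend to rank higher.
    """
    n = len(stats)
    n1 = sum(in_set)
    n2 = n - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("set must contain some, but not all, genes")
    ranks = rank_average(stats)
    r1 = sum(r for r, s in zip(ranks, in_set) if s)
    u = r1 - n1 * (n1 + 1) / 2          # Mann-Whitney U for the set
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Hypothetical statistics for 10 genes; the first 3 form the gene set.
stats = [5.0, 4.0, 3.0, 0.1, -0.2, 0.3, -0.5, 0.0, 0.2, -0.1]
in_set = [True, True, True] + [False] * 7
z, p = rank_set_test(stats, in_set)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Every gene contributes through its rank, so no list of "significant" probes has to be chosen first.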
Consider reading through the Gene set analysis and Gene set testing sections in the PH525x Biomedical Data Science book.
Since you are analyzing a gene list, I assume that you are doing a simple overlap pathway analysis such as that done by the goana() or kegga() functions in limma. (I wouldn't call these methods GSEA; I prefer to reserve that term for more complicated methods analogous to the GSEA software from the Broad Institute.)
The overlap analyses, in which we count the number of DE genes in each annotated term or pathway, work best when there are lots of DE genes, so 1000 DE genes is by no means too many. A few thousand DE genes, however, would be on the high side. In that case, I would trim the list down by using treat() with a higher log-fold-change (lfc) threshold.
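As a rough illustration of what an overlap analysis computes, here is a minimal Python sketch of a one-sided hypergeometric (Fisher's exact) over-representation test for a single term: given a gene universe, a DE list, and a term, it returns the probability of seeing at least the observed number of DE genes in the term by chance. This is an assumed, simplified model of this kind of test, not the goana() or kegga() source code; the function name `overlap_pvalue` and the example counts are hypothetical.

```python
from math import comb


def overlap_pvalue(n_universe, n_de, n_in_term, n_de_in_term):
    """One-sided over-representation p-value P(X >= n_de_in_term), where
    X ~ Hypergeometric(population=n_universe, successes=n_in_term, draws=n_de).

    n_universe:   number of genes tested (the background)
    n_de:         number of DE genes in the list
    n_in_term:    number of background genes annotated to the term
    n_de_in_term: number of DE genes annotated to the term
    """
    upper = min(n_de, n_in_term)
    total = comb(n_universe, n_de)
    # Exact integer arithmetic: sum the upper tail of the hypergeometric pmf.
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10000 genes tested, 1000 DE, a 50-gene pathway
# containing 15 DE genes (about 5 would be expected by chance).
p = overlap_pvalue(10000, 1000, 50, 15)
print(f"p = {p:.2e}")
```

Because the test only sees counts, a longer DE list gives more DE genes per term and hence more power, which is why 1000 DE genes is not a problem for this kind of analysis.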
Thank you, Gordon Smyth
Your explanation was interesting. I could also try the Broad Institute software: http://software.broadinstitute.org/gsea/index.jsp
Or, since you are already using limma, you could try some of the many GSEA methods it provides, as I suggested above (camera, roast, romer, ...).