Hi
The GSEA manual says:
Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. Normalization methods (such as, TMM, geometric mean) which operate on raw counts data should be applied prior to running GSEA. Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA
So I have two groups (n=9 versus n=24)
I ran my raw counts matrix through this code:
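library(edgeR)
dge <- DGEList(counts = M)      # M is the raw counts matrix (genes x samples)
dge <- calcNormFactors(dge)     # TMM normalization factors (the default method)
logCPM <- cpm(dge, log = TRUE)  # log2 counts per million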
Does logCPM give proper input for GSEA?
Sorry, Gordon Smyth,
why do people say TMM is not suitable in any context when GSEA accepts TMM?
I'm not sure why people love to have a debate about normalization methods, but it is irrelevant here. GSEA uses a very robust permutation algorithm and is not sensitive to the particular normalization method used. Any reasonable normalization method that produces logCPM-type values will be fine for GSEA. edgeR's calcNormFactors and cpm functions are certainly suitable, as the GSEA documentation itself already tells you.
Please read my response: https://www.biostars.org/p/456240/#456346
For clarity, for GSEA, please use the log CPM values. If you are going the DESeq2 route, then use the variance-stabilised expression levels.
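In case it helps anyone landing here later, below is a rough sketch of how a logCPM matrix could be written out as a GCT file for GSEA. It assumes the GCT v1.2 layout (a "#1.2" line, a "rows<TAB>columns" line, then a table whose first two columns are NAME and Description). write_gct is a hypothetical helper written for this post, not a function from edgeR or GSEA, and the toy matrix only stands in for the real output of cpm(dge, log = TRUE).

```r
# Hypothetical helper (not part of edgeR or GSEA): write a numeric matrix
# with gene rownames and sample colnames to a GCT v1.2 file.
write_gct <- function(mat, path) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)), !is.null(colnames(mat)))
  con <- file(path, open = "wt")
  on.exit(close(con))
  writeLines("#1.2", con)                                   # format version
  writeLines(paste(nrow(mat), ncol(mat), sep = "\t"), con)  # genes, samples
  body <- data.frame(NAME = rownames(mat), Description = "na", mat,
                     check.names = FALSE, stringsAsFactors = FALSE)
  # Header row plus one row per gene, tab-separated, no quoting
  write.table(body, con, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(path)
}

# Toy stand-in for the logCPM matrix (assumption: 4 genes x 3 samples)
set.seed(1)
toy_logCPM <- matrix(round(rnorm(12, mean = 5), 3), nrow = 4,
                     dimnames = list(paste0("gene", 1:4), paste0("S", 1:3)))
gct_path <- write_gct(toy_logCPM, tempfile(fileext = ".gct"))
```

With the real data you would pass logCPM instead of toy_logCPM and a file name of your choice. The phenotype labels for the two groups go in a separate file, which this sketch does not cover.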