Question

Having TMM for GSEA

0

AZ ▴ 30

@fereshteh-15803

United Kingdom

Hi

The GSEA manual says:

Normalizing RNA-seq quantification to support comparisons of a feature's expression levels across samples is important for GSEA. Normalization methods (such as, TMM, geometric mean) which operate on raw counts data should be applied prior to running GSEA. Tools such as DESeq2 can be made to produce properly normalized data (normalized counts) which are compatible with GSEA

So I have two groups (n=9 versus n=24)

I ran my raw counts matrix through this code:

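library(edgeR)
dge <- DGEList(M)             # M: raw counts matrix (genes x samples)
dge <- calcNormFactors(dge)   # TMM normalization factors (the default method)
logCPM <- cpm(dge, log=TRUE)  # log2 counts per million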

Does logCPM give proper input for GSEA?

edger deseq2 • 3.7k views

ADD COMMENT • link updated 4.0 years ago by Kevin Blighe ★ 4.0k • written 4.6 years ago by AZ ▴ 30

score 3 · Accepted Answer · 2020-08-19

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Yes, it's fine.

Alternatively you could try camera() in edgeR which has analogous functionality to GSEA.
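As a rough illustration of the camera() route, here is a minimal sketch on simulated data. The toy count matrix, the group labels (9 versus 24 samples, as in the question), and the gene sets `setA` and `setB` are all hypothetical; real gene sets would come from an annotation source. It assumes the DGEList method of camera() is used, which needs dispersion estimates first.

```r
library(edgeR)  # edgeR attaches limma, where the camera() generic lives

set.seed(1)

# Hypothetical toy data: 1000 genes, 9 vs 24 samples as in the question
group <- factor(rep(c("A", "B"), times = c(9, 24)))
ngenes <- 1000
M <- matrix(rnbinom(ngenes * length(group), mu = 50, size = 5),
            nrow = ngenes,
            dimnames = list(paste0("gene", seq_len(ngenes)),
                            paste0("sample", seq_along(group))))

dge <- DGEList(M, group = group)
dge <- calcNormFactors(dge)        # TMM by default
design <- model.matrix(~ group)    # second coefficient compares B with A
dge <- estimateDisp(dge, design)   # dispersions are needed before calling camera() on a DGEList

# Hypothetical gene sets, given as row indices into the count matrix
index <- list(setA = 1:50, setB = 51:120)

# Competitive gene set test for the B vs A coefficient
res <- camera(dge, index, design, contrast = 2)
print(res)
```

Each row of `res` is one gene set, with the number of genes, the direction of change, and a p-value. Because the data here are pure noise, no set is expected to be significant.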

ADD COMMENT • link 4.6 years ago Gordon Smyth 52k

0

Sorry Gordon Smyth

Why do people say TMM is not suitable in any context, while GSEA accepts TMM?

edgeR --> TMM (Trimmed Mean of M-values)
DESeq2 --> Geometric mean
Both are debatable and not suitable for every context.

Perhaps take a look at the output of ?DESeq2::counts

...

Description:

The counts slot holds the count data as a matrix of non-negative integer count values, one row for each observational unit (gene or the like), and one column for each sample.

...

normalized: logical indicating whether or not to divide the counts by the size factors or normalization factors before returning (normalization factors always preempt size factors)

...

Author(s): Simon Anders

ADD REPLY • link 4.6 years ago AZ ▴ 30

2

I'm not sure why people love to have a debate about normalization methods, but it is irrelevant here. GSEA uses a very robust permutation algorithm and is not sensitive to the particular normalization method used. Any reasonable normalization method that produces logCPM-type values will be fine for GSEA. edgeR's calcNormFactors and cpm functions are certainly suitable, as the GSEA documentation itself already tells you.
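To make "logCPM-type values" concrete, here is a minimal base-R sketch of the idea: scale counts by an effective library size, convert to counts per million, and take log2 with a small offset. The toy counts and the normalization factors of 1 are assumptions for illustration. The offsets of 0.5 and 1 follow the convention used by limma's voom() and are not claimed to reproduce edgeR's cpm(log=TRUE) exactly.

```r
# Hypothetical toy counts: 2 genes x 2 samples
counts <- matrix(c(10, 90, 0, 100), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

lib_size <- colSums(counts)               # raw library sizes
norm_factors <- c(s1 = 1, s2 = 1)         # assumed here; calcNormFactors() would estimate these
eff_lib_size <- lib_size * norm_factors   # effective library sizes

# Offsets of 0.5 (counts) and 1 (library size) avoid log2(0).
# t() is used so that each sample is divided by its own library size.
log_cpm <- log2(t((t(counts) + 0.5) / (eff_lib_size + 1)) * 1e6)
print(log_cpm)
```

Different normalization methods change only `norm_factors` in this picture, which is one way to see why the choice among reasonable methods tends to matter little for a rank-based procedure.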

ADD REPLY • link 4.6 years ago Gordon Smyth 52k

1

Why do people say TMM is not suitable in any context, while GSEA accepts TMM?

Please read my response: https://www.biostars.org/p/456240/#456346

For clarity, for GSEA, please use the log CPM values. If you are going the DESeq2 route, then use the variance-stabilised expression levels.
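For the DESeq2 route, a minimal sketch of obtaining variance-stabilised values might look like the following. The simulated counts, the `group` factor, and the object names are hypothetical; `blind = FALSE` is one reasonable choice here, not a requirement stated in this thread.

```r
library(DESeq2)

set.seed(1)

# Hypothetical toy data: 2000 genes, 9 vs 24 samples, with gene-specific mean counts
group <- factor(rep(c("A", "B"), times = c(9, 24)))
ngenes <- 2000
gene_mu <- 2^runif(ngenes, 4, 10)
M <- matrix(rnbinom(ngenes * length(group), mu = gene_mu, size = 5),
            nrow = ngenes,
            dimnames = list(paste0("gene", seq_len(ngenes)),
                            paste0("sample", seq_along(group))))

# Sample annotation; row names match the column names of the count matrix
coldata <- data.frame(group = group, row.names = colnames(M))

dds <- DESeqDataSetFromMatrix(countData = M, colData = coldata, design = ~ group)
vsd <- vst(dds, blind = FALSE)   # variance-stabilising transformation
expr <- assay(vsd)               # genes x samples matrix on a log2-like scale

print(dim(expr))
```

The matrix `expr` plays the same role as the logCPM matrix from edgeR: one row per gene, one column per sample, ready to be exported in whatever format GSEA expects.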

ADD REPLY • link 4.6 years ago • updated 4.0 years ago Kevin Blighe ★ 4.0k