Question

Normalization for within-sample gene correlation

tcalvo ▴ 100

@tcalvo-12466

Last seen 22 months ago

Brazil

Dear colleagues,

I'm working with correlations between several genes across a group of samples. I have counts and TPM values quantified with Salmon from bulk RNA-seq data. I checked some written sources and found conflicting recommendations, so what's your take?

Adequate normalization/metric for within-sample correlation between genes (I'm doing pairwise for genes).

TPM or log(TPM);
CPM or log(CPM);
CPM + TMM (edgeR);
log(counts) normalized by the median-of-ratios method (DESeq2);

So far I'm inclined towards simply using TPM or CPM, maybe CPM + TMM; however, I'm not sure about the latter.

Although I'm taking precautions to validate these results in other datasets, I do not want to take this decision after seeing the correlation coefficients or their significance.

Thank you very much for your help,

edgeR Normalization DESeq2 tpm RNASeq • 2.6k views

ADD COMMENT • link updated 4.4 years ago by Gordon Smyth 52k • written 4.4 years ago by tcalvo ▴ 100

score 2 · Accepted Answer · 2020-11-18

Gordon Smyth 52k

@gordon-smyth

Last seen 11 hours ago

WEHI, Melbourne, Australia

edgeR and DESeq2 both give advice on how to extract normalized expression values for exploratory analyses. Just follow that advice.

In edgeR, the advice is to use cpm(dge, log=TRUE), where dge is the TMM-normalized DGEList.
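A minimal sketch of that workflow, followed by the pairwise gene correlations the question asks about. The simulated count matrix and the object names (counts, logcpm, gene_cor) are assumptions for illustration only; with real data, counts would be the gene-level count matrix derived from the Salmon output.

```r
library(edgeR)

# Simulated counts: 200 genes x 12 samples (a stand-in for real gene-level counts)
set.seed(1)
counts <- matrix(rnbinom(200 * 12, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("sample", 1:12)))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")  # between-sample (TMM) normalization

# log2-CPM computed from the TMM-adjusted library sizes
logcpm <- cpm(dge, log = TRUE)

# cor() correlates columns, so transpose to put genes in columns
gene_cor <- cor(t(logcpm))
```

gene_cor is then a genes-by-genes matrix of Pearson correlations computed across samples.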

ADD COMMENT • link 4.4 years ago Gordon Smyth 52k

Thank you, Gordon. I haven't found specific guidance in the documentation of those packages for "within-sample" normalization, only for between-sample normalization and for variance stabilization for PCA and heatmaps/clustering.

However, in books and forums, some suggested TPM or another method that adjusts for gene/transcript length. So I was afraid that between-sample normalization procedures would hurt, or at least would not be needed, hence the doubt.

One source: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html

Michael Love recommended TPM for this as well, in the thread "Within-sample gene comparison with DESeq2".

Thanks.

ADD REPLY • link 4.4 years ago tcalvo ▴ 100

You shouldn't expect the edgeR User's guide to give specific advice about things that are unneeded for the analyses that the package is designed for.

Adjusting for gene length is irrelevant for a correlation analysis. You can easily adjust for gene length by using rpkm() in edgeR instead of cpm(), but the inter-gene correlations will be identical.
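The reason is that, on the log scale, the length adjustment only subtracts a per-gene constant, and Pearson correlation is unchanged by shifting a variable. A small base-R sketch of this, assuming simulated counts, hypothetical gene lengths, and a simplified log-CPM formula (not edgeR's exact implementation):

```r
set.seed(1)
n_genes <- 50
n_samples <- 10

# Simulated counts and hypothetical gene lengths in base pairs
counts <- matrix(rpois(n_genes * n_samples, lambda = 200), nrow = n_genes)
gene_length <- sample(500:5000, n_genes)

# Simplified log2-CPM with a small offset to avoid log(0)
lib_size <- colSums(counts)
log_cpm <- log2(t(t(counts + 0.5) / (lib_size + 1)) * 1e6)

# Length adjustment: subtract log2(length in kb), a constant for each gene (row)
log_rpkm <- log_cpm - log2(gene_length / 1000)

# Gene-gene correlations across samples, with and without the length adjustment
cor_cpm <- cor(t(log_cpm))
cor_rpkm <- cor(t(log_rpkm))

all.equal(cor_cpm, cor_rpkm)
```

The two correlation matrices should agree up to floating-point error. The same holds on the unlogged scale, where the adjustment is a per-gene rescaling.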

On the other hand, between-sample normalization is absolutely essential and I have never known anyone to suggest otherwise.

ADD REPLY • link 4.4 years ago Gordon Smyth 52k