Normalization for within-sample gene correlation
tcalvo ▴ 70

Dear colleagues,

I'm working with correlations between several genes across a group of samples. I have counts and TPM values quantified with Salmon from bulk RNA-seq data. I checked several written sources and found conflicting recommendations, so what's your take?

Which normalization/metric is adequate for within-sample correlation between genes? (I'm computing pairwise correlations between genes.)

  • TPM or log(TPM);
  • CPM or log(CPM);
  • CPM + TMM (edgeR);
  • log(counts) normalized by the median-of-ratios method (DESeq2);

So far I'm inclined towards simply using TPM or CPM, maybe CPM + TMM; however, I'm not sure about the latter.

Although I'm taking precautions to validate these results in other datasets, I do not want to take this decision after seeing the correlation coefficients or their significance.

Thank you very much for your help,

WEHI, Melbourne, Australia

edgeR and DESeq2 both give advice on how to extract normalized expression values for exploratory analyses. Just follow that advice.

In edgeR, the advice is to use cpm(dge, log=TRUE), where dge is the TMM-normalized DGEList.
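As a minimal sketch of that advice, the snippet below runs the steps end to end. The count matrix is simulated purely for illustration, and the object names (counts, genes, gene_cor) are assumptions, not part of the original answer; with real data, counts would be your own gene-by-sample matrix.

```r
library(edgeR)

# Simulated counts for illustration only: 100 genes x 8 samples.
set.seed(1)
counts <- matrix(rnbinom(800, mu = 50, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0("sample", 1:8)))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")  # estimate TMM scaling factors
logcpm <- cpm(dge, log = TRUE)               # log2-CPM using effective library sizes

# Pairwise gene-gene correlations across samples for a few genes of interest.
# cor() correlates columns, so the genes are moved into columns with t().
genes <- c("gene1", "gene2", "gene3")
gene_cor <- cor(t(logcpm[genes, ]))
```

Here gene_cor is a 3 x 3 matrix of Pearson correlations computed across the 8 samples.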


Thank you, Gordon. I haven't found specific guidance in the documentation of those packages for "within-sample" normalization, only for between-sample normalization and for variance stabilization for PCA and heatmaps/clustering.

However, in books and forums, some people suggest TPM or another method that adjusts for gene/transcript length. I was therefore afraid that between-sample normalization procedures would hurt, or at least would not be needed; hence the doubt.

One source:

Michael Love recommended TPM for this as well, in the thread "Within-sample gene comparison with DESeq2".



You shouldn't expect the edgeR User's guide to give specific advice about things that are unneeded for the analyses that the package is designed for.

Adjusting for gene length is irrelevant for a correlation analysis. You can easily adjust for gene length by using rpkm() in edgeR instead of cpm(), but the inter-gene correlations will be identical.
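The reason is that a gene's length is the same in every sample, so on the log2 scale the length adjustment just subtracts a constant from each gene's row, and correlation is unaffected by adding a constant to a variable. A small base-R sketch of this, using made-up log-CPM values and hypothetical gene lengths rather than real data or actual rpkm() output:

```r
# Toy log2-CPM matrix: 3 genes (rows) x 6 samples (columns); values are made up.
logcpm <- rbind(
  geneA = c(5.1, 6.3, 4.8, 7.0, 5.9, 6.6),
  geneB = c(2.0, 3.1, 1.7, 3.9, 2.5, 3.4),
  geneC = c(8.2, 7.9, 8.8, 7.1, 8.0, 7.6)
)

# Hypothetical gene lengths in kilobases.
length_kb <- c(geneA = 1.2, geneB = 4.5, geneC = 0.8)

# Dividing by length on the raw scale is subtracting log2(length) on the
# log2 scale; the vector is recycled down the rows, one constant per gene.
logrpkm <- logcpm - log2(length_kb)

# Gene-by-gene Pearson correlations before and after the length adjustment.
cor_cpm  <- cor(t(logcpm))
cor_rpkm <- cor(t(logrpkm))

all.equal(cor_cpm, cor_rpkm)
```

The same holds for Spearman correlation, since a per-gene shift does not change the ranks of that gene's values across samples.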

On the other hand, between-sample normalization is absolutely essential and I have never known anyone to suggest otherwise.
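To illustrate why, here is a toy base-R simulation (an assumption-laden sketch, not an edgeR workflow): two genes are generated independently, but samples differ up to tenfold in sequencing depth. The depth factor is known here by construction, whereas in real data it has to be estimated, for example by TMM.

```r
set.seed(42)

# Hypothetical relative sequencing depth for 200 samples (1x to 10x).
n_samples <- 200
depth <- seq(1, 10, length.out = n_samples)

# Two genes with the same true expression level in every sample, with
# independent Poisson noise. Counts scale with depth, not with biology.
gene1 <- rpois(n_samples, lambda = 100 * depth)
gene2 <- rpois(n_samples, lambda = 100 * depth)

# Without between-sample normalization, depth drives both genes together.
cor_raw <- cor(gene1, gene2)

# After removing the depth effect, only the independent noise remains.
cor_norm <- cor(gene1 / depth, gene2 / depth)

c(raw = cor_raw, normalized = cor_norm)
```

The raw counts appear almost perfectly correlated even though the genes are unrelated, while the depth-adjusted values show a correlation near zero.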

