Normalization for within-sample gene correlation
tcalvo ▴ 40
@tcalvo-12466
Brazil

Dear colleagues,

I'm working with correlations between several genes across a group of samples. I have counts and TPM values quantified with Salmon from bulk RNA-seq data. I checked some written sources and found conflicting recommendations, so what's your take?

Which normalization/metric is adequate for within-sample correlation between genes (I'm computing pairwise correlations between genes)? The options I'm considering:

  • TPM or log(TPM);
  • CPM or log(CPM);
  • CPM + TMM (edgeR);
  • log(counts) normalized by the median-of-ratios method (DESeq2).

So far I'm inclined towards simply using TPM or CPM, maybe CPM + TMM; however, I'm not sure about the latter.

Although I'm taking precautions to validate these results in other datasets, I do not want to make this decision after seeing the correlation coefficients or their significance.

Thank you very much for your help,

edgeR Normalization DESeq2 tpm RNASeq • 355 views
@gordon-smyth
WEHI, Melbourne, Australia

edgeR and DESeq2 both give advice on how to extract normalized expression values for exploratory analyses. Just follow that advice.

In edgeR, the advice is to use cpm(dge, log=TRUE) where dge is the TMM normalized DGEList.


Thank you, Gordon. I haven't found specific guidance in the documentation of those packages for "within-sample" normalization, only for between-sample normalization and for variance stabilization for PCA and heatmaps/clustering.

However, in books and forums, some suggested TPM or any other method that adjusts for gene/transcript length. So I was afraid that between-sample normalization procedures would be harmful, or at least unnecessary; hence the doubt.

One source: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html

Michael Love recommended TPM for this as well, in the thread "Within-sample gene comparison with DESeq2".

Thanks.


You shouldn't expect the edgeR User's guide to give specific advice about things that are unneeded for the analyses that the package is designed for.

Adjusting for gene length is irrelevant for a correlation analysis. You can easily adjust for gene length by using rpkm() in edgeR instead of cpm(), but the inter-gene correlations will be identical.
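
The invariance described above can be checked numerically. The sketch below is plain Python, not edgeR code, and every value in it (the CPM values, the gene lengths, the variable names) is a made-up assumption for illustration. Dividing a gene by its own length rescales that gene by a constant, which on the log scale is a constant shift, and Pearson correlation is unchanged by either.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2_all(values):
    """Element-wise log2 of a list of positive numbers."""
    return [math.log2(v) for v in values]

# Hypothetical CPM values for two genes across five samples.
cpm_a = [12.0, 30.0, 18.0, 45.0, 25.0]
cpm_b = [200.0, 410.0, 260.0, 640.0, 300.0]

# Hypothetical gene lengths in kilobases (one constant per gene).
len_a_kb, len_b_kb = 1.5, 4.2

# RPKM-style values: each gene divided by its own length.
rpkm_a = [v / len_a_kb for v in cpm_a]
rpkm_b = [v / len_b_kb for v in cpm_b]

r_cpm = pearson(cpm_a, cpm_b)
r_rpkm = pearson(rpkm_a, rpkm_b)
r_log_cpm = pearson(log2_all(cpm_a), log2_all(cpm_b))
r_log_rpkm = pearson(log2_all(rpkm_a), log2_all(rpkm_b))

# Both comparisons should agree up to floating-point rounding.
print(math.isclose(r_cpm, r_rpkm), math.isclose(r_log_cpm, r_log_rpkm))
```

The same argument covers rank-based (Spearman) correlation, since dividing a gene by a positive constant does not change the ordering of its values across samples.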

On the other hand, between-sample normalization is absolutely essential and I have never known anyone to suggest otherwise.
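
One way to see why this matters for gene-gene correlation is a toy example, sketched here in plain Python with invented numbers (the library sizes, the per-million abundances and all names are assumptions, and simple library-size scaling stands in for TMM, which this sketch does not implement). When sequencing depth varies across samples, raw counts of almost any two genes rise and fall together with depth, so they look strongly correlated even when their relative abundances are not.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical library sizes, in millions of reads, for six samples.
lib_millions = [1, 2, 4, 8, 16, 32]

# Hypothetical "true" abundances per million reads for two genes.
# These were chosen so that the two genes are negatively related.
per_million_a = [10, 12, 9, 11, 10, 12]
per_million_b = [50, 48, 52, 51, 49, 50]

# Raw counts scale with sequencing depth.
counts_a = [p * m for p, m in zip(per_million_a, lib_millions)]
counts_b = [p * m for p, m in zip(per_million_b, lib_millions)]

# Between-sample normalization by library size (CPM-style scaling).
cpm_a = [c / m for c, m in zip(counts_a, lib_millions)]
cpm_b = [c / m for c, m in zip(counts_b, lib_millions)]

r_raw = pearson(counts_a, counts_b)   # dominated by sequencing depth
r_cpm = pearson(cpm_a, cpm_b)         # reflects relative abundance

print(f"raw counts: r = {r_raw:.3f}")
print(f"CPM-scaled: r = {r_cpm:.3f}")
```

In this toy data the raw counts give a correlation close to +1 while the depth-scaled values give a negative one, so skipping between-sample normalization would reverse the conclusion.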
