I'm working with correlations between several genes from a group of samples. I have counts and TPM values quantified with Salmon from bulk RNA-seq data. I checked several written sources and found conflicting recommendations, so what's your take?
Which normalization/metric is adequate for within-sample correlation between genes (I'm computing pairwise gene-gene correlations)? The options I'm considering:
- TPM or log(TPM);
- CPM or log(CPM);
- CPM + TMM (edgeR);
- log(counts) normalized by the median-of-ratios method (DESeq2).
So far I'm inclined towards simply using TPM or CPM, maybe CPM + TMM; however, I'm not sure about the latter.
Although I'm taking precautions to validate these results in other datasets, I do not want to make this decision after seeing the correlation coefficients or their significance.
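To make the options above concrete, here is a minimal pure-Python sketch of what I understand each transformation to compute, followed by a Pearson correlation for one gene pair. Everything in it is an assumption for illustration: the count matrix, the effective lengths, and the function names are made up, the size factors are a simplified rendering of the median-of-ratios idea rather than DESeq2's own code, and TMM is left out because it is not easy to reproduce faithfully in a few lines.

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million for one sample (list of per-gene counts)."""
    total = sum(counts)
    return [c / total * 1e6 for c in counts]


def tpm(counts, eff_lengths):
    """Transcripts per million: length-scaled rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, eff_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def log2p1(values):
    """log2(x + 1); the pseudocount of 1 is an arbitrary illustrative choice."""
    return [math.log2(v + 1.0) for v in values]


def median_of_ratios_size_factors(matrix):
    """Simplified median-of-ratios size factors (samples in rows, genes in columns).

    Only genes with non-zero counts in every sample contribute.
    """
    n_genes = len(matrix[0])
    log_geo_means = []
    for g in range(n_genes):
        column = [sample[g] for sample in matrix]
        if all(c > 0 for c in column):
            log_geo_means.append((g, sum(math.log(c) for c in column) / len(column)))
    factors = []
    for sample in matrix:
        log_ratios = [math.log(sample[g]) - lgm for g, lgm in log_geo_means]
        factors.append(math.exp(median(log_ratios)))
    return factors


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_column(matrix, g):
    """Values of gene g across all samples."""
    return [row[g] for row in matrix]


# Hypothetical toy data: 4 samples (rows) x 3 genes (columns).
counts = [
    [100, 200, 700],
    [150, 310, 540],
    [80, 150, 770],
    [200, 420, 380],
]
eff_lengths = [1000.0, 2000.0, 1500.0]  # made-up effective lengths
gene_a, gene_b = 0, 1

size_factors = median_of_ratios_size_factors(counts)
normalized = {
    "CPM": [cpm(s) for s in counts],
    "TPM": [tpm(s, eff_lengths) for s in counts],
    "median-of-ratios": [[c / f for c in s] for s, f in zip(counts, size_factors)],
}

for name, matrix in normalized.items():
    logged = [log2p1(row) for row in matrix]
    r_raw = pearson(gene_column(matrix, gene_a), gene_column(matrix, gene_b))
    r_log = pearson(gene_column(logged, gene_a), gene_column(logged, gene_b))
    print(f"{name}: r = {r_raw:.3f}, r on log2(x + 1) = {r_log:.3f}")
```

In a real analysis I would of course rely on edgeR or DESeq2 for the scaling factors; the sketch is only meant to pin down what is being compared, not to pick a winner from the coefficients it prints.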
Thank you very much for your help.