Hi,
I'm trying to use a heatmap to show the expression pattern of genes of interest across samples (cell types). I landed on two options:
1) Use the row z-score of CPM (counts per million), annotated with base_mean CPM, as shown here: https://drive.google.com/file/d/1KVP-flOnINjUOOTWbwVIacJITuXH68HU/view?usp=sharing
2) Use the rlog difference (rlog minus the mean rlog of the same gene across samples), annotated with base_mean rlog, as shown in the second figure: https://drive.google.com/file/d/1rlHxaLRUW5K7CSk-OCCj_E0LNGY_8YjY/view?usp=sharing
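To make the two options concrete, here is a minimal sketch of the two row transformations in plain Python. The numbers are made up, the function names row_center and row_zscore are just for illustration, and the values stand in for any log2-scale expression measure (they are not real rlog output from DESeq2).

```python
from statistics import mean, stdev

def row_center(values):
    """Subtract the gene's mean across samples (the 'difference' in option 2)."""
    m = mean(values)
    return [v - m for v in values]

def row_zscore(values):
    """Center, then divide by the gene's sample standard deviation (as in option 1)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical log2-scale values for two genes across three samples.
gene_a = [3.0, 4.0, 5.0]  # spans 2 log2 units
gene_b = [2.0, 4.0, 6.0]  # spans 4 log2 units

centered_a = row_center(gene_a)
centered_b = row_center(gene_b)
z_a = row_zscore(gene_a)
z_b = row_zscore(gene_b)

print("centered:", centered_a, centered_b)
print("z-score: ", z_a, z_b)
```

In this toy case the centered values keep the size of the change (gene_b varies twice as much as gene_a), while the z-scores of the two genes come out the same because each row is rescaled by its own standard deviation.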
My question is:
Is it reasonable to use rlog values to represent expression? Are the differences comparable, i.e. does an rlog difference of 2 generally reflect a bigger change than an rlog difference of 1?
I know that rlog is not perfectly correlated with the count values, but I think of it as an estimate of the 'true' expression.
Thank you very much.
Thank you for your answer.
I'm also wondering whether it's fair to compare the abundance of mRNA between two genes.
It may not make a whole lot of sense, but I want to achieve something like this:
If one gene has CPM=5000 (or rlog=11) and another gene has CPM=2 (or rlog=0), I might favor the former as a candidate gene, for example to generate mouse tools.
No, rlog and CPM are not proportional to expression when comparing across genes. At the same transcript abundance, a longer gene produces more reads, so longer genes will have higher rlog and CPM. For comparing across both samples and genes, you would want a length-normalized measure like TPM. This is the "abundance" matrix that tximport imports from transcript quantification methods.
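A small worked example of the length effect, as a sketch only: the counts and lengths below are invented, and the helper names cpm and tpm are hypothetical. It also divides by the plain gene length, whereas transcript quantifiers typically use an effective length, so treat it as an illustration of the idea rather than how any particular tool computes TPM.

```python
def cpm(counts):
    """Counts per million: scale each count by the library size only."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def tpm(counts, lengths):
    """Transcripts per million: divide by length first, then scale to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total_rate = sum(rates.values())
    return {gene: r / total_rate * 1e6 for gene, r in rates.items()}

# Hypothetical single sample: both genes have the same number of transcripts,
# but "long" is 4x the length of "short", so it collects 4x the reads.
counts = {"short": 100, "long": 400}
lengths = {"short": 1000, "long": 4000}  # in bases, assumed values

cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths)

print("CPM:", cpm_values)
print("TPM:", tpm_values)
```

Here CPM ranks the long gene four times higher even though both genes are equally abundant, while TPM gives them the same value, which is why TPM is the better choice when ranking candidate genes against each other.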