Hi Guys,
This is my first question here, so I'll keep it as short as possible. Please correct me if I am wrong (which I very well might be).
I would like to obtain gene expression values that are normalized both within and between samples, e.g. size-factor-adjusted TPM values. I quantified my RNA-seq experiment with Salmon and imported the results with tximport to do the differential analysis with DESeq2. tximport can use the TPM values from Salmon for the differential analysis, but this entails transforming them into either scaledTPM or lengthScaledTPM values.
As I understand it, the scaledTPM values are the TPM values scaled up to the library size, i.e. roughly the TPMs multiplied by the library size in millions. So they are something like "transcripts per sample", which is not really what I want.
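For reference, here is a minimal Python sketch of how the two countsFromAbundance options relate TPMs to library size, based on my reading of tximport's documented behaviour. The function name `counts_from_abundance` and the toy arrays are hypothetical, not tximport's API. Note that neither option applies any between-sample normalization: each sample's column is simply rescaled to sum to that sample's own library size.

```python
import numpy as np

def counts_from_abundance(counts, tpm, lengths, method="scaledTPM"):
    """Hypothetical sketch of tximport's countsFromAbundance options.

    counts, tpm, lengths: arrays of shape (n_genes, n_samples).
    """
    counts = np.asarray(counts, dtype=float)
    tpm = np.asarray(tpm, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    lib_size = counts.sum(axis=0)  # per-sample library size (sum of estimated counts)
    if method == "scaledTPM":
        new = tpm.copy()
    elif method == "lengthScaledTPM":
        # weight each gene's TPM by its average length across samples
        new = tpm * lengths.mean(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown method: {method}")
    # rescale each column so it sums to that sample's library size
    return new * (lib_size / new.sum(axis=0))

counts = np.array([[100.0, 300.0], [900.0, 700.0]])
tpm = np.array([[2e5, 5e5], [8e5, 5e5]])  # each column sums to 1e6
lengths = np.array([[1000.0, 1200.0], [2000.0, 1800.0]])
print(counts_from_abundance(counts, tpm, lengths, "scaledTPM"))
```

Because each TPM column sums to one million, scaledTPM here equals TPM multiplied by the library size in millions (both libraries are 1000 reads in this toy example).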
My question is: would it be possible to output between-sample normalized TPM values for each gene, or would this violate some principle I am overlooking?
Could I just divide the TPMs from Salmon by the sizeFactors obtained from DESeq2?
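Mechanically, dividing by size factors is just a per-column division. A minimal sketch, assuming the size factors have already been exported from R (e.g. from `sizeFactors(dds)`) into an array with one value per sample, in the same column order as the TPM matrix; `size_factor_adjusted_tpm` is a hypothetical helper name.

```python
import numpy as np

def size_factor_adjusted_tpm(tpm, size_factors):
    """Hypothetical helper: divide each sample's (column's) TPMs by its size factor."""
    tpm = np.asarray(tpm, dtype=float)
    sf = np.asarray(size_factors, dtype=float)
    if tpm.shape[1] != sf.shape[0]:
        raise ValueError("need exactly one size factor per sample (column)")
    return tpm / sf  # broadcasting divides column j by sf[j]

tpm = np.array([[2e5, 5e5], [8e5, 5e5]])
print(size_factor_adjusted_tpm(tpm, [0.5, 2.0]))
```

After this division the columns no longer sum to one million, so the result is no longer TPM in the strict sense.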
Cheers,
Michael
Hi Michael,
Thank you for answering my question. The reason I asked is that I would like to plot TPM-normalized expression values of different genes across multiple conditions, so that the relative expression levels of those genes can be compared with each other. My second question is whether it is actually necessary to apply the size-factor normalization to the TPMs, or whether it would be sufficient (or even more correct) to plot the TPMs from Salmon directly. This kind of plot is not really related to the DESeq2 analysis anyway.
Best,
Michael
It's up to you. If I really want to make sure there are no spurious shifts in the abundance estimates (say, for gene-wise testing), I tend to re-normalize, but it's of course fine to just plot the canonical TPMs; that is the standard way. Anyway, I most often do testing and EDA on counts.
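The reply doesn't say which method is used to re-normalize. One possibility, sketched here under that assumption, is to compute median-of-ratios size factors (the estimator DESeq2 uses by default, re-implemented in numpy along the lines of `estimateSizeFactorsForMatrix`) directly on the TPM matrix and divide each column by its factor. The function names are hypothetical, and only genes with non-zero values in every sample contribute to the estimate.

```python
import numpy as np

def median_of_ratios_size_factors(mat):
    """Hypothetical sketch of median-of-ratios size factors for a genes x samples matrix."""
    mat = np.asarray(mat, dtype=float)
    # only genes that are non-zero in every sample are used (log(0) would be -inf)
    keep = np.all(mat > 0, axis=1)
    if not keep.any():
        raise ValueError("no gene is non-zero in all samples")
    log_mat = np.log(mat[keep])
    log_geo_means = log_mat.mean(axis=1, keepdims=True)  # per-gene log geometric mean
    # per sample: median over genes of log(value / geometric mean), back-transformed
    return np.exp(np.median(log_mat - log_geo_means, axis=0))

def renormalize_tpm(tpm):
    """Divide each sample's TPMs by its median-of-ratios size factor."""
    tpm = np.asarray(tpm, dtype=float)
    return tpm / median_of_ratios_size_factors(tpm)

# sample 2 is a uniformly scaled copy of sample 1
tpm = np.array([[100.0, 200.0], [300.0, 600.0], [600.0, 1200.0]])
print(median_of_ratios_size_factors(tpm))
print(renormalize_tpm(tpm))
```

In this toy example the second sample is exactly twice the first, so the size factors are 1/sqrt(2) and sqrt(2), and after re-normalization the two columns become identical.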