Question: Getting between-sample/within-sample normalized gene expression values from DESeq2
Asked 8 months ago by kofoed:

Hi Guys,

My first question here. I'll make it as short and concise as possible. Please correct me if I am wrong (which I very well might be).

I would like to obtain gene expression values that are normalized both between samples and within samples, e.g. size-factor-adjusted TPM values. I have quantified my RNA-seq experiment using Salmon and imported the results with tximport to do the differential analysis with DESeq2. tximport allows the TPM values from Salmon to be used for the differential analysis, but this entails transforming them into either scaledTPM or lengthScaledTPM values.

As I understand it, the scaledTPM values are the between-sample normalized TPM values multiplied by the library size in millions, so something like "transcripts per sample", which is not really what I want.

My question is: would it be possible to somehow output between-sample normalized TPM values for each gene, or does this violate a principle I am overlooking?

Could I just divide the TPMs from Salmon by the sizeFactors obtained from DESeq2?



Answered 8 months ago by Michael Love (United States):

You might consider dividing out size factors, but use the ones returned by estimateSizeFactorsForMatrix run on the TPMs, not the ones estimated from the counts. I've done this before, in the alpine paper for example. In theory, library size has already been removed from TPM. The reason one might still re-normalize TPM for some datasets or some tasks is that very highly abundant genes with small errors in measurement can induce errors in other, less abundant transcripts. With the technique above you end up with something like TPM, but the problem is reduced by assuming that the median TPM ratio between samples should be 1. The values will not sum to exactly 1e6 within each sample, but that's not a problem in my opinion.
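To make the re-normalization idea concrete, here is a minimal sketch of median-of-ratios size factors applied to a TPM matrix. It is written in plain Python for illustration and is not the DESeq2 implementation. The function names `size_factors` and `renormalize` and the toy numbers are made up for this example. It assumes genes with a zero in any sample are left out of the size factor estimate, which mirrors the default behaviour of estimateSizeFactorsForMatrix as I understand it.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors for a genes x samples matrix.

    `matrix` is a list of rows (one per gene), each a list of values
    (one per sample). Genes that are not positive in every sample are
    skipped when estimating the factors.
    """
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        if any(v <= 0 for v in row):
            continue  # geometric mean would be zero, so the gene is uninformative
        log_geo_mean = sum(math.log(v) for v in row) / n_samples
        for j, v in enumerate(row):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene is positive in every sample")
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


def renormalize(matrix):
    """Divide each sample (column) by its size factor."""
    sf = size_factors(matrix)
    return [[v / s for v, s in zip(row, sf)] for row in matrix], sf


# Toy data (hypothetical): the first gene is twice as abundant in sample B,
# the other four genes are unchanged in absolute terms.
absolute = [
    [500000.0, 1000000.0],
    [200000.0, 200000.0],
    [150000.0, 150000.0],
    [100000.0, 100000.0],
    [50000.0, 50000.0],
]
totals = [sum(row[j] for row in absolute) for j in range(2)]
# Rescaling each sample to sum to 1e6 gives TPM-like values; this pushes the
# unchanged genes down in sample B because the dominant gene takes up more room.
tpm = [[v / totals[j] * 1e6 for j, v in enumerate(row)] for row in absolute]

normalized, factors = renormalize(tpm)
for before, after in zip(tpm, normalized):
    print([round(v) for v in before], "->", [round(v) for v in after])
```

In this toy case the four unchanged genes differ between the two samples in TPM, but agree again after dividing by the size factors, and the column sums are no longer exactly 1e6.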





Hi Michael,

Thank you for answering my question. The reason for the question was that I would like to plot TPM-normalized expression values of different genes over multiple conditions, so that the relative expression levels of those genes can be compared to each other. My second question is whether it is actually necessary to apply the size factor normalization to the TPMs, or whether it would be sufficient (or even more correct) to just plot the TPMs from Salmon directly. This kind of graphical representation is not really related to the DESeq2 analysis anyway.



Comment written 8 months ago by kofoed

It's up to you. If I really want to make sure there are no spurious shifts in abundance estimates (for genewise testing, say), I tend to re-normalize. But of course it's fine to just plot the canonical TPMs; this is the standard way. Anyway, I most often do testing and EDA on counts.

Reply written 8 months ago by Michael Love