Question

Getting between-sample and within-sample normalized gene expression values from DESeq2


kofoed (@kofoed-13121)

Hi Guys,

My first question here. I'll make it as short and concise as possible. Please correct me if I am wrong (which I very well might be).

I would like to obtain gene expression values that are both between-sample and within-sample normalized, e.g. size-factor-adjusted TPM values. I have quantified my RNA-seq experiment using Salmon and imported the results with tximport to do the differential analysis with DESeq2. tximport allows the TPM values from Salmon to be used in the differential analysis, but this entails transforming them into either scaledTPM or lengthScaledTPM values.

As I understand it, the scaledTPM values are the between-sample normalized TPM values multiplied by the library size in millions, so something like "transcripts per sample", which is not really what I want.
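To make the two tximport options concrete, here is a NumPy sketch of how, as I read the tximport documentation, `countsFromAbundance = "scaledTPM"` and `"lengthScaledTPM"` are computed. The TPMs (for lengthScaledTPM, first multiplied by each gene's average length across samples) are rescaled so each sample's column sums to its original library size. The function name `counts_from_abundance` and the toy matrices are hypothetical, and tximport itself is an R package. Under this reading, no DESeq2-style between-sample size factor is applied at this step.

```python
import numpy as np


def counts_from_abundance(counts, tpm, lengths, method="scaledTPM"):
    """Sketch of tximport's countsFromAbundance options (hypothetical name).

    All inputs are genes x samples arrays: estimated counts, TPM
    abundances, and (effective) lengths from the quantifier.
    """
    counts = np.asarray(counts, dtype=float)
    tpm = np.asarray(tpm, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    lib_size = counts.sum(axis=0)  # per-sample library size from the counts
    if method == "scaledTPM":
        new = tpm.copy()
    elif method == "lengthScaledTPM":
        # weight each gene's TPM by its average length across samples
        new = tpm * lengths.mean(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # rescale every column so that it sums to that sample's library size
    return new * (lib_size / new.sum(axis=0))


# Hypothetical toy input: 2 genes x 2 samples, 1000 reads per sample.
counts = np.array([[111.0, 250.0], [889.0, 750.0]])
tpm = np.array([[2e5, 4e5], [8e5, 6e5]])
lengths = np.array([[1000.0, 1000.0], [2000.0, 2000.0]])

scaled = counts_from_abundance(counts, tpm, lengths, "scaledTPM")
length_scaled = counts_from_abundance(counts, tpm, lengths, "lengthScaledTPM")
print(scaled)
print(length_scaled)
```

Because each TPM column sums to 1e6, scaledTPM here is simply TPM multiplied by (library size / 1e6), i.e. TPM times the library size in millions.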

My question is: would it be possible to somehow output between-sample normalized TPM values for each gene, or does this violate a principle I am overlooking?

Could I just divide the TPMs from Salmon by the sizeFactors obtained from DESeq2?

Cheers,

Michael

Tags: rnaseq, deseq2, tximport, salmon, normalization


Answer 1 (score 2) · 2017-06-11


Michael Love (@mikelove)

You might consider computing size factors with estimateSizeFactorsForMatrix on the TPMs themselves (not on the counts) and dividing the TPMs by them. I've done this before, for example in the alpine paper. Why re-normalize TPMs at all, when in theory library size has already been removed? For some datasets or tasks it helps because TPMs are constrained to sum to 1e6 within each sample. As a result, small measurement errors in very highly abundant genes can shift the estimates of other, less abundant transcripts. The technique above gives you something like TPM, but it reduces this problem by assuming that the median TPM ratio between samples should be 1. The values will not sum exactly to 1e6 per sample, but that's not a problem in my opinion.
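The re-normalization described above can be sketched in NumPy. This is a re-implementation of the median-of-ratios idea behind DESeq2's `estimateSizeFactorsForMatrix` (its default `type = "ratio"` behavior, as I understand it), not the DESeq2 code itself. The helper names `median_ratio_size_factors` and `renormalize_tpm` and the toy matrix are hypothetical. In the toy example, one gene rises in sample 2, so the TPMs of four unchanged genes drop because each column must sum to 1e6. Dividing by size factors computed on the TPMs brings those genes back into agreement.

```python
import numpy as np


def median_ratio_size_factors(mat):
    """Median-of-ratios size factors for a genes x samples matrix.

    Sketch of the idea behind DESeq2's estimateSizeFactorsForMatrix;
    not the DESeq2 implementation itself.
    """
    mat = np.asarray(mat, dtype=float)
    # Use only genes that are positive in every sample (log(0) is -inf).
    keep = np.all(mat > 0, axis=1)
    if not keep.any():
        raise ValueError("every gene has a zero in at least one sample")
    log_mat = np.log(mat[keep])
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_ref = log_mat.mean(axis=1, keepdims=True)
    # Size factor = exp(median over genes of log(sample / reference)).
    return np.exp(np.median(log_mat - log_ref, axis=0))


def renormalize_tpm(tpm):
    """Divide each TPM column by a size factor estimated on the TPMs."""
    tpm = np.asarray(tpm, dtype=float)
    sf = median_ratio_size_factors(tpm)
    return tpm / sf, sf


# Toy data (hypothetical): gene 1 goes from 400 to 550 units in sample 2,
# genes 2-5 are unchanged at 150 units each.
abundance = np.array([[400.0, 550.0]] + [[150.0, 150.0]] * 4)
tpm = abundance / abundance.sum(axis=0) * 1e6  # each column sums to 1e6

renorm, sf = renormalize_tpm(tpm)
print("size factors:", sf)
print("gene 2 TPM ratio (s2/s1), raw:   ", tpm[1, 1] / tpm[1, 0])
print("gene 2 TPM ratio (s2/s1), renorm:", renorm[1, 1] / renorm[1, 0])
print("column sums after renormalization:", renorm.sum(axis=0))
```

The last line illustrates the point made in the answer: after renormalization the columns no longer sum to exactly 1e6.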



Hi Michael,

Thank you for answering my question. I asked because I would like to plot TPM-normalized expression values of different genes across multiple conditions, so that the relative expression levels of those genes can be compared to each other. My follow-up question is whether it is actually necessary to apply the size-factor normalization to the TPMs, or whether it would be sufficient (or even more correct) to plot the TPMs from Salmon directly. This kind of graphical representation is not really related to the DESeq2 analysis anyway.

Best,

Michael

(reply by kofoed)


It's up to you. If I really want to make sure there are no spurious shifts in abundance estimates (for gene-wise testing, say), I tend to re-normalize, but it's of course fine to just plot the canonical TPMs; that is the standard way. In any case, I most often do testing and EDA on counts.

(reply by Michael Love)