Hi,

I have a question concerning the "new" scaling method offered by `tximport`, called `dtuScaledTPM`, and how it affects DTU analysis.

So far I have used `scaledTPM`, which as I understand it (correct me if I'm wrong) scales the TPM values to library size by multiplying the TPM of a transcript in a sample by the column sum of the count matrix, thus bringing them back onto the count scale?
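To make sure I'm describing the same computation, here is a toy sketch of what I think `scaledTPM` does (Python just for the arithmetic; all numbers invented):

```python
import numpy as np

# Toy data: 3 transcripts x 2 samples (all numbers made up).
tpm = np.array([[10.0, 20.0],
                [ 5.0, 10.0],
                [85.0, 70.0]])
counts = np.array([[ 80.0, 250.0],
                   [ 40.0, 120.0],
                   [680.0, 830.0]])

# My understanding of scaledTPM: rescale each sample's TPM column so
# that its total matches that sample's total count (the column sum).
lib_size = counts.sum(axis=0)
scaled_tpm = tpm / tpm.sum(axis=0) * lib_size

# Each column now sums to the original library size of that sample.
print(scaled_tpm.sum(axis=0))  # equals counts.sum(axis=0)
```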
The `dtuScaledTPM`

additionally includes the transcript length into the library size info, (dividing count based library size by library size calculated from TPM*transcript length). And the transcript length is the median of transcript lengths of all transcripts in a gene ( where the transcript length itself is the average across all samples). Is this correct? And if so, why is this beneficial for DTU analysis. I understand that using `lengthScaledTPM`

is not advantageous but I cant wrap my head around why this method is better "just" because the transcript length value is the median instead of the mean?

Sorry if this is a rather confusing question. Im seeking to understand why this method should be used for DTU . It would be great to get a worst case toy example where `lengthScaledTPM`

would not work and also where lack of length scaling in `scaledTPM`

would not work well.
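For reference, here is my current toy understanding of how the two length factors would differ (invented numbers; Python just for the arithmetic, and ignoring the final library-size rescaling):

```python
import numpy as np

# One gene with 3 isoforms, 2 samples; effective lengths vary a bit
# by sample (all numbers invented for illustration).
length = np.array([[1000.0, 1040.0],
                   [ 500.0,  520.0],
                   [2000.0, 2080.0]])
tpm = np.array([[10.0, 30.0],
                [40.0, 20.0],
                [50.0, 50.0]])

# lengthScaledTPM-style factor: each transcript's own length,
# averaged over samples (one factor per transcript).
len_per_tx = length.mean(axis=1)

# dtuScaledTPM-style factor: one shared length per gene -- the median
# over the gene's isoforms of those sample-averaged lengths.
len_per_gene = np.median(len_per_tx)

# A single shared factor per gene rescales all isoforms identically,
# so within-gene proportions are unchanged:
prop_raw = tpm / tpm.sum(axis=0)
prop_dtu = (tpm * len_per_gene) / (tpm * len_per_gene).sum(axis=0)
print(np.allclose(prop_raw, prop_dtu))  # True

# Per-transcript factors shift the within-gene proportions:
scaled = tpm * len_per_tx[:, None]
prop_len = scaled / scaled.sum(axis=0)
print(np.allclose(prop_raw, prop_len))  # False
```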

Thanks in advance

Fiona

Thanks for taking the time to answer. So the main difference to `lengthScaledTPM` is that you take the average over all samples, and so you do not inadvertently scale down the whole gene's expression in one sample (group A) but not the other (group B). Hence you do not divide by a different total gene-expression amount when calculating the proportions (as opposed to what happens when using `lengthScaledTPM`, which is what is wanted for DGE). And the main argument for including transcript-length scaling (in DTU) basically boils down to preventing long transcripts' low dispersion (or high precision) from being confounded "just" because they generally have higher counts (due to their length). Is that right?

With `dtuScaledTPM`, compared to `scaledTPM`, we want long transcripts to have counts in a range similar to their original counts. Actually, the precision has more to do with the Poisson component than with the dispersion component (if we are considering a Gamma-Poisson model for the data).

What would you recommend for downstream linear-modeling (e.g., network or transcript-QTL) analyses? It still seems like `dtuScaledTPM` is the best choice. Also, would you recommend a variance-stabilizing transform on `dtuScaledTPM`? Thanks!

If you are doing something marginally per transcript, you could use any countsFromAbundance method. This thread is about DTU modeling, where the outcome is a vector (over isoform counts). Yes, I'd recommend VST for eQTL discovery; I've benchmarked this in a collaboration and it works well.
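To make the Poisson-vs-dispersion point concrete, a tiny calculation (dispersion value invented) using the Gamma-Poisson variance Var = mu + alpha * mu^2, so CV^2 = 1/mu + alpha:

```python
# Squared coefficient of variation under a Gamma-Poisson model:
# Var = mu + alpha * mu^2  =>  CV^2 = 1/mu + alpha.
alpha = 0.05  # dispersion (made-up value, shared by both scenarios)

# A long transcript near its original count scale vs. the same
# transcript after being scaled down by removing its length:
for mu in [1000.0, 100.0]:
    cv2 = 1.0 / mu + alpha
    # The alpha term is identical; only the Poisson term 1/mu grows
    # as the count shrinks, so apparent precision drops.
    print(mu, cv2)
```

Shrinking a long transcript's counts (as happens without length scaling) inflates the 1/mu Poisson term while the dispersion term is unchanged, which is why keeping long transcripts near their original count range matters.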