Hi folks,
The size factor computations used by DESeq2's vst and rlog account for differences in library composition, among other things. Library composition differences between sample groups are one of the reasons for not using TPMs. However, we are often presented with only TPM values. I wondered whether there are existing methods for quantifying a library composition problem from a set of TPM values and, if not, what might be a good way to quantify the risk of using the TPMs (for visualisation or other purposes). For example, would it do the trick to start by computing a matrix of pairwise Kolmogorov-Smirnov tests on the per-sample distributions of values and looking for significant differences?
Many thanks
Tim
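As a rough illustration of the pairwise Kolmogorov-Smirnov idea raised above, here is a minimal sketch in plain Python. The function names (`ks_statistic`, `pairwise_ks`) and the toy values are made up for illustration, and it reports only the KS statistic D (the largest gap between two empirical CDFs), not a p-value. Whether D actually tracks a library composition effect is an assumption this sketch does not test.

```python
from bisect import bisect_right
from itertools import combinations

def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The gap can only change at an observed value, so check each distinct one.
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)  # fraction of x <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y <= v
        d = max(d, abs(fx - fy))
    return d

def pairwise_ks(tpm_by_sample):
    """Map each unordered pair of sample names to its KS statistic.

    tpm_by_sample: dict of sample name -> list of TPM values (hypothetical layout).
    """
    return {
        (a, b): ks_statistic(tpm_by_sample[a], tpm_by_sample[b])
        for a, b in combinations(sorted(tpm_by_sample), 2)
    }

# Toy values only, not real TPMs.
toy = {
    "s1": [1, 2, 3, 4],
    "s2": [1, 2, 3, 4],
    "s3": [10, 20, 30, 40],
}
result = pairwise_ks(toy)
for pair, d in result.items():
    print(pair, d)
```

With tens of thousands of genes per sample, a significance test is likely to flag even small distributional shifts, so D itself may be more useful as an effect size than a p-value threshold.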
Thanks Mike.
Yes, this would be for cases where you have only TPMs available and want to check whether library composition is a problem.
But wrt tximport - are you saying that tximport->TPM alone will account for a library composition effect in a way equivalent to the median of ratios? So if you know your TPMs were calculated that way, then you're on safer ground?
No. That is just how the data is passed to the methods, which then apply their own normalization/offset approach. If you do TPM + library size information -> tximport scaledTPM approach -> VST or rlog, then the last step will perform the appropriate median ratio scaling and a variance-stabilizing transformation.
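For readers unfamiliar with median ratio scaling, the calculation can be sketched in a few lines of plain Python: take each gene's geometric mean across samples, divide each sample's value by it, and use the per-sample median of those ratios as the size factor. The function name `median_ratio_factors` and the toy matrix are hypothetical, and this is a sketch of the general technique rather than DESeq2's own code. Applying it directly to TPMs as a diagnostic is an assumption, not an established method.

```python
import math
from statistics import median

def median_ratio_factors(matrix):
    """Median-of-ratios size factors.

    matrix: list of gene rows, each a list of per-sample values.
    Returns one size factor per sample.
    """
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        # A gene with any zero has no finite log geometric mean, so skip it.
        if any(v <= 0 for v in row):
            continue
        log_geo_mean = sum(math.log(v) for v in row) / n_samples
        for j, v in enumerate(row):
            log_ratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]

# Toy matrix: rows are genes, columns are samples A and B.
# Both columns sum to 1000, but one gene dominates sample B.
toy = [
    [200, 50],
    [200, 50],
    [200, 50],
    [200, 50],
    [200, 800],
]
factors = median_ratio_factors(toy)
print(factors)
```

In this toy case the factors come out at roughly 2 and 0.5 even though both columns have the same total, because one dominant gene in sample B pushes every other gene's share down. Factors far from 1 on data that is already scaled to a common total might therefore hint at a composition difference, though that reading is a heuristic.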
Ah ok, that makes sense. Normally I wouldn't have library size information, only TPMs, so I want to try to quantify likely issues. For example, I have an application that already uses TPMs and want to add a quick test that quantifies the risk by detecting a potential library composition effect. Perhaps the K-S test will suffice? Thanks for helping.
Hmm, not sure about a test. Good luck!