Hi,
I have some total RNA-seq data that was obtained from RNA with quite low RIN values (unfortunately, due to the source material, this was pretty much unavoidable). I've been looking at the different ways people have tried to approach and correct for this problem. I've seen that the transcript integrity number (TIN) was developed in part to address it: one of the suggested approaches is to first calculate a median TIN value for each sample and then correct the gene read counts based on this. In the paper, the gene-level read counts were regressed on the TIN values using a loess method.
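For reference, this is roughly how I pictured that correction working — just my own sketch rather than code from the paper, with `counts` as a gene-by-sample count matrix and `medTIN` as the per-sample median TIN vector:

```r
## Hypothetical sketch of a TIN-based loess correction (my interpretation,
## not the paper's code). For each gene, fit a loess of log counts on the
## per-sample median TIN and remove the fitted trend, keeping the gene's
## overall average level.
correct_counts_by_tin <- function(counts, medTIN) {
  logc <- log2(counts + 1)
  corrected <- t(apply(logc, 1, function(y) {
    fit <- loess(y ~ medTIN, degree = 1)
    y - fitted(fit) + mean(y)          # remove TIN trend, keep mean level
  }))
  pmax(round(2^corrected - 1), 0)      # back to the (non-negative) count scale
}
```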
I was basically wondering what the most sensible way would be to implement this sort of normalization in a Salmon -> tximport -> DESeq2 pipeline, or whether it is even possible.
I guess my main question is which counts I should perform this normalization on.
Would it be best to do it on the counts after importing to gene level with tximport, and then feed these normalised counts back into the tximport object before running DESeq2 (with DESeqDataSetFromTximport)?
Or would it be better to run tximport with countsFromAbundance, perform the normalization on those counts, and then feed the normalised counts straight to DESeq2 (with DESeqDataSetFromMatrix)? A rough sketch of both options is below.
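In code, the two options I have in mind look roughly like this — `files`, `tx2gene`, `coldata`, `medTIN`, and `condition` are placeholders from my setup, and `correct_counts_by_tin()` is the hypothetical helper sketched above:

```r
library(tximport)
library(DESeq2)

## Option 1: import to gene level, correct the counts in place, and keep the
## tximport object so DESeqDataSetFromTximport can still use its length offsets.
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
txi$counts <- correct_counts_by_tin(txi$counts, medTIN)
dds1 <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)

## Option 2: let tximport fold length into the counts via countsFromAbundance,
## correct those counts, and build the dataset from a plain matrix.
txi2 <- tximport(files, type = "salmon", tx2gene = tx2gene,
                 countsFromAbundance = "lengthScaledTPM")
cts  <- correct_counts_by_tin(txi2$counts, medTIN)
dds2 <- DESeqDataSetFromMatrix(round(cts), colData = coldata,
                               design = ~ condition)
```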
I am also unsure of how the offset matrix would play into all this. Are any of these approaches sensible here? Apologies if this is all way off track or unclear.
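By the offset matrix question, I mean something along these lines (purely my speculation, with `tin_trend()` as a hypothetical helper returning the multiplicative loess trend per gene and sample) — leave the counts untouched and fold a TIN factor into DESeq2's normalization factors instead:

```r
## Hypothetical: express the TIN correction as a gene x sample factor and
## multiply it into the normalization factors rather than editing the counts.
tin_trend <- function(counts, medTIN) {
  logc <- log2(counts + 1)
  t(apply(logc, 1, function(y) {
    fit <- loess(y ~ medTIN, degree = 1)
    2^(fitted(fit) - mean(fitted(fit)))   # centred multiplicative trend
  }))
}

dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)
dds <- estimateSizeFactors(dds)   # fills normalization factors from avgTxLength
nf  <- normalizationFactors(dds) * tin_trend(counts(dds), medTIN)
normalizationFactors(dds) <- nf / exp(rowMeans(log(nf)))   # re-centre rows around 1
```

Though I have no idea whether stacking a TIN factor on top of the average-transcript-length offsets like this makes statistical sense.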
Any other suggestions on how best to deal with low-RIN data would also be appreciated.
Thanks.
Hi Michael,
Many thanks for the response.
I have never used RUV or SVA before, but I will have a read about them, try them here, and post if I have any further questions.
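If it helps to say what I'd try first: something like the sva workflow from the DESeq2 vignette, as far as I can follow it (the low-count filter, the number of surrogate variables, and the `condition` variable below are just placeholders from my setup):

```r
library(sva)

## Estimate surrogate variables from the normalized counts and add them to
## the design, following the sva section of the DESeq2 vignette.
dds  <- estimateSizeFactors(dds)
dat  <- counts(dds, normalized = TRUE)
dat  <- dat[rowMeans(dat) > 1, ]                 # drop very low-count genes
mod  <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svs  <- svaseq(dat, mod, mod0, n.sv = 2)

ddssva <- dds
ddssva$SV1 <- svs$sv[, 1]
ddssva$SV2 <- svs$sv[, 2]
design(ddssva) <- ~ SV1 + SV2 + condition
```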
Thanks again.