Hi,
I realize that when correcting for GC bias with EDASeq and doing DE in DESeq2, the recommended method is to pass EDASeq's offsets to the DESeq2 object as normalization factors (normalizationFactors), as per the DESeq2 manual. However, we have an analysis from a few years ago in which the EDASeq-normalized counts were passed directly into tximport's counts slot before the DESeq2 object was created. While I know this is not the currently recommended pipeline, is it completely wrong or just less than ideal (I know this was recommended in the original DESeq, before normalization factors were available)? I only ask because we have since validated a number of targets that are significant in the old analysis but not when doing it the recommended way.
Thanks, I appreciate your time!
Hi Michael,
Thanks for the quick reply. I realize how the above could be a bit confusing without some code:
tximport was used to import the RSEM genes.results files into txi.rsem, and uCovar contains the GC percentages.
The total library size changed only very slightly with what was done:
These data just seem to be finicky, but running the analysis this way lines up much better with our more recent scRNA-seq datasets and NanoString validation. Thanks again!
And how did you run the quantification tool and tximport? This is also relevant.
For running RSEM, STAR transcriptome aligned BAMs were quantified with:
The genes.results files were imported with tximport via:
Thanks again, and sorry for leaving out important details. Let me know if anything else would be helpful.
OK, this looks fine. tximport here is just calculating a transcript-usage-based offset, and EDASeq is fixing GC bias in place.
So I don’t have any issue with your pipeline that doesn’t use EDASeq offsets.
For other users with other cases, it's a bit more complicated, e.g. depending on whether counts are generated from abundance, whether GC bias correction is applied by Salmon, what input goes into the R-based bias modeler, etc.
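For readers comparing the two routes, here is a minimal sketch of the arithmetic behind the offset-based one. It is plain Python for illustration only, not code from this thread. The toy counts and offsets are hypothetical, and the sign convention (offset = log(normalized) - log(raw), so factor = exp(-offset)) is an assumption that should be checked against the EDASeq and DESeq2 documentation. In R, the resulting matrix would be the kind of thing assigned with normalizationFactors(dds).

```python
import math


def normalization_factors(offsets):
    """Turn a genes x samples matrix of log-scale offsets into
    multiplicative factors with geometric mean 1 within each gene.

    Assumed sign convention: offset = log(normalized) - log(raw),
    so the factor that raw counts are divided by is exp(-offset).
    """
    factors = []
    for row in offsets:
        raw_factors = [math.exp(-o) for o in row]
        # Centre each gene on its geometric mean so the factors only
        # redistribute signal between samples and leave the gene's
        # overall level unchanged.
        log_mean = sum(math.log(f) for f in raw_factors) / len(raw_factors)
        geo_mean = math.exp(log_mean)
        factors.append([f / geo_mean for f in raw_factors])
    return factors


# Hypothetical toy data: 2 genes x 3 samples.
counts = [[100, 200, 400], [50, 50, 50]]
offsets = [[0.3, 0.1, -0.1], [0.0, 0.1, -0.1]]

nf = normalization_factors(offsets)

# Offset route: the raw counts stay untouched and the factors travel
# alongside them. Dividing is shown here only to compare the scale with
# what an in-place correction would hand over as "counts".
scaled = [
    [c / f for c, f in zip(count_row, factor_row)]
    for count_row, factor_row in zip(counts, nf)
]
```

The in-place route replaces the counts themselves with corrected values, while the offset route keeps the raw counts and carries the correction as a separate matrix. Which one is appropriate depends on the points listed above.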
Perfect. Thanks a ton for the info here and elsewhere!