The tximport vignette shows two ways to import results for differential expression analysis at the gene level.
The first approach (used with DESeq2 and edgeR) is to take the estimated counts from your quantification tool of choice, then let DESeq2 or edgeR use (or explicitly set) an offset that accounts for changes in average transcript length per sample.
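To check my understanding of that offset, here is a rough base-R sketch on made-up numbers. The matrices counts and avg_len are hypothetical toy data, and I use plain column sums as library sizes where the vignette's edgeR recipe uses TMM-scaled effective library sizes, so treat this as an illustration of the idea rather than what either package does internally.

```r
# Toy data: 3 genes x 2 samples (all values are made up for illustration).
counts <- matrix(c(100, 200, 50,
                   120, 180, 90),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# Average transcript length per gene and sample (hypothetical values).
avg_len <- matrix(c(1000, 2000, 1500,
                    1000, 2000, 3000),
                  nrow = 3,
                  dimnames = dimnames(counts))

# Divide each gene's lengths by their geometric mean across samples,
# so only the between-sample change in length remains.
norm_mat <- avg_len / exp(rowMeans(log(avg_len)))

# Simplification: library size is the column sum of length-corrected counts
# (no TMM scaling here).
lib_size <- colSums(counts / norm_mat)

# Per-gene, per-sample offset on the log scale.
offset_mat <- log(sweep(norm_mat, 2, lib_size, "*"))
```

Genes whose average length does not change between samples (g1 and g2 here) end up with an offset that only reflects library size; g3, whose length doubles in s2, gets a sample-specific correction.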
The second way is to import with countsFromAbundance="lengthScaledTPM" directly (no offsets used here), which is then suitable for use by voom. Presumably these counts can also be used, if suboptimally, in the DESeq2 and edgeR world (right?).
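As I understand it (this is my reading, not verified line by line against the tximport source), the length-scaled counts are the abundances multiplied by each gene's average length across samples, then rescaled so every sample keeps its original total count. A base-R sketch on made-up numbers, where tpm, avg_len and counts are hypothetical toy matrices:

```r
# Toy data: 3 genes x 2 samples (all values are made up for illustration).
gene_sample <- list(c("g1", "g2", "g3"), c("s1", "s2"))
tpm <- matrix(c(10, 20, 30,
                10, 20, 15), nrow = 3, dimnames = gene_sample)
avg_len <- matrix(c(1000, 2000, 1500,
                    1000, 2000, 3000), nrow = 3, dimnames = gene_sample)
counts <- matrix(c(100, 200, 50,
                   120, 180, 90), nrow = 3, dimnames = gene_sample)

# Scale each gene's abundance by its average length across samples.
scaled <- tpm * rowMeans(avg_len)

# Rescale each column so the sample total matches the original count total.
length_scaled <- sweep(scaled, 2, colSums(counts) / colSums(scaled), "*")
```

Because the same per-gene length is applied to every sample, sample-specific length differences are already absorbed into these values, which is why no offset accompanies them.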
I'm curious about what version of the imported data to use in the "exploratory data analysis" phase. For instance, in DESeq2, one would either rlog the data first before doing clustering, etc. and in the edgeR world, you'd simply run cpm with a higher
In these situations, is it better to use the data as you would have imported it for use with voom? I see that in some places in the DESeq2 codebase (fpkm, for instance) there is support for an avgTxLength matrix, but calls to rlog seem to interact directly with the counts function, which doesn't look at this transcript length. Similarly, the cpm function doesn't use offset information ...
My guess is that, technically, I should use the data as I would import it for voom in this case, although in practice it would most likely make little difference during EDA either way. Still, I thought I'd get input from the experts.