DESeqDataSetFromTximport with and without offset
igor ▴ 40

I am trying to better understand how to use tximport and DESeq2 together. Both packages provide a good summary. From the tximport vignette:

Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. ... the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”. The second method is to use the tximport argument countsFromAbundance="lengthScaledTPM" or "scaledTPM", and then to use the gene-level count matrix txi$counts directly as you would a regular count matrix with these software. Let’s call this method “bias corrected counts without an offset”
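To make the second method concrete, here is a minimal base R sketch of how "lengthScaledTPM" and "scaledTPM" counts can be derived from abundances, as I understand the idea. The function name `counts_from_abundance` and the toy matrices are hypothetical and for illustration only; in practice tximport does this for you through its `countsFromAbundance` argument.

```r
# Toy gene-level matrices (genes x samples), standing in for tximport output.
set.seed(42)
dn <- list(paste0("gene", 1:6), paste0("s", 1:3))
counts    <- matrix(rpois(18, 100), nrow = 6, dimnames = dn)        # estimated counts
abundance <- matrix(rexp(18, rate = 0.1), nrow = 6, dimnames = dn)  # TPM-like abundances
avg_len   <- matrix(runif(18, 500, 3000), nrow = 6, dimnames = dn)  # average transcript lengths

# Hypothetical helper: build counts from abundances.
counts_from_abundance <- function(counts, abundance, avg_len,
                                  method = c("lengthScaledTPM", "scaledTPM")) {
  method <- match.arg(method)
  new_counts <- if (method == "lengthScaledTPM") {
    # Scale each gene's abundance by its average length across samples.
    abundance * rowMeans(avg_len)
  } else {
    abundance
  }
  # Rescale every sample so its column sum matches the original library size.
  scale <- colSums(counts) / colSums(new_counts)
  sweep(new_counts, 2, scale, `*`)
}

scaled <- counts_from_abundance(counts, abundance, avg_len)
all.equal(colSums(scaled), colSums(counts))
```

Because the per-sample length information is folded into the counts themselves, no separate offset is carried along with this matrix.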

Looking at the DESeqDataSetFromTximport() code, it looks like it will properly handle a tximport object regardless of the countsFromAbundance setting:

  stopifnot(txi$countsFromAbundance %in% c("no","scaledTPM","lengthScaledTPM"))
  if (txi$countsFromAbundance %in% c("scaledTPM","lengthScaledTPM")) {
    message("using just counts from tximport")
  } else {
    message("using counts and average transcript lengths from tximport")
    lengths <- txi$length
    stopifnot(all(lengths > 0))
    dimnames(lengths) <- dimnames(object)
    assays(object)[["avgTxLength"]] <- lengths
  }

So using "lengthScaledTPM" or "scaledTPM" seems more flexible for DESeq2, since it lets you pass either the count matrix or the tximport object. Is that correct? Maybe I am misinterpreting, but the note makes it sound like the offset is the preferred method. Using "lengthScaledTPM" or "scaledTPM", as in the "limma-voom" workflow, also seems simpler for other workflows. Is there a downside to that approach?
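To illustrate the branch in the code above, here is a small sketch that feeds DESeqDataSetFromTximport() a hand-built list in place of real tximport() output. The toy matrices and object names are assumptions made for this example; only the `countsFromAbundance` field and the resulting `avgTxLength` assay come from the code quoted above.

```r
library(DESeq2)

set.seed(1)
samples <- paste0("s", 1:4)
dn <- list(paste0("gene", 1:50), samples)

# Hand-built toy list standing in for the output of tximport().
txi <- list(
  abundance = matrix(rexp(200, rate = 0.1), nrow = 50, dimnames = dn),
  counts    = matrix(rnbinom(200, mu = 100, size = 5), nrow = 50, dimnames = dn),
  length    = matrix(runif(200, 500, 3000), nrow = 50, dimnames = dn),
  countsFromAbundance = "no"
)
coldata <- data.frame(condition = factor(c("A", "A", "B", "B")),
                      row.names = samples)

# "Original counts and offset": average transcript lengths are stored as an assay.
dds_offset <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)

# "Bias corrected counts without an offset": mimic lengthScaledTPM counts by hand.
txi_scaled <- txi
ls_counts <- txi$abundance * rowMeans(txi$length)
txi_scaled$counts <- sweep(ls_counts, 2, colSums(txi$counts) / colSums(ls_counts), `*`)
txi_scaled$countsFromAbundance <- "lengthScaledTPM"
dds_scaled <- DESeqDataSetFromTximport(txi_scaled, colData = coldata, design = ~ condition)

# Check which object carries the average transcript length assay.
has_length_offset <- c(
  offset = "avgTxLength" %in% assayNames(dds_offset),
  scaled = "avgTxLength" %in% assayNames(dds_scaled)
)
print(has_length_offset)
```

With the scaled counts, passing `round(txi_scaled$counts)` to DESeqDataSetFromMatrix() should be the equivalent count-matrix route.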

deseq2 tximport • 284 views

"it looks like it will properly handle a tximport object regardless of the countsFromAbundance setting"


In theory, I prefer the offset method. It is essentially the same approach we take for other technical factors that inflate counts, such as sequencing depth and unwanted factors of variation (e.g. RUVSeq or svaseq factors).
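As a rough illustration of that analogy, the base R sketch below combines a per-sample depth factor with per-gene, per-sample average transcript lengths into one matrix of normalization factors, whose log acts as the offset in a log-link model. This is a simplified picture under stated assumptions, not DESeq2's actual implementation: the depth factor here is a crude library-size ratio (DESeq2 uses median-of-ratios size factors), and all object names are made up for the example.

```r
set.seed(1)
dn <- list(paste0("gene", 1:5), paste0("s", 1:4))
counts  <- matrix(rpois(20, 200), nrow = 5, dimnames = dn)
avg_len <- matrix(runif(20, 500, 3000), nrow = 5, dimnames = dn)

# Crude per-sample depth factor (assumption: library size relative to the mean).
depth <- colSums(counts) / mean(colSums(counts))

# Centre each gene's lengths on their geometric mean, so the factor only
# captures changes in average transcript length across samples.
len_rel <- avg_len / exp(rowMeans(log(avg_len)))

# One factor per gene and sample: relative length times sample depth.
norm_factors <- sweep(len_rel, 2, depth, `*`)

# On the log scale this enters the model as an offset:
#   log(mu_ij) = offset_ij + x_j %*% beta_i
offset <- log(norm_factors)
```

The counts stay on their original scale; the technical effects are absorbed by the offset rather than by altering the count matrix.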

I don't think there is a practical downside to the scaled TPM approaches. I don't think it makes a big difference; see the tximport paper from 2015 for the simulation evaluations.

Thanks for clarifying. I actually didn't realize that the two approaches are not completely identical.

