Question

DESeqDataSetFromTximport with and without offset

igor ▴ 40

@igor

United States

I am trying to better understand how to use tximport and DESeq2 together. The vignettes of both packages summarize this well. From the tximport vignette:

Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. ... the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”. The second method is to use the tximport argument countsFromAbundance="lengthScaledTPM" or "scaledTPM", and then to use the gene-level count matrix txi$counts directly as you would a regular count matrix with these software. Let’s call this method “bias corrected counts without an offset”
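To make the second method concrete, here is a simplified base-R sketch of what the two countsFromAbundance options do to a gene-level count matrix. It uses small hypothetical matrices shaped like txi$abundance, txi$counts and txi$length. The idea is that "scaledTPM" rescales TPM to each sample's library size, while "lengthScaledTPM" first multiplies TPM by the gene's average transcript length across samples. The helper scale_counts() is hypothetical and is not tximport's exact implementation.

```r
# Hypothetical gene-level matrices (2 genes x 2 samples); values are made up
abundance <- matrix(c(100, 300, 200, 200), nrow = 2,
                    dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
counts    <- matrix(c(500, 1500, 900, 1100), nrow = 2,
                    dimnames = dimnames(abundance))
lengths   <- matrix(c(1000, 2000, 1500, 2000), nrow = 2,
                    dimnames = dimnames(abundance))

# Hypothetical helper: simplified sketch of the rescaling idea only
scale_counts <- function(counts, abundance, lengths,
                         method = c("scaledTPM", "lengthScaledTPM")) {
  method <- match.arg(method)
  new_counts <- if (method == "lengthScaledTPM") {
    abundance * rowMeans(lengths)  # TPM times each gene's mean length over samples
  } else {
    abundance                      # TPM only
  }
  # rescale each column so it sums to that sample's original library size
  t(t(new_counts) * (colSums(counts) / colSums(new_counts)))
}

scaled <- scale_counts(counts, abundance, lengths, "lengthScaledTPM")
print(round(scaled, 1))
```

Either result keeps each sample's library size but is no longer a raw count, which is why these "bias corrected counts" can be used without an offset.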

Looking at the DESeqDataSetFromTximport() code, it appears to handle a tximport object properly regardless of the countsFromAbundance setting:

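  # excerpt from DESeqDataSetFromTximport(); `object` is the DESeqDataSet
  # already built from the rounded txi$counts
  stopifnot(txi$countsFromAbundance %in% c("no","scaledTPM","lengthScaledTPM"))
  if (txi$countsFromAbundance %in% c("scaledTPM","lengthScaledTPM")) {
    # counts were already length-corrected by tximport: no offset needed
    message("using just counts from tximport")
  } else {
    # original counts: keep the average transcript lengths so that
    # estimateSizeFactors() can later build an offset from them
    message("using counts and average transcript lengths from tximport")
    lengths <- txi$length
    stopifnot(all(lengths > 0))
    dimnames(lengths) <- dimnames(object)
    assays(object)[["avgTxLength"]] <- lengths
  }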
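In the "original counts and offset" branch, the stored avgTxLength assay is later turned into gene- and sample-specific normalization factors. The base-R sketch below approximates, in simplified form, how DESeq2 combines the lengths with median-of-ratios size factors: center each gene's lengths by their geometric mean, estimate size factors on the length-corrected counts, multiply the two, and re-center per gene. The function names size_factors() and norm_factors_from_lengths() and the toy data are hypothetical, and details such as control genes or a custom location function are ignored.

```r
# Median-of-ratios size factors (simplified; genes with any zero are skipped)
size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  apply(counts[keep, , drop = FALSE], 2, function(col)
    exp(median(log(col) - log_geo_means[keep])))
}

# Hypothetical helper approximating the length-offset normalization
norm_factors_from_lengths <- function(counts, avg_tx_length) {
  # 1. divide each gene's lengths by their geometric mean across samples
  nm <- avg_tx_length / exp(rowMeans(log(avg_tx_length)))
  # 2. size factors estimated on the length-corrected counts
  sf <- size_factors(counts / nm)
  # 3. gene x sample matrix: length factor times sample size factor
  nf <- t(t(nm) * sf)
  # 4. re-center so each gene's factors have geometric mean 1
  nf / exp(rowMeans(log(nf)))
}

# Toy data: only gene 1 changes average transcript length between samples
counts        <- matrix(c(100, 200, 300, 110, 190, 330), nrow = 3)
avg_tx_length <- matrix(c(1000, 1500, 2000, 1200, 1500, 2000), nrow = 3)
nf <- norm_factors_from_lengths(counts, avg_tx_length)
print(nf)
```

The original counts are left untouched; only gene 1's factors differ from the other genes', reflecting its longer average transcript length in the second sample.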

So using "lengthScaledTPM" or "scaledTPM" seems more flexible for DESeq2, since it lets you use either the count matrix or the tximport object. Is that correct? I may be misreading, but the note makes it sound as though the offset is the preferred method. Using "lengthScaledTPM" or "scaledTPM", as in the limma-voom workflow, also seems simpler for other workflows. Is there a downside to that approach?

deseq2 tximport

score 2 · Accepted Answer · 2020-07-06

Michael Love 41k

@mikelove

United States

"it looks like it will properly handle a tximport object regardless of the countsFromAbundance setting"

Yes.

On theoretical grounds I prefer the offset method. It is basically the same approach we take to other technical factors that inflate counts, such as sequencing depth and unwanted variation (e.g. RUVSeq or svaseq factors): we account for them in the model rather than changing the counts.
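To spell out the analogy, the DESeq2 model (as described in the DESeq2 paper) treats technical effects as multiplicative factors on the expected count rather than as changes to the data. With the offset method, the per-sample size factor becomes a gene- and sample-specific normalization factor that also carries the average transcript length:

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_{ij}\, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr}\, \beta_{ir}
```

Here K_ij is the original count for gene i in sample j, and s_ij absorbs both sequencing depth and the change in average transcript length, so log s_ij enters the log-linear model as an offset. Unwanted factors such as RUVSeq or svaseq estimates are typically added as extra columns of the design x_j. The scaled-TPM methods instead fold the length correction into K_ij itself and keep a per-sample s_j.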

I don't think there is a practical downside to the scaled-TPM approaches, and in practice the choice doesn't make a big difference. See the simulation evaluations in the 2015 tximport paper.

Thanks for clarifying. I actually didn't realize that the two approaches are not completely identical.

Reply by igor