Question

Confused about tximport-DESeq2 setup

Dunois • 0

The "Downstream DGE in Bioconductor" section of the tximport vignette contains only two Notes and nothing else, and I find the explanation there confusing.

Which of the two code snippets below is the correct approach for importing expression levels quantified with Salmon, and then passing them on to DESeq2, when the transcript-gene relationship is given by a two-column data.frame named tx2gene?

(1):

txi <- tximport::tximport(files = flist, type = "salmon", tx2gene = tx2gene, countsFromAbundance="lengthScaledTPM")
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~cond)

(2):

txi <- tximport::tximport(files = flist, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~cond)

DESeq2 tximport salmon • 1.1k views

updated 2.2 years ago by ATpoint ★ 4.7k • written 2.2 years ago by Dunois • 0

score 2 · Answer 1 · 2023-01-03

For the user they are almost identical in effect: both make sure that differences in average transcript length per gene and sample do not bias the counts. The first one rescales the counts themselves to correct for average transcript length, so you get a single matrix of length-corrected counts (no longer the raw estimated counts) that is ready for downstream analysis. The second one leaves the estimated counts as they are and additionally returns a matrix of average transcript lengths per gene and sample, which DESeq2 can then use as an offset in its model. Both are valid. The first one is more generic, since some tools/approaches (like limma-voom) do not support a length offset matrix. I prefer the generic one, but the choice is yours.
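To make the difference concrete, here is a minimal base-R sketch of what the two approaches hand to the downstream tool. It is a toy illustration, not tximport's actual code: the matrices and all numbers are made up, the variable names (counts, avg_len, counts_length_scaled, approach2) are hypothetical, and the rescaling is only my understanding of what countsFromAbundance="lengthScaledTPM" does.

```r
# Toy example: 2 genes x 2 samples. All numbers are made up for illustration.
genes   <- c("geneA", "geneB")
samples <- c("s1", "s2")

# Estimated counts and average transcript length per gene and sample.
# geneA uses a longer isoform in s2, so its average length differs by sample.
counts  <- matrix(c(100, 300, 200, 400), nrow = 2,
                  dimnames = list(genes, samples))
avg_len <- matrix(c(1000, 2000, 1500, 2000), nrow = 2,
                  dimnames = list(genes, samples))

# TPM-like abundance: counts per unit length, scaled to sum to 1e6 per sample.
rate <- counts / avg_len
tpm  <- sweep(rate, 2, colSums(rate), "/") * 1e6

# Approach (1), "lengthScaledTPM"-style (assumed behaviour): abundance times
# the gene's average length across samples, then rescaled so each column sums
# to the original library size. Used as a plain count matrix, with no offset.
scaled   <- tpm * rowMeans(avg_len)
lib_size <- colSums(counts)
counts_length_scaled <- sweep(scaled, 2, lib_size / colSums(scaled), "*")

# Approach (2), default: counts stay untouched and avg_len is carried along
# so the model can use it as a per-gene, per-sample offset.
approach2 <- list(counts = counts, length = avg_len)

# Library sizes are preserved by the rescaling in approach (1).
stopifnot(isTRUE(all.equal(colSums(counts_length_scaled), lib_size)))
print(round(counts_length_scaled, 1))
```

In this toy data geneA's share of the raw counts is higher in s2 only because its average length is higher there. After the rescaling in approach (1) its share is the same in both samples, which is the same correction the offset in approach (2) achieves inside the model.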

See also the vignette: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor