Question

Is transcript-to-gene conversion needed in a kallisto > tximport > DESeq2 pipeline?

Yunlu Zhu • 0

@yunlu-zhu-15240

USA

Hi guys,

I'm new to bioinformatics so this may be a naive question. My workflow:

1. Use the tximport function to create the txi list from kallisto .h5 files
2. Use the DESeqDataSetFromTximport function to generate the DESeqDataSet

Question:

Do I need to generate gene-level unnormalized counts during the import (*), or can I just use transcript-level counts (**)?

---

I found the article "Importing transcript abundance datasets with tximport" very helpful but still got confused with the following paragraph:

Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. The code examples below accomplish these steps for you, keeping track of appropriate matrices and calculating these offsets. For edgeR you need to assign a matrix to y$offset, but the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”.
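To make the "offset" in the quoted paragraph more concrete, here is a toy base-R sketch. The numbers, the object names, and the single two-transcript gene are all made up for illustration; it only mimics the idea of an abundance-weighted average transcript length per sample and is not the tximport or DESeq2 implementation.

```r
# Toy illustration with made-up numbers: one gene, two transcripts, two samples.
tx_length <- c(tx1 = 1000, tx2 = 2000)               # transcript lengths (bp)
tx_abund  <- cbind(sampleA = c(tx1 = 10, tx2 = 0),   # TPM-like abundances
                   sampleB = c(tx1 = 0,  tx2 = 10))

# Abundance-weighted average transcript length of the gene, per sample.
# (Each row of tx_abund is multiplied by the matching transcript length.)
gene_length <- colSums(tx_abund * tx_length) / colSums(tx_abund)

# Scale by the geometric mean across samples, then take logs to get an offset.
norm_factor <- gene_length / exp(mean(log(gene_length)))
log_offset  <- log(norm_factor)

print(gene_length)
print(round(norm_factor, 3))
```

In this toy case the gene switches from the short to the long isoform, so the same number of molecules would yield roughly twice as many reads in sampleB. The offset is what lets a gene-level model account for that difference.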

Thanks in advance,
Yunlu

*

txi.kallisto.gl <- tximport(files, type = "kallisto", tx2gene = tx2gene)

**

txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)

deseq2 rnaseq kallisto tximport • 3.0k views

ADD COMMENT • link updated 7.0 years ago by Michael Love 43k • written 7.0 years ago by Yunlu Zhu • 0

score 1 · Answer 1 · 2018-03-14

Michael Love 43k

@mikelove

United States

Here’s my comment to this Q on another thread:

C: Incredibly high/low foldChange

To repeat from that thread: it works and will find DE at the transcript level, but you should consider applying a stricter padj threshold. Meanwhile, we are working on improvements.

ADD COMMENT • link 7.0 years ago Michael Love 43k

Yes, the DE analysis seems to work well. But for gene ontology and pathway-level analysis, is it necessary to generate a gene-level dds? Thanks!

ADD REPLY • link 7.0 years ago Yunlu Zhu • 0

You could use stageR or the aggregation methods from DEXSeq to combine the transcript-level results to the gene level. And yes, the gene set methods need gene-level results.

ADD REPLY • link 7.0 years ago Michael Love 43k

I'll just use tximport (with txOut = FALSE and tx2gene) to summarize abundances to the gene level, then proceed with DESeq2 and the downstream analysis. Thank you very much, Mike! This is very helpful!

ADD REPLY • link 7.0 years ago Yunlu Zhu • 0
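The gene-level route settled on in this thread can be sketched roughly as below. The helper name `run_gene_level_deseq2`, the `coldata` data frame, and its `condition` column are hypothetical and only for illustration; `files` and `tx2gene` are the objects from the original question. Reading kallisto .h5 files also assumes the rhdf5 package is installed.

```r
library(tximport)
library(DESeq2)

# Hypothetical helper: the caller supplies `files`, `tx2gene`, and `coldata`.
run_gene_level_deseq2 <- function(files, tx2gene, coldata) {
  # files:   named character vector of paths to kallisto abundance.h5 files
  # tx2gene: data frame, column 1 = transcript ID, column 2 = gene ID
  # coldata: data frame, one row per sample, with a `condition` column (assumed)
  stopifnot(length(files) == nrow(coldata),
            "condition" %in% colnames(coldata))

  # txOut = FALSE (the default) summarizes transcripts to genes via tx2gene.
  txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, txOut = FALSE)

  # Builds the data set from the estimated counts and uses the average
  # transcript lengths in txi as the offset described in the vignette.
  dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)

  # Fit the model and return the gene-level results table.
  dds <- DESeq(dds)
  results(dds)
}
```

A call would look like `res <- run_gene_level_deseq2(files, tx2gene, coldata)`, with the rows of `coldata` in the same order as `files`. The resulting gene-level table is what gene ontology and pathway tools expect as input.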