Hi guys,
I'm new to bioinformatics, so this may be a naive question. My workflow is:
1. tximport() to create the txi object (a list of matrices) from the kallisto .h5 files
2. DESeqDataSetFromTximport() to build the DESeqDataSet
Question:
Do I need to generate gene-level unnormalized counts during the import*, or can I just use the transcript-level counts**?
---
I found the article "Importing transcript abundance datasets with tximport" very helpful, but I was still confused by the following paragraph:
Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. The code examples below accomplish these steps for you, keeping track of appropriate matrices and calculating these offsets. For edgeR you need to assign a matrix to y$offset, but the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”.
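To make the "original counts and offset" idea concrete, here is a minimal base-R sketch of roughly what the gene-level summarization computes, assuming the default countsFromAbundance = "no". The transcript IDs, gene IDs and numbers are hypothetical, and the real tximport also handles edge cases (e.g. genes with zero abundance in a sample) that this sketch ignores. Counts and abundances are simply summed per gene, while the gene length is the abundance-weighted average of its transcript lengths, computed separately for each sample. In the toy data, geneA is mostly its long isoform in s1 and mostly its short isoform in s2, so its average length differs between samples even though its transcripts have not changed.

```r
# Hypothetical transcript-level estimates (rows = transcripts, columns = samples)
tx_counts <- matrix(c(100, 100,
                      300, 100,
                       50,  50),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))
# Hypothetical abundances (TPM-like); geneA shifts between its two isoforms
tx_abundance <- matrix(c(10, 30,
                         30, 10,
                          5,  5),
                       nrow = 3, byrow = TRUE, dimnames = dimnames(tx_counts))
# Hypothetical effective transcript lengths
tx_length <- matrix(c(1000, 1000,
                      3000, 3000,
                       500,  500),
                    nrow = 3, byrow = TRUE, dimnames = dimnames(tx_counts))

# Hypothetical transcript-to-gene mapping (same role as tx2gene)
tx2gene <- data.frame(TXNAME = c("tx1", "tx2", "tx3"),
                      GENEID = c("geneA", "geneA", "geneB"))
gene_of <- tx2gene$GENEID[match(rownames(tx_counts), tx2gene$TXNAME)]

# Gene-level counts and abundances: plain sums over each gene's transcripts
gene_counts    <- rowsum(tx_counts, gene_of)
gene_abundance <- rowsum(tx_abundance, gene_of)

# Gene-level length: abundance-weighted average of transcript lengths,
# computed separately for every sample
gene_length <- rowsum(tx_abundance * tx_length, gene_of) / gene_abundance

print(gene_counts)
print(gene_length)
```

Here geneA's average length is 2500 in s1 but 1500 in s2. In the real txi list the corresponding matrix is txi$length, and that is what DESeqDataSetFromTximport draws on to build the offset, so the raw gene counts can be used as-is.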
Thanks in advance,
Yunlu
*
txi.kallisto.gl <- tximport(files, type = "kallisto", tx2gene = tx2gene)  # gene-level counts
**
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)  # transcript-level counts
Yes, the DE analysis seems to work well. But for gene ontology and pathway-level analysis, is it necessary to generate a gene-level dds? Thanks!
You could use stageR or the aggregation methods from DEXSeq to aggregate the transcript-level results to the gene level. And yes, the gene set methods need gene-level results.
I'll just use tximport (txOut = FALSE with tx2gene) to summarize the estimates to the gene level, then proceed with DESeq2 and the downstream analysis. Thank you very much, Mike! This is very helpful!