Question: Collapsing transcript counts into pathways / gene lists with DESeq2
Gon Nido wrote (20 months ago):


I've been using the DESeq2 package lately, with tximport to import the counts generated by Salmon. I wanted to do some pathway analysis, so I ran tximport with the tx2gene argument set to a mapping between transcript IDs and pathway names.

> head(pw_txdb)
           TXNAME                 pathway
1 ENST00000585714 mitochondrial_transport
2 ENST00000495634 mitochondrial_transport
3 ENST00000492580 mitochondrial_transport
4 ENST00000340001 mitochondrial_transport
5 ENST00000460872 mitochondrial_transport
6 ENST00000370732 mitochondrial_transport
> txi <- tximport(files, type="salmon", tx2gene=pw_txdb)

Then I ran the standard DESeq2 DE analysis.

My assumption here is that the counts can be summed regardless of how a "region" is defined (gene or gene set). However, I'm wondering whether the assumptions behind the DESeq2 model (e.g. the negative binomial distribution) still hold. In addition, I'm afraid that overlapping regions (genes shared between pathways) would violate the assumption of independence among the multiple tests being corrected for. Perhaps permuting the labels would be a better way to go about it?

Thank you,


Answer: Collapsing transcript counts into pathways / gene lists with DESeq2
Michael Love (United States) wrote (20 months ago):

You don't want to sum up counts to the pathway level. Imagine a pathway with two genes, where one increases in expression and the other decreases. The two changes cancel in the sum, and that signal is lost. You should do gene-level and/or transcript-level analysis instead (tximport gives code for gene-level analysis; for transcript-level analysis there are many options, and we're still evaluating these in our lab, but you can use a differential transcript usage detection method like DRIMSeq on Salmon quantifications). You can then use a gene-set tool like goseq (which can be used downstream of DESeq2), or the gene-set methods in limma, roast and camera (though not downstream of DESeq2).
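To make the cancellation concrete, here is a minimal base-R sketch with made-up counts. The gene names, the `toy_pathway` label and the `log2fc` helper are hypothetical, and `rowsum()` is only standing in for what summing under a transcript-to-pathway mapping does; it is not the tximport code path.

```r
# Toy counts: two genes in one hypothetical pathway, 3 control + 3 case samples.
counts <- rbind(
  geneA = c(100, 110,  90, 200, 210, 190),  # goes up in cases
  geneB = c(200, 210, 190, 100, 110,  90)   # goes down in cases
)
colnames(counts) <- c("ctrl1", "ctrl2", "ctrl3", "case1", "case2", "case3")
condition <- factor(rep(c("ctrl", "case"), each = 3), levels = c("ctrl", "case"))

# Hypothetical mapping from gene to pathway (both genes in the same pathway).
pathway <- c(geneA = "toy_pathway", geneB = "toy_pathway")

# Collapse to the pathway level by summing all rows that share a label.
pathway_counts <- rowsum(counts, group = pathway[rownames(counts)])

# Log2 fold change of mean counts, case vs. ctrl (illustrative helper only).
log2fc <- function(x) {
  log2(mean(x[condition == "case"]) / mean(x[condition == "ctrl"]))
}

gene_lfc    <- apply(counts, 1, log2fc)          # geneA = 1, geneB = -1
pathway_lfc <- apply(pathway_counts, 1, log2fc)  # toy_pathway = 0

print(gene_lfc)
print(pathway_lfc)
```

Each gene changes two-fold, yet the summed pathway row is identical between conditions, so any test on the collapsed counts has nothing to detect.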


Thank you for the input, Michael. I understand that if the genes change expression in different directions, the pathway will not be "differentially expressed". I thought, however, that this would be analogous to two transcript IDs mapped to the same gene whose expression changes in opposite directions (again, using tximport with the tx2gene argument).

I'm using goseq for downstream analyses, but I was wondering whether it makes conceptual sense to use a negative binomial model to estimate differential expression between sets of genes. The idea emerged from the (a priori) hypothesis that there is a general downregulation of most of the components of some pathways in our case/control study.

Reply written 20 months ago by Gon Nido (modified 20 months ago)

I find it useful to distinguish between gene-level DE and DTU (differential transcript usage), and you can look for pathway or gene-set results in both analyses. Another approach is to test for any change within a gene (e.g. consider the stageR method and paper) and then perform the gene-set analysis on that result. But I do not recommend collapsing gene expression to the pathway level: this is really a big loss of information. Gene-level DE, on the other hand, is something many people are interested in performing as a complementary analysis to DTU.
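As a rough illustration of running a gene-set test on gene-level results rather than on collapsed counts, the sketch below simulates gene-level log2 fold changes and asks whether one hypothetical pathway is shifted downward relative to all other genes. It uses a plain Wilcoxon rank-sum test from base R as a stand-in; it is not goseq, roast or camera, and unlike camera it ignores correlation between genes. All gene names, set sizes and effect sizes are invented.

```r
set.seed(1)

# Simulated gene-level log2 fold changes, standing in for the output of a
# gene-level DE analysis (assumption: 500 genes, no real data involved).
n_genes  <- 500
gene_ids <- paste0("gene", seq_len(n_genes))
lfc <- rnorm(n_genes, mean = 0, sd = 1)
names(lfc) <- gene_ids

# Hypothetical pathway of 40 genes with a general downward shift.
pathway_genes <- gene_ids[1:40]
lfc[pathway_genes] <- lfc[pathway_genes] - 1

# Competitive-style test: are fold changes inside the pathway lower than
# those of the remaining genes?
in_pathway <- names(lfc) %in% pathway_genes
test <- wilcox.test(lfc[in_pathway], lfc[!in_pathway], alternative = "less")

print(median(lfc[in_pathway]))
print(median(lfc[!in_pathway]))
print(test$p.value)
```

Because every gene keeps its own fold change, a set in which most members go down is still detected, and genes moving in the opposite direction remain visible in the gene-level table instead of being averaged away.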

Reply written 20 months ago by Michael Love