I've been using the DESeq2 package lately, with tximport to import the counts generated by Salmon. I wanted to do some pathway analysis, so I ran tximport with the tx2gene argument set to a mapping between transcript IDs and pathway names:
> head(pw_txdb)
           TXNAME                 pathway
1 ENST00000585714 mitochondrial_transport
2 ENST00000495634 mitochondrial_transport
3 ENST00000492580 mitochondrial_transport
4 ENST00000340001 mitochondrial_transport
5 ENST00000460872 mitochondrial_transport
6 ENST00000370732 mitochondrial_transport
> txi <- tximport(files, type="salmon", tx2gene=pw_txdb)
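To make the "summing up" concrete, here is a minimal base-R sketch of what aggregating transcript counts to pathway level amounts to. It is only an illustration, not tximport's implementation: it covers raw counts only (tximport also summarizes abundances and transcript lengths), and the matrix values, transcript names and pathway names (`tx_counts`, `toy_map`, `pwA`, `pwB`) are all made up.

```r
# Toy transcript-level count matrix (hypothetical values): 4 transcripts x 2 samples
tx_counts <- matrix(
  c(10, 20,
     5,  0,
     7,  3,
     1,  9),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("tx1", "tx2", "tx3", "tx4"), c("s1", "s2"))
)

# Toy mapping in the same two-column layout as pw_txdb (hypothetical names)
toy_map <- data.frame(
  TXNAME  = c("tx1", "tx2", "tx3", "tx4"),
  pathway = c("pwA", "pwA", "pwB", "pwB"),
  stringsAsFactors = FALSE
)

# Look up the pathway of each row of the count matrix.
# match() returns only the first hit, so a transcript listed under
# several pathways would be counted in just one of them here.
grp <- toy_map$pathway[match(rownames(tx_counts), toy_map$TXNAME)]

# Sum the counts of all transcripts that share a pathway
pw_counts <- rowsum(tx_counts, group = grp)
print(pw_counts)
```

In this sketch each transcript belongs to exactly one pathway. With overlapping pathways it is worth checking how duplicated TXNAME entries in the mapping are actually handled, since that decides whether a shared transcript contributes to every pathway it belongs to.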
Then I ran the standard DESeq2 DE analysis.
My assumption here is that the counts can be summed regardless of how a "region" is defined (gene or gene set). However, I'm wondering whether the assumptions behind the DESeq2 model still hold (e.g. the negative binomial distribution). In addition, I'm afraid that overlapping regions (genes in common between pathways) would violate the assumption of independence among the multiple tests being corrected for. Perhaps permutation of labels would be a better way to go about it?
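To illustrate what I mean by permuting labels, here is a rough base-R sketch. It assumes "labels" means the sample condition labels, and it swaps in a simple statistic (difference in mean log2 counts) rather than the DESeq2 test. The count matrix is simulated and every name in it (`pw_counts`, `condition`, `group_diff`) is hypothetical.

```r
set.seed(1)

# Hypothetical pathway-level count matrix: 3 pathways x 8 samples (simulated)
pw_counts <- matrix(
  rnbinom(24, mu = 100, size = 5), nrow = 3,
  dimnames = list(paste0("pw", 1:3), paste0("s", 1:8))
)
condition <- rep(c("ctrl", "trt"), each = 4)

# Simple statistic: difference in mean log2(count + 1) between conditions.
# This is a stand-in for illustration, not the DESeq2 Wald statistic.
group_diff <- function(mat, labels) {
  logmat <- log2(mat + 1)
  rowMeans(logmat[, labels == "trt", drop = FALSE]) -
    rowMeans(logmat[, labels == "ctrl", drop = FALSE])
}

observed <- group_diff(pw_counts, condition)

# Shuffle the sample labels. Each sample's column stays intact, so the
# dependence between pathways is kept within every permutation.
n_perm <- 1000
perm_stats <- replicate(n_perm, group_diff(pw_counts, sample(condition)))

# Empirical two-sided p-value per pathway, with the usual +1 correction
p_perm <- (rowSums(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
print(p_perm)
```

With 4 vs 4 samples there are only 70 distinct labelings, so the p-values from such a scheme are coarse. I assume this would be a real limitation with typical RNA-seq sample sizes.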