Grouping similar genes in differential expression analysis to increase power
@willmacnair-7054

Hi

I've recently done some DE analysis with DESeq2, with relatively small sample sizes. The experimental setup is a time course of 4 points, with matched samples of two different tissue types from 7 patients (so 56 samples in total). I've tested t3 against t0 in both tissue types. Since I expect the responses to be different in the different tissue types, I've included an interaction term.

Design:

~ patient_id + time_id + tissue_id + time_id:tissue_id

Contrasts:

c('time_id', '0', '3')
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1)
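A numeric contrast like the one above is easy to get wrong by one position, so here is a minimal base R sketch for checking which coefficients it picks out. The sample table is hypothetical (patient labels `P1` to `P7` and tissue labels `A`/`B` are assumptions), and `model.matrix` names coefficients differently from DESeq2, so `resultsNames(dds)` remains the authoritative check on the real object. Note also that DESeq2 reads a character contrast as `c(factor, numerator, denominator)`, so the sign of the first contrast is worth double-checking.

```r
# Hypothetical sample table mirroring the design described above:
# 7 patients x 4 time points x 2 tissues = 56 samples.
coldata <- expand.grid(
  patient_id = factor(paste0("P", 1:7)),
  time_id    = factor(0:3),
  tissue_id  = factor(c("A", "B"))  # tissue labels are assumptions
)

# Same formula as the DESeq2 design, expanded with base R.
design <- model.matrix(
  ~ patient_id + time_id + tissue_id + time_id:tissue_id,
  data = coldata
)
coef_names <- colnames(design)

# Build the numeric contrast by name rather than by position:
# t3 vs t0 main effect plus the t3-by-tissue interaction,
# i.e. the t3 vs t0 effect in the non-reference tissue.
contrast <- setNames(numeric(length(coef_names)), coef_names)
contrast[c("time_id3", "time_id3:tissue_idB")] <- 1

length(contrast)      # 14 coefficients
which(contrast == 1)  # positions 10 and 14
```

Under these assumptions the named version reproduces the 14-element vector with ones in positions 10 and 14.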


Looking for the genes with maximal fold changes in the size-factor-normalized counts identifies a large number of very similar genes (i.e. names identical apart from a number, no difference in description on GeneCards) with log2 fold changes below -4. However, these genes are highly variable, and this, combined with the small sample sizes, is (I assume) why DESeq2 does not report them as significant, even at a 10% FDR.

It seems like there should be a way (most likely using something other than or in addition to DESeq2) to increase effective sample size and therefore power, by somehow grouping these functionally very similar genes. Grouping could be e.g. by annotations, or by a preliminary time-course profile clustering step. I've looked on Bioconductor for this, but haven't found anything. Any suggestions welcome!

Thanks in advance,
Will

deseq2 rnaseq differential gene expression • 1.0k views
@mikelove

Absolutely, testing gene sets increases power, especially for experiments where the signal for each individual gene might be low.

You can use the goseq package in combination with DESeq2.

Or you can try the various gene set testing methods in limma: roast, camera, etc.
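As a rough sketch of the limma route, the example below runs `mroast` (a self-contained test: is the set itself changing?) and `camera` (a competitive test: is the set changing more than other genes?) on simulated log-expression values. Everything about the data is made up for illustration: the matrix `y`, the gene names, the `family` set, and the simplified two-time-point, one-tissue design. With real data, `y` could be something like `assay(vst(dds, blind = FALSE))` from DESeq2, but that choice is an assumption, not something prescribed here.

```r
library(limma)

set.seed(1)
n_genes <- 200

# Hypothetical simplified layout: 7 patients, t0 vs t3, one tissue.
patient <- factor(rep(paste0("P", 1:7), times = 2))
time    <- factor(rep(c("t0", "t3"), each = 7))
design  <- model.matrix(~ patient + time)  # last column is the t3 vs t0 effect

# Simulated log-scale expression standing in for transformed counts.
y <- matrix(
  rnorm(n_genes * length(time)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# Hypothetical family of similar genes, shifted down at t3.
family <- paste0("gene", 1:15)
y[family, time == "t3"] <- y[family, time == "t3"] - 2

# Map gene identifiers in each set to row indices of y.
gene_sets <- list(family = family, other = paste0("gene", 101:115))
index <- ids2indices(gene_sets, rownames(y))

# Self-contained rotation test for each set.
roast_res <- mroast(y, index, design, contrast = ncol(design), nrot = 999)

# Competitive test that accounts for inter-gene correlation.
camera_res <- camera(y, index, design, contrast = ncol(design))

roast_res
camera_res
```

Both functions take the sets as input and test each set as a unit, so no per-gene significance threshold is needed beforehand.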


Thanks for the quick response. I had a quick search for this combination (goseq then DESeq2) but couldn't find anything relevant. I'm not clear how these would be combined, but perhaps I'm missing something obvious. The approach I'm thinking of would be something like:

- group all measured genes into groups according to GO terms

- use DESeq2 to fit a model to each gene group, maybe with a term for the individual gene, treated like a batch effect

Then this would test for DE in each group. Is this what you had in mind?

If so, I don't see how to use goseq for this: it tests for enriched annotations given a group of differentially expressed genes, whereas I want to test for differential expression given a group of genes sharing an annotation.

It also seems like it could be statistically a bit tricky, e.g., how to select GO terms without first having differential expression; does the statistical model assumed in DESeq2 still hold for groups of genes; do interactions between GO terms affect any assumptions of independence. I'm sure there are more!

If the goseq model (set testing after per-gene DE testing) doesn't fit what you're after, take a look at the limma methods I mentioned and the accompanying papers. These do what you describe.