Question

Grouping similar genes in differential expression analysis to increase power


willmacnair • 0

@willmacnair-7054


Switzerland

Hi

I've recently done some DE analysis with DESeq2, with relatively small sample sizes. The experimental setup is a time course of 4 points, with matched samples of two different tissue types from 7 patients (so 56 samples in total). I've tested t3 against t0 in both tissue types. Since I expect the responses to be different in the different tissue types, I've included an interaction term.

Design:

~ patient_id + time_id + tissue_id + time_id:tissue_id

Contrasts:

c('time_id', '0', '3')

c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1)
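To make the numeric contrast easier to read, here is a minimal base R sketch that rebuilds a design matrix of the same shape and shows which coefficients the non-zero entries select. The tissue labels `A` and `B` and the patient labels are hypothetical, and the sketch assumes the fitted coefficients follow the column order of base R's `model.matrix` (DESeq2 names them differently in `resultsNames`).

```r
# Hypothetical sample table: 7 patients x 4 time points x 2 tissues = 56 samples.
# Tissue labels "A"/"B" and patient labels are placeholders, not from the post.
samples <- expand.grid(
  patient_id = paste0("p", 1:7),
  time_id    = c("0", "1", "2", "3"),
  tissue_id  = c("A", "B"),
  stringsAsFactors = TRUE
)

# Same formula as the design in the post.
design <- model.matrix(
  ~ patient_id + time_id + tissue_id + time_id:tissue_id,
  data = samples
)

# Numeric contrast from the post: one entry per coefficient.
contrast <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1)
stopifnot(length(contrast) == ncol(design))

# Names of the coefficients that the contrast adds together.
selected <- colnames(design)[contrast != 0]
print(selected)
```

Under these assumptions the two selected columns are the time 3 main effect and the time 3 by tissue interaction, so the numeric contrast is the t3 versus t0 effect in the non-reference tissue.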

Looking for the genes with maximal fold changes in the size factor-normalized counts identifies a large number of very similar genes (i.e. names identical apart from a number, no difference in description on GeneCards) with log2 fold changes less than -4. However, these genes are highly variable, and this, combined with the small sample sizes (I assume), means that DESeq2 does not report them as significant, even at a 10% FDR.
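For readers unfamiliar with the term, the following is a rough base R sketch of size factor normalization by the median-of-ratios method, followed by a naive log2 fold change on the normalized counts. The count matrix is simulated, the two groups of three samples and the pseudocount of 1 are assumptions, and in practice DESeq2's own `estimateSizeFactors` and `counts(dds, normalized = TRUE)` would be used instead.

```r
set.seed(1)

# Simulated counts (assumption): 50 genes x 6 samples, first 3 = t0, last 3 = t3.
counts <- matrix(rnbinom(50 * 6, mu = 100, size = 5), nrow = 50)

# Median-of-ratios size factors: compare each sample to a per-gene
# geometric mean, using only genes with no zero counts.
log_geo_mean <- rowMeans(log(counts))
keep <- is.finite(log_geo_mean)
size_factors <- apply(counts[keep, , drop = FALSE], 2, function(col) {
  exp(median(log(col) - log_geo_mean[keep]))
})

# Divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# Naive log2 fold change, t3 over t0, with a pseudocount of 1 (assumption).
lfc <- log2((rowMeans(norm_counts[, 4:6]) + 1) /
            (rowMeans(norm_counts[, 1:3]) + 1))
```

A fold change computed this way takes no account of each gene's dispersion, which is why a large value for a highly variable gene need not lead to a significant test result.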

It seems like there should be a way (most likely using something other than or in addition to DESeq2) to increase effective sample size and therefore power, by somehow grouping these functionally very similar genes. Grouping could be e.g. by annotations, or by a preliminary time-course profile clustering step. I've looked on Bioconductor for this, but haven't found anything. Any suggestions welcome!

Thanks in advance,
Will

deseq2 rnaseq differential gene expression • 1.2k views

ADD COMMENT • link updated 9.9 years ago by Michael Love 43k • written 9.9 years ago by willmacnair • 0

score 1 · Answer 1 · 2016-02-25


Michael Love 43k

@mikelove


United States

Absolutely, testing gene sets increases power, especially for experiments where the signal for each individual gene might be low.

You can use the goseq package in combination with DESeq2.

Or you can try the various gene set testing methods in limma: roast, camera, etc.
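As a rough illustration of the limma route, the sketch below runs `roast` (a self-contained test: is the set differentially expressed at all?) and `camera` (a competitive test: is the set more differentially expressed than the other genes?) on one gene set. Everything about the data is an assumption: the log-expression matrix is simulated, the gene family is hypothetical, and the design is a simplified paired t0 versus t3 comparison in a single tissue. With real counts, the input would be something like the output of limma's `voom` or variance-stabilized values from DESeq2.

```r
library(limma)

set.seed(1)

# Simplified paired layout (assumption): 7 patients, time points t0 and t3.
patient <- factor(rep(paste0("p", 1:7), times = 2))
time    <- factor(rep(c("t0", "t3"), each = 7))
design  <- model.matrix(~ patient + time)  # last column: t3 vs t0 effect

# Simulated log-expression values (assumption): 200 genes x 14 samples.
n_genes <- 200
y <- matrix(rnorm(n_genes * 14), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Hypothetical family of 20 similar genes, shifted down at t3.
family <- paste0("gene", 1:20)
y[family, time == "t3"] <- y[family, time == "t3"] - 2

# Convert gene identifiers to row indices of y.
index <- ids2indices(list(family = family), rownames(y))

# Self-contained rotation test for the single set.
roast_res <- roast(y, index = index$family, design = design,
                   contrast = ncol(design), nrot = 999)

# Competitive test; accepts a list of sets.
camera_res <- camera(y, index = index, design = design,
                     contrast = ncol(design))

print(roast_res)
print(camera_res)
```

To test many sets at once, `mroast` takes the same list of indices that `camera` does.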



Thanks for the quick response. I had a quick search for this combination (goseq then DESeq2) but couldn't find anything. I'm not clear how these would be combined, but perhaps I'm missing something obvious. The approach I'm thinking of would be something like:

- group all measured genes into groups according to GO terms

- use DESeq2 to fit a model to each gene group, maybe with a term for the individual gene, treated like a batch effect

Then this would test for DE in each group. Is this what you had in mind?

If so, I don't see how to use goseq for this. It tests for enriched annotations once you have a group of differentially expressed genes, whereas what I want is to test for differential expression given a group of genes sharing an annotation.

It also seems like it could be statistically a bit tricky, e.g., how to select GO terms without first having differential expression; does the statistical model assumed in DESeq2 still hold for groups of genes; do interactions between GO terms affect any assumptions of independence. I'm sure there are more!

ADD REPLY • link 9.9 years ago willmacnair • 0


If the goseq model (set testing after per-gene DE testing) doesn't fit what you're after, take a look at the limma methods I mentioned and the accompanying papers. These do what you describe.

ADD REPLY • link 9.9 years ago Michael Love 43k