Hi
I've recently done some DE analysis with DESeq2 on relatively small sample sizes. The experimental setup is a time course with 4 time points, with matched samples of two tissue types from each of 7 patients (7 × 2 × 4 = 56 samples in total). I've tested t3 against t0 in both tissue types. Since I expect the responses to differ between the tissue types, I've included a time × tissue interaction term.
Design:
~ patient_id + time_id + tissue_id + time_id:tissue_id
Contrasts:
c('time_id', '0', '3')
c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1)
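To check which coefficients the numeric contrast picks out, here is a small sketch that lists the model-matrix columns for this design in R's default order (terms in formula order, reference level of each factor dropped). The factor levels P1 to P7, 0 to 3 and A/B are hypothetical, with the first level of each assumed to be the reference; DESeq2's resultsNames() uses different labels but, as far as I know, the same column order for a standard design like this one.

```python
# Hypothetical factor levels; the first level of each is assumed to be the reference.
patients = [f"P{i}" for i in range(1, 8)]
times = ["0", "1", "2", "3"]
tissues = ["A", "B"]


def coefficient_names():
    """Columns of model.matrix(~ patient_id + time_id + tissue_id + time_id:tissue_id)
    under treatment coding: intercept, then each term's non-reference levels in order."""
    names = ["(Intercept)"]
    names += [f"patient_id{p}" for p in patients[1:]]
    names += [f"time_id{t}" for t in times[1:]]
    names += [f"tissue_id{s}" for s in tissues[1:]]
    # In R the first factor of an interaction varies fastest, hence t in the inner loop.
    names += [f"time_id{t}:tissue_id{s}" for s in tissues[1:] for t in times[1:]]
    return names


names = coefficient_names()
contrast = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
assert len(names) == len(contrast) == 14

# Keep only the coefficients with a non-zero weight in the contrast.
selected = [name for name, weight in zip(names, contrast) if weight != 0]
print(selected)  # ['time_id3', 'time_id3:tissue_idB']
```

So the numeric contrast is time_id3 + time_id3:tissue_idB, i.e. t3 versus t0 in the non-reference tissue. Note also that, with DESeq2's c(factor, numerator, denominator) convention, c('time_id', '0', '3') reports t0 relative to t3 in the reference tissue, which is the opposite sign to the numeric contrast.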
Ranking genes by fold change in the size-factor-normalized counts turns up a large number of very similar genes (names identical apart from a number, and no difference in description on GeneCards) with log2 fold changes below -4. However, these genes are highly variable between samples, and I assume that this, combined with the small sample size, is why DESeq2 does not report them as significant, even at a 10% FDR.
It seems like there should be a way (most likely using something other than, or in addition to, DESeq2) to increase the effective sample size, and therefore power, by grouping these functionally very similar genes. Grouping could be, e.g., by annotation, or by a preliminary clustering step on the time-course profiles. I've looked on Bioconductor for this but haven't found anything. Any suggestions welcome!
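One simple form of grouping, sketched here as an assumption rather than an established recommendation, is to sum the raw counts of near-identical genes into a single "meta-gene" before running DESeq2, so the pooled feature is still an integer count. The gene names, the counts and the rule "strip a trailing number" in family_key are all hypothetical.

```python
import re

# Hypothetical raw counts: gene -> counts per sample (same sample order for every gene).
counts = {
    "FAMA1": [10, 0, 55, 3],
    "FAMA2": [0, 12, 40, 8],
    "FAMA3": [5, 7, 0, 20],
    "GENEB": [100, 90, 110, 95],
}


def family_key(gene):
    """Hypothetical grouping rule: drop a trailing number, so FAMA1..FAMA3 -> 'FAMA'."""
    return re.sub(r"\d+$", "", gene)


def collapse_families(counts):
    """Sum raw counts, sample by sample, over all genes sharing a family key."""
    collapsed = {}
    for gene, row in counts.items():
        key = family_key(gene)
        if key in collapsed:
            collapsed[key] = [a + b for a, b in zip(collapsed[key], row)]
        else:
            collapsed[key] = list(row)
    return collapsed


collapsed = collapse_families(counts)
print(collapsed)  # {'FAMA': [15, 19, 95, 31], 'GENEB': [100, 90, 110, 95]}
```

The name-based rule is crude: it would also merge unrelated genes such as CD4 and CD8, so in practice the groups would better come from annotation or clustering, as suggested above.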
Thanks in advance,
Will
Thanks for the quick response. I searched for this combination (goseq followed by DESeq2) but couldn't find anything relevant, and I'm not clear how the two would be combined; perhaps I'm missing something obvious. The approach I'm thinking of would be something like:
- group all measured genes according to their GO terms
- use DESeq2 to fit a model to each gene group, perhaps with a term for the individual gene, treated like a batch effect
Then this would test for DE in each group. Is this what you had in mind?
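The two steps above can be sketched as data preparation: invert a gene-to-term annotation into term-to-genes groups, then reshape one group's counts into long format with one row per (gene, sample), so that gene_id can enter the model as an extra factor. All gene, term and sample names are hypothetical, and the per-group formula mentioned in the note below is an assumption, not something DESeq2 is known to support in this form.

```python
from collections import defaultdict

# Hypothetical annotation: gene -> terms (a gene can belong to several terms).
gene_to_terms = {
    "G1": ["TERM_A", "TERM_B"],
    "G2": ["TERM_A"],
    "G3": ["TERM_B"],
}

# Hypothetical sample table: sample -> (patient_id, time_id, tissue_id).
samples = {
    "S1": ("P1", "0", "A"),
    "S2": ("P1", "3", "A"),
}

# Hypothetical raw counts: gene -> sample -> count.
counts = {
    "G1": {"S1": 4, "S2": 30},
    "G2": {"S1": 7, "S2": 25},
    "G3": {"S1": 50, "S2": 48},
}


def invert_annotation(gene_to_terms):
    """Step 1: build term -> list of member genes."""
    term_to_genes = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            term_to_genes[term].append(gene)
    return dict(term_to_genes)


def long_table(genes, counts, samples):
    """Step 2 input: one row per (gene, sample), with gene_id as an extra factor."""
    rows = []
    for gene in genes:
        for sample, (patient, time, tissue) in samples.items():
            rows.append({
                "gene_id": gene, "sample": sample, "patient_id": patient,
                "time_id": time, "tissue_id": tissue, "count": counts[gene][sample],
            })
    return rows


term_to_genes = invert_annotation(gene_to_terms)
rows = long_table(term_to_genes["TERM_A"], counts, samples)
```

A per-group model could then be something like ~ gene_id + patient_id + time_id + tissue_id + time_id:tissue_id fitted to these rows, which assumes every gene in the group shares the same time and tissue effects. It also makes the independence concerns concrete: rows from the same sample are not independent observations, and G1 enters both TERM_A and TERM_B.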
If so, I don't see how to use goseq for this: goseq tests for annotations enriched in a set of differentially expressed genes, whereas what I want is to test for differential expression in a set of genes sharing an annotation.
It also seems like it could be statistically tricky, e.g.: how to select GO terms without first having differential expression results; whether the statistical model assumed by DESeq2 still holds for groups of genes; and whether interactions between GO terms (e.g. genes shared between terms) break any assumptions of independence. I'm sure there are more!
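On the independence point, a small numeric illustration may help. It uses Stouffer's method (a swapped-in technique, not anything from DESeq2) to combine per-gene z-statistics for a set. If the m genes each have unit variance and average pairwise correlation rho, then Var(sum of z) = m(1 + (m - 1)rho), so treating correlated genes as independent overstates the evidence. This is similar in spirit to the variance inflation factor used by limma's camera. The z-scores and the correlation value are hypothetical, and estimating rho from the data is not shown.

```python
import math


def set_z(z_scores, mean_corr=0.0):
    """Stouffer-style combined z for a gene set.

    Each per-gene z is assumed to have unit variance; with average pairwise
    correlation mean_corr, Var(sum z) = m * (1 + (m - 1) * mean_corr).
    """
    m = len(z_scores)
    variance = m * (1 + (m - 1) * mean_corr)
    return sum(z_scores) / math.sqrt(variance)


z = [-2.0] * 5  # hypothetical per-gene statistics for a 5-gene family
independent = set_z(z)        # -10 / sqrt(5), about -4.47
correlated = set_z(z, 0.5)    # -10 / sqrt(15), about -2.58
print(independent, correlated)
```

With moderate correlation the combined statistic shrinks substantially, which is why highly similar (and likely co-regulated) genes do not contribute as much independent evidence as their number suggests.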