I have eight experimental conditions (groups) and am looking to identify differentially expressed genes between two groups at a time (i.e. group 1 vs group 2). I run the comparison as follows
dds <- DESeqDataSet(data, ~ Group)
dds <- DESeq(dds)
res <- results(dds, contrast=c("Group","Group 1","Group 2"))
Sigres <- res[which(res$padj<0.05),]
This gives me 2440 differentially expressed genes with a p-value<0.05 at an FDR of 0.1 (the default)...Great!...but the problem is that when I try to change the FDR cut-off to something else I still get the same 2440 genes no matter what I set (i.e. alpha=0.5, alpha=0.01, ect.). I attempt to change the FDR alpha value in the code as follows:
res <- results(dds, contrast=c("Group","Group 1","Group 2"), alpha=0.01)
Why is this not working? Thanks-
Hi Michael,
Thank you for your quick response. I have tried your suggested alteration but unfortunately I am still getting the same problem. Further, I think my understanding of the results analysis (that alpha sets the FDR and thus the adjusted p-value) must be wrong because picking a p-value cutoff to identify significance following the test should not be dependent on what the FDR was set at. So I guess my question is how do I change the False Discovery Rate during DESeq2 analysis?
Here is an example of what I have and what I'm getting when I attempt to change the alpha value. For this example I'll compare Group 3 vs Group 5 from the design outlined above because there is both high dispersion variance and high differential expression changes between these groups that adjusting the FDR should have a significant effect on.
First using the default settings with what I presume is an FDR 0.1
Now trying to adjust the FDR to 0.05 as you suggest:
As you can see, whatever I'm doing doesn't seem to be changing anything. I have installed the latest updates and have gone through the help documentation pretty thoroughly. I can't seem to figure out what I'm doing wrong. Any further help would be much appreciated as I'm obviously missing something. Thanks.
Session info in providing the above example, in case it's relevant:
Firstly, check the help page for the summary() function, specifically the 'alpha' argument of summary():
summary() is just printing a table for you, and you need to say what threshold you want.
Secondly, the code after you wrote "First using the default settings with what I presume is an FDR 0.1" is not correct.
As I said previously, you need to use the same number for the 'alpha' argument in results() and then for thresholding res$padj. If you use 0.1 for results()'s 'alpha' and threshold at 0.05, you are achieving a suboptimal independent filtering, and a list with FDR 5%.
If you threshold adjusted p-values at 0.05, then you are not defining a list with FDR 10%. What you have done in that code chunk is to tell the independent filtering procedure that you plan to threshold at 10%, and then you have instead thresholded at 5%, giving you a list with estimated FDR less than 5%, although you have a suboptimal independent filtering procedure because you didn't tell it you were going to filter at 5%.
But the big point to remember is that the thresholding, that is:
...is the line of code which tells you how many genes have expected FDR ≤ alpha, not the 'alpha' argument to results().
Ah...got it. Thanks for that explanation, I appreciate your time-
Mark