Question

DESeq2 'results' function not responding to change in alpha

0

Entering edit mode

polinski.mark • 0

@polinskimark-8740

Last seen 8.7 years ago

Canada

I have eight experimental conditions (groups) and am looking to identify differentially expressed genes between two groups at a time (i.e. group 1 vs group 2). I run the comparison as follows

dds <- DESeqDataSet(data, ~ Group) 

dds <- DESeq(dds) 

res <- results(dds, contrast=c("Group","Group 1","Group 2"))

Sigres <- res[which(res$padj<0.05),]

This gives me 2440 differentially expressed genes with a p-value<0.05 at an FDR of 0.1 (the default)...Great!...but the problem is that when I try to change the FDR cut-off to something else I still get the same 2440 genes no matter what I set (i.e. alpha=0.5, alpha=0.01, ect.). I attempt to change the FDR alpha value in the code as follows:

res <- results(dds, contrast=c("Group","Group 1","Group 2"), alpha=0.01)

Why is this not working? Thanks-

deseq2 • 4.8k views

ADD COMMENT • link updated 8.7 years ago by Michael Love 41k • written 8.7 years ago by polinski.mark • 0

score 0 · Answer 1 · 2015-09-02

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

hi Mark,

The 'alpha' in results is for you to tell the independent filtering algorithm at what adjusted p-value you will threshold (and this gives you a set where you expect FDR < alpha).

You code should look like this, where the same number is used for 'alpha' argument in results() and then for thresholding res$padj:

alpha <- 0.05
results(dds, alpha=alpha)
sum(res$padj < alpha)

ADD COMMENT • link 8.7 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

Thank you for your quick response. I have tried your suggested alteration but unfortunately I am still getting the same problem. Further, I think my understanding of the results analysis (that alpha sets the FDR and thus the adjusted p-value) must be wrong because picking a p-value cutoff to identify significance following the test should not be dependent on what the FDR was set at. So I guess my question is how do I change the False Discovery Rate during DESeq2 analysis?

Here is an example of what I have and what I'm getting when I attempt to change the alpha value. For this example I'll compare Group 3 vs Group 5 from the design outlined above because there is both high dispersion variance and high differential expression changes between these groups that adjusting the FDR should have a significant effect on.

> head(counts)

	S86d14nHK	S87d14nHK	S88d14nHK	S89d14nHK	S91d14nHK	S92d14nHK	…..
c192359_g3	576	457	459	402	489	441
c199395_g2	738	694	596	785	625	620
c201183_g6	20	9	2	4	22	12
c189046_g6	82	78	66	53	53	49
c191862_g14	91	93	68	118	85	118
c173740_g1	380	850	1139	1305	638	721

> head(coldata)

	Sample.Name	Group
S86d14nHK	S86d14nHK	Group 1
S87d14nHK	S87d14nHK	Group 1
S88d14nHK	S88d14nHK	Group 1
S89d14nHK	S89d14nHK	Group 1
S91d14nHK	S91d14nHK	Group 1
S92d14nHK	S92d14nHK	Group 1

> (ddsMat=DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ Group))
class: DESeqDataSet 
dim: 60421 36 
exptData(0):
assays(1): counts
rownames(60421): c192359_g3 c199395_g2 ... c193438_g1 c200065_g1
rowRanges metadata column names(0):
colnames(36): S86d14nHK S87d14nHK ... S214d21pHKIHN S216d21pHKIHN
colData names(2): Sample.Name Group

> dds=DESeq(ddsMat)

First using the default settings with what I presume is an FDR 0.1

> res1=results(dds, contrast = c("Group", "Group 3", "Group 5"))

> summary(res1)

out of 60421 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 12963, 21% 
LFC < 0 (down)   : 15897, 26% 
outliers [1]     : 399, 0.66% 
low counts [2]   : 0, 0% 
(mean count < 3.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> res1sig=res1[which(res1$padj < 0.05),]

> nrow(res1sig)
[1] 24223

Now trying to adjust the FDR to 0.05 as you suggest:

> alpha <- 0.05
> res2=results(dds, contrast = c("Group", "Group 3", "Group 5"), alpha = alpha)
> summary(res2)

out of 60421 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 12963, 21% 
LFC < 0 (down)   : 15897, 26% 
outliers [1]     : 399, 0.66% 
low counts [2]   : 0, 0% 
(mean count < 3.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < alpha)
[1] NA

> res2sig=res2[which(res2$padj < alpha),]
> nrow(res1sig)
[1] 24223

As you can see, whatever I'm doing doesn't seem to be changing anything. I have installed the latest updates and have gone through the help documentation pretty thoroughly. I can't seem to figure out what I'm doing wrong. Any further help would be much appreciated as I'm obviously missing something. Thanks.

Session info in providing the above example, in case it's relevant:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.8.1              RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0               GenomicRanges_1.20.6     
[5] GenomeInfoDb_1.4.2        IRanges_2.2.7             S4Vectors_0.6.5           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.8.0        futile.options_1.0.0 tools_3.2.1         
 [7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0        annotate_1.46.1      gtable_0.1.2         lattice_0.20-33     
[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0      genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3       
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.1           Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.3        
[25] survival_2.38-3      BiocParallel_1.2.20  foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.46.0  
[31] ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         scales_0.3.0         Hmisc_3.16-0        
[37] MASS_7.3-44          splines_3.2.1        xtable_1.7-4         colorspace_1.2-6     stringi_0.5-5        acepack_1.3-3.3     
[43] munsell_0.4.2

ADD REPLY • link 8.7 years ago polinski.mark • 0

1

Entering edit mode

Firstly, check the help page for the summary() function, specifically the 'alpha' argument of summary():

help("summary",package="DESeq2")

summary() is just printing a table for you, and you need to say what threshold you want.

Secondly, the code after you wrote "First using the default settings with what I presume is an FDR 0.1" is not correct.

As I said previously, you need to use the same number for the 'alpha' argument in results() and then for thresholding res$padj. If you use 0.1 for results()'s 'alpha' and threshold at 0.05, you are achieving a suboptimal independent filtering, and a list with FDR 5%.

If you threshold adjusted p-values at 0.05, then you are not defining a list with FDR 10%. What you have done in that code chunk is to tell the independent filtering procedure that you plan to threshold at 10%, and then you have instead thresholded at 5%, giving you a list with estimated FDR less than 5%, although you have a suboptimal independent filtering procedure because you didn't tell it you were going to filter at 5%.

But the big point to remember is that the thresholding, that is:

sum(res$padj < alpha, na.rm=TRUE)

...is the line of code which tells you how many genes have expected FDR ≤ alpha, not the 'alpha' argument to results().

ADD REPLY • link 8.7 years ago Michael Love 41k

0

Entering edit mode

Ah...got it. Thanks for that explanation, I appreciate your time-

Mark

ADD REPLY • link 8.6 years ago polinski.mark • 0