DESeq2 'results' function not responding to change in alpha
1
0
Entering edit mode
@polinskimark-8740
Last seen 8.7 years ago
Canada
I have eight experimental conditions (groups) and am looking to identify differentially expressed genes between two groups at a time (i.e. group 1 vs group 2). I run the comparison as follows
dds <- DESeqDataSet(data, ~ Group) 

dds <- DESeq(dds) 

res <- results(dds, contrast=c("Group","Group 1","Group 2")) 

Sigres <- res[which(res$padj<0.05),]

 

This gives me 2440 differentially expressed genes with a p-value<0.05 at an FDR of 0.1 (the default)...Great!...but the problem is that when I try to change the FDR cut-off to something else I still get the same 2440 genes no matter what I set (i.e. alpha=0.5, alpha=0.01, ect.). I attempt to change the FDR alpha value in the code as follows:

res <- results(dds, contrast=c("Group","Group 1","Group 2"), alpha=0.01) 

 

Why is this not working? Thanks-

deseq2 • 4.8k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Mark,

The 'alpha' in results is for you to tell the independent filtering algorithm at what adjusted p-value you will threshold (and this gives you a set where you expect FDR < alpha).

You code should look like this, where the same number is used for 'alpha' argument in results() and then for thresholding res$padj:

alpha <- 0.05
results(dds, alpha=alpha)
sum(res$padj < alpha)

 

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thank you for your quick response. I have tried your suggested alteration but unfortunately I am still getting the same problem. Further, I think my understanding of the results analysis (that alpha sets the FDR and thus the adjusted p-value) must be wrong because picking a p-value cutoff to identify significance following the test should not be dependent on what the FDR was set at. So I guess my question is how do I change the False Discovery Rate during DESeq2 analysis?

Here is an example of what I have and what I'm getting when I attempt to change the alpha value. For this example I'll compare Group 3 vs Group 5 from the design outlined above because there is both high dispersion variance and high differential expression changes between these groups that adjusting the FDR should have a significant effect on.

> head(counts)
  S86d14nHK S87d14nHK S88d14nHK S89d14nHK S91d14nHK S92d14nHK …..
c192359_g3 576 457 459 402 489 441  
c199395_g2 738 694 596 785 625 620  
c201183_g6 20 9 2 4 22 12  
c189046_g6 82 78 66 53 53 49  
c191862_g14 91 93 68 118 85 118  
c173740_g1 380 850 1139 1305 638 721  
 
> head(coldata)
  Sample.Name Group
S86d14nHK S86d14nHK Group 1
S87d14nHK S87d14nHK Group 1
S88d14nHK S88d14nHK Group 1
S89d14nHK S89d14nHK Group 1
S91d14nHK S91d14nHK Group 1
S92d14nHK S92d14nHK Group 1
 
> (ddsMat=DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ Group))
class: DESeqDataSet 
dim: 60421 36 
exptData(0):
assays(1): counts
rownames(60421): c192359_g3 c199395_g2 ... c193438_g1 c200065_g1
rowRanges metadata column names(0):
colnames(36): S86d14nHK S87d14nHK ... S214d21pHKIHN S216d21pHKIHN
colData names(2): Sample.Name Group

> dds=DESeq(ddsMat)

First using the default settings with what I presume is an FDR 0.1

> res1=results(dds, contrast = c("Group", "Group 3", "Group 5"))

> summary(res1)

out of 60421 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 12963, 21% 
LFC < 0 (down)   : 15897, 26% 
outliers [1]     : 399, 0.66% 
low counts [2]   : 0, 0% 
(mean count < 3.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> res1sig=res1[which(res1$padj < 0.05),]

> nrow(res1sig)
[1] 24223

Now trying to adjust the FDR to 0.05 as you suggest:

> alpha <- 0.05
> res2=results(dds, contrast = c("Group", "Group 3", "Group 5"), alpha = alpha)
> summary(res2)

out of 60421 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 12963, 21% 
LFC < 0 (down)   : 15897, 26% 
outliers [1]     : 399, 0.66% 
low counts [2]   : 0, 0% 
(mean count < 3.5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sum(res$padj < alpha)
[1] NA

> res2sig=res2[which(res2$padj < alpha),]
> nrow(res1sig)
[1] 24223

As you can see, whatever I'm doing doesn't seem to be changing anything. I have installed the latest updates and have gone through the help documentation pretty thoroughly. I can't seem to figure out what I'm doing wrong. Any further help  would be much appreciated as I'm obviously missing something. Thanks.

Session info in providing the above example, in case it's relevant:

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.8.1              RcppArmadillo_0.5.400.2.0 Rcpp_0.12.0               GenomicRanges_1.20.6     
[5] GenomeInfoDb_1.4.2        IRanges_2.2.7             S4Vectors_0.6.5           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.8.0        futile.options_1.0.0 tools_3.2.1         
 [7] rpart_4.1-10         digest_0.6.8         RSQLite_1.0.0        annotate_1.46.1      gtable_0.1.2         lattice_0.20-33     
[13] DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0      genefilter_1.50.0    stringr_1.0.0        cluster_2.0.3       
[19] locfit_1.5-9.1       nnet_7.3-11          grid_3.2.1           Biobase_2.28.0       AnnotationDbi_1.30.1 XML_3.98-1.3        
[25] survival_2.38-3      BiocParallel_1.2.20  foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.46.0  
[31] ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         scales_0.3.0         Hmisc_3.16-0        
[37] MASS_7.3-44          splines_3.2.1        xtable_1.7-4         colorspace_1.2-6     stringi_0.5-5        acepack_1.3-3.3     
[43] munsell_0.4.2       
 
ADD REPLY
1
Entering edit mode

Firstly, check the help page for the summary() function, specifically the 'alpha' argument of summary():

help("summary",package="DESeq2")

summary() is just printing a table for you, and you need to say what threshold you want.

Secondly, the code after you wrote "First using the default settings with what I presume is an FDR 0.1" is not correct. 

As I said previously, you need to use the same number for the 'alpha' argument in results() and then for thresholding res$padj. If you use 0.1 for results()'s 'alpha' and threshold at 0.05, you are achieving a suboptimal independent filtering, and a list with FDR 5%.

If you threshold adjusted p-values at 0.05, then you are not defining a list with FDR 10%. What you have done in that code chunk is to tell the independent filtering procedure that you plan to threshold at 10%, and then you have instead thresholded at 5%, giving you a list with estimated FDR less than 5%, although you have a suboptimal independent filtering procedure because you didn't tell it you were going to filter at 5%.

But the big point to remember is that the thresholding, that is:

sum(res$padj < alpha, na.rm=TRUE)

...is the line of code which tells you how many genes have expected FDR ≤ alpha, not the 'alpha' argument to results().

ADD REPLY
0
Entering edit mode

Ah...got it.  Thanks for that explanation, I appreciate your time-

 

Mark

ADD REPLY

Login before adding your answer.

Traffic: 799 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6