DESeq2: Question about the alpha in summary() and results()
1
0
Entering edit mode
Jon Bråte ▴ 240
@jon-brate-6263
Last seen 18 months ago
Norway

I'm getting slight differences when setting the alpha to 0.05 with the results() function and the summary() function. Is this correct, or am I doing something wrong?

Thanks, Jon

> res <- results(dds)
> summary(res)

out of 29391 with nonzero total read count
LFC > 0 (up)     : 2218, 7.5%
LFC < 0 (down)   : 2607, 8.9%
outliers [1]     : 0, 0%
low counts [2]   : 11397, 39%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> res.05 <- results(dds, alpha = 0.05)
> summary(res.05)

out of 29391 with nonzero total read count
LFC > 0 (up)     : 1814, 6.2%
LFC < 0 (down)   : 2210, 7.5%
outliers [1]     : 0, 0%
low counts [2]   : 12536, 43%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> summary(res, alpha = 0.05)

out of 29391 with nonzero total read count
LFC > 0 (up)     : 1797, 6.1%
LFC < 0 (down)   : 2197, 7.5%
outliers [1]     : 0, 0%
low counts [2]   : 11397, 39%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DESeq2_1.18.1              SummarizedExperiment_1.8.0 DelayedArray_0.4.1
[4] matrixStats_0.52.2         Biobase_2.38.0             GenomicRanges_1.30.0
[7] GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0
[10] BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] locfit_1.5-9.1          Rcpp_0.12.14            lattice_0.20-35
[4] tidyr_0.7.2             assertthat_0.2.0        digest_0.6.12
[7] R6_2.2.2                plyr_1.8.4              backports_1.1.1
[10] acepack_1.4.1           RSQLite_2.0             ggplot2_2.2.1
[13] zlibbioc_1.24.0         rlang_0.1.4             lazyeval_0.2.1
[16] rstudioapi_0.7          data.table_1.10.4-3     annotate_1.56.1
[19] blob_1.1.0              rpart_4.1-11            Matrix_1.2-12
[22] checkmate_1.8.5         splines_3.4.3           BiocParallel_1.12.0
[25] geneplotter_1.56.0      stringr_1.2.0           foreign_0.8-69
[28] htmlwidgets_0.9         bit_1.1-12              RCurl_1.95-4.8
[31] munsell_0.4.3           compiler_3.4.3          pkgconfig_2.0.1
[34] base64enc_0.1-3         htmltools_0.3.6         nnet_7.3-12
[37] tibble_1.3.4            gridExtra_2.3           htmlTable_1.11.0
[40] GenomeInfoDbData_0.99.1 Hmisc_4.0-3             XML_3.98-1.9
[43] dplyr_0.7.4             bitops_1.0-6            grid_3.4.3
[46] xtable_1.8-2            gtable_0.2.0            DBI_0.7
[49] magrittr_1.5            scales_0.5.0            stringi_1.1.6
[52] XVector_0.18.0          genefilter_1.60.0       bindrcpp_0.2
[55] latticeExtra_0.6-28     Formula_1.2-2           RColorBrewer_1.1-2
[58] tools_3.4.3             bit64_0.9-7             glue_1.2.0
[61] purrr_0.2.4             survival_2.41-3         yaml_2.1.15
[64] AnnotationDbi_1.40.0    colorspace_1.3-2        cluster_2.0.6
[67] memoise_1.1.0           knitr_1.17              bindr_0.1

deseq2 • 4.2k views
2
Entering edit mode
@mikelove
Last seen 3 days ago
United States

Take a look at what 'alpha' does in ?results:

   alpha: the significance cutoff used for optimizing the independent
filtering (by default 0.1). If the adjusted p-value cutoff
(FDR) will be a value other than 0.1, ‘alpha’ should be set
to that value.

Whereas in ?summary.DESeqResults:

   alpha: the adjusted p-value cutoff. If not set, this defaults to the
‘alpha’ argument which was used in ‘results’ to set the
target FDR for independent filtering, or if independent
filtering was not performed, to 0.1.

So, in results(), alpha is used to change how the independent filtering is performed, which changes the adjusted p-values themselves. In summary(), alpha is just changing the threshold that is used to tally up the DE gene counts.

0
Entering edit mode

I see, thanks!

0
Entering edit mode

I recently repeated an analysis with alpha set to 0.5 to match my FDR cutoff, as recommended on this thread. I expected this to change my results but it did absolutely nothing (see summary below). The stats appear to be identical for both analyses. Does this make sense? For some datasets, might this change not affect filtering AT ALL? Note that changing alpha to 0.01 did affect the results. I think this is telling me that I'm doing it right and that there is no difference if I run the analysis with alpha = 0.05 vs. 0.1. Would you agree?

Thanks!

sessionInfo()

R version 3.4.2 (2017-09-28) Platform: x8664-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 16299) other attached packages: [1] DESeq21.18.1

summary(res_p.05)

out of 23911 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 1255, 5.2% LFC < 0 (down) : 1157, 4.8% outliers [1] : 0, 0% low counts [2] : 7925, 33% (mean count < 5)

summary(res)

out of 23911 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 1590, 6.6% LFC < 0 (down) : 1567, 6.6% outliers [1] : 0, 0% low counts [2] : 7925, 33% (mean count < 5)

0
Entering edit mode

I don't follow you. These summary tables are different.

0
Entering edit mode

Ah, yes. They are different in that they give different numbers of DE genes. I assume this is simply due to the different p-value cutoff. The filtering is not different (as I thought it might be) and the actual statistics (lfcS, stat, pvalue, padj). As it is, I could simply filter my results from the default analysis (alpha = 0.1) and obtain the same result as I do when the analysis is done with alpha = 0.05. Does that make sense?

I was under the impression, from your comments here, that the alpha value would alter the filtering.

thanks!

0
Entering edit mode

*actual statistics are identical

0
Entering edit mode

I see - yes the filtering can be different but in this case it is not.

0
Entering edit mode

Gotcha. So it's good practice to use alpha at the same value as your FDR cutoff, but in my case it doesn't change the filtering. Thank you!

0
Entering edit mode

Here are the results with alpha = 0.01. This does change the filtering and it changes the statistics slightly. I think only the adjusted p-value though. I assume this is because of the number of genes being tested..?

summary(res_p.01)

out of 23911 with nonzero total read count adjusted p-value < 0.01 LFC > 0 (up) : 808, 3.4% LFC < 0 (down) : 598, 2.5% outliers [1] : 0, 0% low counts [2] : 7485, 31% (mean count < 4)

0
Entering edit mode

Hi Michael, I have 3 questions if that's ok.

1- I did DE and changed my alpha to 0.05 in results() to match my FDR cut-off. I understood that in summary(), alpha is set automatically to the same value I used when I ran results(). The question I have though, is when using MAplot() do I need to change alpha to match the value I used in results(), which in my case is 0.05?

2- I ran DESeq() with betaPrior=FALSE (which is default) then used results() with addMLE=FALSE (by default) followed by lfcShrink(). If I want to plot the shrunken values, I will need to run MAplot() with MLE=FALSE, right?

3-Finally, in MAplot() how will I know p-adj value that is used to show the significant genes (blue points in the plot)? is it the same as my alpha?

0
Entering edit mode

plotMA has its own argument alpha. You should set it to specify which threshold to show. It is 0.1 unless you change the value.

You don't need to run addMLE=FALSE when you run results, because it already has the MLE results. Likewise you don't have to specify MLE=FALSE it will plot the shrunken values by default.

0
Entering edit mode

Thank you Michael for the quick reply!

regarding addMLE=FALSE, if I don't add it, you stated that it has already the MLE results, so unshrunken values. But I want to plot the shrunken values.

On another thread you explained how to plot shrunken vs unshrunken values. After further reading DESeq2 documentation, what I understood is that:

1- We run DESeq() with default betaPrior=FALSE then results() with default addMLE=FALSE. Then shrunken can be obtained using lfcShrink()

2- If we run DESeq() with betaPrior=TRUE to get unshrunken values, we need to run results() with addMLE=TRUE

I went with option 1 and I am looking to use plotMA to plot shrunken vs unshrunken. Is this not possible?

0
Entering edit mode

Don't run DESeq() with betaPrior=TRUE. It's essentially deprecated. You won't see it mentioned in the vignette or documentation.

res1 <- results(dds) # unshrunken
res2 <- lfcShrink(dds) # shrunken

1
Entering edit mode

Thank you so much Michael! I really appreciate it :)