Entering edit mode
Jon Bråte ▴ 250@jon-brate-6263
Last seen 20 months ago
I'm getting slight differences when setting the alpha to 0.05 with the
results() function and the
summary() function. Is this correct, or am I doing something wrong?
> res <- results(dds) > summary(res) out of 29391 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 2218, 7.5% LFC < 0 (down) : 2607, 8.9% outliers  : 0, 0% low counts  : 11397, 39% (mean count < 5)  see 'cooksCutoff' argument of ?results  see 'independentFiltering' argument of ?results > res.05 <- results(dds, alpha = 0.05) > summary(res.05) out of 29391 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 1814, 6.2% LFC < 0 (down) : 2210, 7.5% outliers  : 0, 0% low counts  : 12536, 43% (mean count < 7)  see 'cooksCutoff' argument of ?results  see 'independentFiltering' argument of ?results > summary(res, alpha = 0.05) out of 29391 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 1797, 6.1% LFC < 0 (down) : 2197, 7.5% outliers  : 0, 0% low counts  : 11397, 39% (mean count < 5)  see 'cooksCutoff' argument of ?results  see 'independentFiltering' argument of ?results
> sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.2 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale:  C attached base packages:  parallel stats4 stats graphics grDevices utils datasets methods base other attached packages:  DESeq2_1.18.1 SummarizedExperiment_1.8.0 DelayedArray_0.4.1  matrixStats_0.52.2 Biobase_2.38.0 GenomicRanges_1.30.0  GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0  BiocGenerics_0.24.0 loaded via a namespace (and not attached):  locfit_1.5-9.1 Rcpp_0.12.14 lattice_0.20-35  tidyr_0.7.2 assertthat_0.2.0 digest_0.6.12  R6_2.2.2 plyr_1.8.4 backports_1.1.1  acepack_1.4.1 RSQLite_2.0 ggplot2_2.2.1  zlibbioc_1.24.0 rlang_0.1.4 lazyeval_0.2.1  rstudioapi_0.7 data.table_1.10.4-3 annotate_1.56.1  blob_1.1.0 rpart_4.1-11 Matrix_1.2-12  checkmate_1.8.5 splines_3.4.3 BiocParallel_1.12.0  geneplotter_1.56.0 stringr_1.2.0 foreign_0.8-69  htmlwidgets_0.9 bit_1.1-12 RCurl_1.95-4.8  munsell_0.4.3 compiler_3.4.3 pkgconfig_2.0.1  base64enc_0.1-3 htmltools_0.3.6 nnet_7.3-12  tibble_1.3.4 gridExtra_2.3 htmlTable_1.11.0  GenomeInfoDbData_0.99.1 Hmisc_4.0-3 XML_3.98-1.9  dplyr_0.7.4 bitops_1.0-6 grid_3.4.3  xtable_1.8-2 gtable_0.2.0 DBI_0.7  magrittr_1.5 scales_0.5.0 stringi_1.1.6  XVector_0.18.0 genefilter_1.60.0 bindrcpp_0.2  latticeExtra_0.6-28 Formula_1.2-2 RColorBrewer_1.1-2  tools_3.4.3 bit64_0.9-7 glue_1.2.0  purrr_0.2.4 survival_2.41-3 yaml_2.1.15  AnnotationDbi_1.40.0 colorspace_1.3-2 cluster_2.0.6  memoise_1.1.0 knitr_1.17 bindr_0.1
I see, thanks!
I recently repeated an analysis with alpha set to 0.5 to match my FDR cutoff, as recommended on this thread. I expected this to change my results but it did absolutely nothing (see summary below). The stats appear to be identical for both analyses. Does this make sense? For some datasets, might this change not affect filtering AT ALL? Note that changing alpha to 0.01 did affect the results. I think this is telling me that I'm doing it right and that there is no difference if I run the analysis with alpha = 0.05 vs. 0.1. Would you agree?
R version 3.4.2 (2017-09-28) Platform: x8664-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 16299) other attached packages:  DESeq21.18.1
out of 23911 with nonzero total read count adjusted p-value < 0.05 LFC > 0 (up) : 1255, 5.2% LFC < 0 (down) : 1157, 4.8% outliers  : 0, 0% low counts  : 7925, 33% (mean count < 5)
out of 23911 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 1590, 6.6% LFC < 0 (down) : 1567, 6.6% outliers  : 0, 0% low counts  : 7925, 33% (mean count < 5)
I don't follow you. These summary tables are different.
Ah, yes. They are different in that they give different numbers of DE genes. I assume this is simply due to the different p-value cutoff. The filtering is not different (as I thought it might be) and the actual statistics (lfcS, stat, pvalue, padj). As it is, I could simply filter my results from the default analysis (alpha = 0.1) and obtain the same result as I do when the analysis is done with alpha = 0.05. Does that make sense?
I was under the impression, from your comments here, that the alpha value would alter the filtering.
*actual statistics are identical
I see - yes the filtering can be different but in this case it is not.
Gotcha. So it's good practice to use alpha at the same value as your FDR cutoff, but in my case it doesn't change the filtering. Thank you!
Here are the results with alpha = 0.01. This does change the filtering and it changes the statistics slightly. I think only the adjusted p-value though. I assume this is because of the number of genes being tested..?
out of 23911 with nonzero total read count adjusted p-value < 0.01 LFC > 0 (up) : 808, 3.4% LFC < 0 (down) : 598, 2.5% outliers  : 0, 0% low counts  : 7485, 31% (mean count < 4)
Hi Michael, I have 3 questions if that's ok.
1- I did DE and changed my alpha to 0.05 in results() to match my FDR cut-off. I understood that in summary(), alpha is set automatically to the same value I used when I ran results(). The question I have though, is when using MAplot() do I need to change alpha to match the value I used in results(), which in my case is 0.05?
2- I ran DESeq() with betaPrior=FALSE (which is default) then used results() with addMLE=FALSE (by default) followed by lfcShrink(). If I want to plot the shrunken values, I will need to run MAplot() with MLE=FALSE, right?
3-Finally, in MAplot() how will I know p-adj value that is used to show the significant genes (blue points in the plot)? is it the same as my alpha?
plotMAhas its own argument
alpha. You should set it to specify which threshold to show. It is 0.1 unless you change the value.
You don't need to run
addMLE=FALSEwhen you run results, because it already has the MLE results. Likewise you don't have to specify
MLE=FALSEit will plot the shrunken values by default.
Thank you Michael for the quick reply!
addMLE=FALSE, if I don't add it, you stated that it has already the MLE results, so unshrunken values. But I want to plot the shrunken values.
On another thread you explained how to plot shrunken vs unshrunken values. After further reading DESeq2 documentation, what I understood is that:
1- We run
addMLE=FALSE. Then shrunken can be obtained using
2- If we run
betaPrior=TRUEto get unshrunken values, we need to run
I went with option 1 and I am looking to use plotMA to plot shrunken vs unshrunken. Is this not possible?
Don't run DESeq() with betaPrior=TRUE. It's essentially deprecated. You won't see it mentioned in the vignette or documentation.
Thank you so much Michael! I really appreciate it :)