Hello,
I am analysing a RNA-Seq experiment using DESeq2 trying to understand the different log2Fold shrinking algorithms available in DESeq2. After reading the documentation I understood that setting betaPrior=TRUE
when calling DESeq
would use the "normal" log2Fold shrinking as it was used as default in DESeq2 versions before <1.16 and that from v1.16 onwards the default setting would be betaPrior=FALSE
and that the default workflow would be betaPrior=FALSE
and then call lfcShrink
to perform log2Fold shrinking for ranking and visualisation.
I now ran DESeq with betaPrior=FALSE
(as default) and then performed IHW on the results:
>dds<-DESeq(ddsFullCountTable, fitType="parametric", betaPrior=FALSE,parallel=F)
>res<-results(dds, contrast=c("pheno", "ctrl", "treat"), parallel=F, cooksCutoff=TRUE,
+ independentFiltering=TRUE, filterFun=ihw)
>head(res[order(res$padj),])
log2 fold change (MLE): pheno ctrl vs treat
Wald test p-value: pheno ctrl vs treat
DataFrame with 6 rows and 7 columns
baseMean log2FoldChange lfcSE
<numeric> <numeric> <numeric>
comp2268393_c0_seq2 20.2323701630026 -21.166856612115 4.24861896850527
comp2273865_c0_seq2 1643.74563240146 -0.753207633363883 0.153241371180767
comp2277935_c1_seq8 286.910832769305 -0.691222223493662 0.169714446384218
comp2141997_c0_seq1 320.550667604245 -0.59060881408569 0.151468610465752
comp2268577_c0_seq9 1030.35777262206 0.450412167787341 0.118900113344764
Pdu_Vtg_1.1 2766.07425488488 0.363861647087163 0.0960080095221327
stat pvalue padj
<numeric> <numeric> <numeric>
comp2268393_c0_seq2 -4.98205576188957 6.29122985060415e-07 0.0199758626049576
comp2273865_c0_seq2 -4.91517158558562 8.87047341413335e-07 0.0199758626049576
comp2277935_c1_seq8 -4.0728543634335 4.64404654102552e-05 0.0551478686234231
comp2141997_c0_seq1 -3.89921589872398 9.6504696838006e-05 0.085949208773278
comp2268577_c0_seq9 3.78815591606141 0.000151769598702969 0.114241548368527
Pdu_Vtg_1.1 3.78990928880034 0.0001507023086035 0.149298720611182
weight
<numeric>
comp2268393_c0_seq2 1
comp2273865_c0_seq2 1
comp2277935_c1_seq8 12.6425685091306
comp2141997_c0_seq1 12.6425685091306
comp2268577_c0_seq9 11.9668387790622
Pdu_Vtg_1.1 7.57707015104483
However, when when I run the same data set with betaPrior=TRUE
I get different results:
>dds<-DESeq(ddsFullCountTable, fitType="parametric", betaPrior=TRUE,parallel=F)
>res<-results(dds, contrast=c("pheno", "ctrl", "treat"), parallel=F, cooksCutoff=TRUE,
+ independentFiltering=TRUE, filterFun=ihw)
>head(res[order(res$padj),])
log2 fold change (MAP): pheno ctrl vs treat
Wald test p-value: pheno ctrl vs treat
DataFrame with 6 rows and 7 columns
baseMean log2FoldChange lfcSE
<numeric> <numeric> <numeric>
comp2273865_c0_seq2 1643.74563240146 -0.669065523108693 0.136251802675902
comp2277935_c1_seq8 286.910832769305 -0.598085295272818 0.14708009415555
Pdu_Vtg_1.1 2766.07425488488 0.346637211959349 0.0915191874241819
comp2224448_c0_seq1 5846.99678497238 0.278434407963567 0.0815422524249737
comp2252584_c0_seq1 2574.82319882063 -0.306324451519527 0.0914966906834413
gnl|Pdu_trscr_assembly_1|4219 1109.47286054481 -0.517737621443726 0.128467156601779
stat pvalue padj
<numeric> <numeric> <numeric>
comp2273865_c0_seq2 -4.91050767746668 9.08408992313057e-07 0.00281860960251004
comp2277935_c1_seq8 -4.06639184389078 4.77466288487251e-05 0.0865926427665106
Pdu_Vtg_1.1 3.78759057762086 0.000152115238366902 0.100296223175306
comp2224448_c0_seq1 3.41460285536939 0.000638750715994235 0.292933653069786
comp2252584_c0_seq1 -3.3479292992065 0.000814177791399682 0.292933653069786
gnl|Pdu_trscr_assembly_1|4219 -4.03011660831417 5.57491897770106e-05 0.292933653069786
weight
<numeric>
comp2273865_c0_seq2 14.5156081808396
comp2277935_c1_seq8 12.4171081284368
Pdu_Vtg_1.1 22.7696118687405
comp2224448_c0_seq1 14.5156081808396
comp2252584_c0_seq1 22.7696118687405
gnl|Pdu_trscr_assembly_1|4219 1
For me this means that using a log2FoldChange shriking algorithm during running DESeq2 is influencing the DE calls. If I understood the mathematical meaning of the beta prior correctly this makes kind of sense, but I am now not sure, which workflow to use. If I use betaPrior=TRUE
when calling DESeq
I would have the "normal" log2FoldChange shrinking influencing my results (would there be a possibility to use "apeglm" when setting betaPrior=TRUE
?), and when using the default setting, i.e. betaPrior=FALSE
I would have no shirnking impacting on my DE calls, but when using lfcShrink
to visualise and rank data this "shrinking factor" would not be accounted for in the DE calls. Is there any advice on that? Or ideal way to proceed?
Thanks a lot for your help/advice
Sven
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices datasets utils methods
[9] base
other attached packages:
[1] IHW_1.10.1 ggplot2_3.1.0 gplots_3.0.1.1
[4] RColorBrewer_1.1-2 DESeq2_1.22.2 SummarizedExperiment_1.12.0
[7] DelayedArray_0.8.0 BiocParallel_1.16.6 matrixStats_0.54.0
[10] Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
[13] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
[16] installr_0.21.0 stringr_1.4.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 numDeriv_2016.8-1
[4] tools_3.5.0 backports_1.1.3 R6_2.3.0
[7] rpart_4.1-13 KernSmooth_2.23-15 Hmisc_4.2-0
[10] DBI_1.0.0 lazyeval_0.2.1 colorspace_1.4-0
[13] nnet_7.3-12 apeglm_1.4.2 withr_2.1.2
[16] gridExtra_2.3 bit_1.1-14 compiler_3.5.0
[19] fdrtool_1.2.15 htmlTable_1.13.1 slam_0.1-44
[22] caTools_1.17.1.1 scales_1.0.0 checkmate_1.9.1
[25] genefilter_1.64.0 digest_0.6.18 foreign_0.8-71
[28] XVector_0.22.0 base64enc_0.1-3 pkgconfig_2.0.2
[31] htmltools_0.3.6 lpsymphony_1.10.0 bbmle_1.0.20
[34] htmlwidgets_1.3 rlang_0.3.1 rstudioapi_0.9.0
[37] RSQLite_2.1.1 bindr_0.1.1 gtools_3.8.1
[40] acepack_1.4.1 RCurl_1.95-4.11 magrittr_1.5
[43] GenomeInfoDbData_1.2.0 Formula_1.2-3 Matrix_1.2-15
[46] Rcpp_1.0.0 munsell_0.5.0 stringi_1.2.4
[49] MASS_7.3-51.1 zlibbioc_1.28.0 plyr_1.8.4
[52] grid_3.5.0 blob_1.1.1 gdata_2.18.0
[55] crayon_1.3.4 lattice_0.20-38 splines_3.5.0
[58] annotate_1.60.0 locfit_1.5-9.1 knitr_1.21
[61] pillar_1.3.1 geneplotter_1.60.0 XML_3.98-1.17
[64] glue_1.3.0 latticeExtra_0.6-28 data.table_1.12.0
[67] gtable_0.2.0 assertthat_0.2.0 emdbook_1.3.11
[70] xfun_0.4 xtable_1.8-3 coda_0.19-2
[73] survival_2.43-3 tibble_2.0.1 AnnotationDbi_1.44.0
[76] memoise_1.1.0 bindrcpp_0.2.2 cluster_2.0.7-1
>
Hi,
thanks a lot for clarifying this.
Sven