The documentation of estimateDispersions in DESeq2 says
estimateDispersions checks for the case of an analysis with as many samples as the number of coefficients to fit, and will temporarily substitute a design formula ~ 1 for the purposes of dispersion estimation.
However, it appears that the results of estimateDispersions are different depending on the design of the DESeq2 object it receives as input.
As an example for that, the mean of the dispersions resulting from different designs is different, here is sample code :
```{r} suppressPackageStartupMessages(library(DESeq2)) ```
```{r} set.seed(1234) ```
```{r} dds <- makeExampleDESeqDataSet(betaSD = 1) ```
```{r} mycolData <- colData(dds) mycolData$batch <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2)) colData(dds) <- mycolData
colData(dds)
```
DataFrame with 12 rows and 2 columns
condition batch
<factor> <factor>
sample1 A 1
sample2 A 1
sample3 A 1
sample4 A 2
sample5 A 2
... ... ...
sample8 B 1
sample9 B 1
sample10 B 2
sample11 B 2
sample12 B 2
```{r}
dds <- estimateSizeFactors(dds)
dds2<- dds
design(dds2) <- ~ 1
dds2 <- estimateDispersions(dds2)
design(dds) <- ~batch + condition
dds <- estimateDispersions(dds)
print(paste("Mean of dispersions with model ~batch+condition", mean(dispersions(dds),na.rm=T) ) )
print(paste("Mean of dispersions with model ~1", mean(dispersions(dds2),na.rm=T) ) )
```
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
[1] "Mean of dispersions with model ~batch+condition 0.93962274256627"
[1] "Mean of dispersions with model ~1 0.99071610332295"
```{r}
sessionInfo()
```
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.26.0 SummarizedExperiment_1.16.0
[3] DelayedArray_0.12.0 BiocParallel_1.20.0
[5] matrixStats_0.55.0 Biobase_2.46.0
[7] GenomicRanges_1.38.0 GenomeInfoDb_1.22.0
[9] IRanges_2.20.0 S4Vectors_0.24.0
[11] BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 Rcpp_1.0.2 lattice_0.20-38
[4] zeallot_0.1.0 digest_0.6.22 backports_1.1.5
[7] acepack_1.4.1 RSQLite_2.1.2 ggplot2_3.2.1
[10] pillar_1.4.2 zlibbioc_1.32.0 rlang_0.4.1
[13] lazyeval_0.2.2 rstudioapi_0.10 data.table_1.12.6
[16] annotate_1.64.0 blob_1.2.0 rpart_4.1-15
[19] Matrix_1.2-17 checkmate_1.9.4 splines_3.6.0
[22] geneplotter_1.64.0 stringr_1.4.0 foreign_0.8-71
[25] htmlwidgets_1.5.1 RCurl_1.95-4.12 bit_1.1-14
[28] munsell_0.5.0 compiler_3.6.0 xfun_0.10
[31] pkgconfig_2.0.3 base64enc_0.1-3 htmltools_0.4.0
[34] nnet_7.3-12 tibble_2.1.3 gridExtra_2.3
[37] htmlTable_1.13.2 GenomeInfoDbData_1.2.2 Hmisc_4.2-0
[40] XML_3.98-1.20 crayon_1.3.4 bitops_1.0-6
[43] grid_3.6.0 xtable_1.8-4 gtable_0.3.0
[46] DBI_1.0.0 magrittr_1.5 scales_1.0.0
[49] stringi_1.4.3 XVector_0.26.0 genefilter_1.68.0
[52] latticeExtra_0.6-28 vctrs_0.2.0 Formula_1.2-3
[55] RColorBrewer_1.1-2 tools_3.6.0 bit64_0.9-7
[58] survival_2.44-1.1 AnnotationDbi_1.48.0 colorspace_1.4-1
[61] cluster_2.0.8 memoise_1.1.0 knitr_1.25
I see now that the rest of the message gave more context
This is now the behavior for a year so I've removed the paragraph entirely in the devel branch.