Why does dispersion estimation in DESeq2 depend on the design formula?
1
0
Entering edit mode
Sam ▴ 10
@sam-21502
Last seen 12 weeks ago
Jerusalem

The documentation of estimateDispersions in DESeq2 says

estimateDispersions checks for the case of an analysis with as many samples as the number of coefficients to fit, and will temporarily substitute a design formula ~ 1 for the purposes of dispersion estimation.

However, it appears that the results of estimateDispersions are different depending on the design of the DESeq2 object it receives as input.

As an example for that, the mean of the dispersions resulting from different designs is different, here is sample code :

```{r} suppressPackageStartupMessages(library(DESeq2)) ```

```{r} set.seed(1234) ```



```{r} dds <- makeExampleDESeqDataSet(betaSD = 1) ```

```{r} mycolData <- colData(dds) mycolData$batch <- factor(c(1,1,1,2,2,2,1,1,1,2,2,2)) colData(dds) <- mycolData

colData(dds)

```

DataFrame with 12 rows and 2 columns
         condition    batch
          <factor> <factor>
sample1          A        1
sample2          A        1
sample3          A        1
sample4          A        2
sample5          A        2
...            ...      ...
sample8          B        1
sample9          B        1
sample10         B        2
sample11         B        2
sample12         B        2

```{r}
dds <- estimateSizeFactors(dds)


dds2<- dds
design(dds2) <- ~ 1
dds2 <- estimateDispersions(dds2)

design(dds) <- ~batch  + condition
dds <- estimateDispersions(dds)


print(paste("Mean of dispersions with model ~batch+condition", mean(dispersions(dds),na.rm=T) ) )

print(paste("Mean of dispersions with model ~1", mean(dispersions(dds2),na.rm=T) ) )
```

gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
[1] "Mean of dispersions with model ~batch+condition 0.93962274256627"
[1] "Mean of dispersions with model ~1 0.99071610332295"
```{r}
sessionInfo()
```

R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.26.0               SummarizedExperiment_1.16.0
 [3] DelayedArray_0.12.0         BiocParallel_1.20.0        
 [5] matrixStats_0.55.0          Biobase_2.46.0             
 [7] GenomicRanges_1.38.0        GenomeInfoDb_1.22.0        
 [9] IRanges_2.20.0              S4Vectors_0.24.0           
[11] BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1         Rcpp_1.0.2             lattice_0.20-38       
 [4] zeallot_0.1.0          digest_0.6.22          backports_1.1.5       
 [7] acepack_1.4.1          RSQLite_2.1.2          ggplot2_3.2.1         
[10] pillar_1.4.2           zlibbioc_1.32.0        rlang_0.4.1           
[13] lazyeval_0.2.2         rstudioapi_0.10        data.table_1.12.6     
[16] annotate_1.64.0        blob_1.2.0             rpart_4.1-15          
[19] Matrix_1.2-17          checkmate_1.9.4        splines_3.6.0         
[22] geneplotter_1.64.0     stringr_1.4.0          foreign_0.8-71        
[25] htmlwidgets_1.5.1      RCurl_1.95-4.12        bit_1.1-14            
[28] munsell_0.5.0          compiler_3.6.0         xfun_0.10             
[31] pkgconfig_2.0.3        base64enc_0.1-3        htmltools_0.4.0       
[34] nnet_7.3-12            tibble_2.1.3           gridExtra_2.3         
[37] htmlTable_1.13.2       GenomeInfoDbData_1.2.2 Hmisc_4.2-0           
[40] XML_3.98-1.20          crayon_1.3.4           bitops_1.0-6          
[43] grid_3.6.0             xtable_1.8-4           gtable_0.3.0          
[46] DBI_1.0.0              magrittr_1.5           scales_1.0.0          
[49] stringi_1.4.3          XVector_0.26.0         genefilter_1.68.0     
[52] latticeExtra_0.6-28    vctrs_0.2.0            Formula_1.2-3         
[55] RColorBrewer_1.1-2     tools_3.6.0            bit64_0.9-7           
[58] survival_2.44-1.1      AnnotationDbi_1.48.0   colorspace_1.4-1      
[61] cluster_2.0.8          memoise_1.1.0          knitr_1.25
deseq2 • 807 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

Thanks for your post — that sentence in the man page is out of date and needs to be removed, DESeq2 no longer will fit these designs.

As for your second question, yes the design definitely changes the dispersion. This is correct, and desired behavior.

ADD COMMENT
0
Entering edit mode

I see now that the rest of the message gave more context

estimateDispersions checks for the case of an analysis with as many samples as the number of coefficients to fit, and will temporarily substitute a design formula ~ 1 for the purposes of dispersion estimation. Note that analysis of designs without replicates will be removed in the Oct 2018 release: DESeq2 v1.22.0, after which DESeq2 will give an error.

This is now the behavior for a year so I've removed the paragraph entirely in the devel branch.

ADD REPLY

Login before adding your answer.

Traffic: 908 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6