DESeq2 lfcShrink time for analysis
jshouse ▴ 10
@jshouse-10956
Last seen 15 months ago
United States

First, this is a rather large experiment: roughly 3000 features by 1300 samples. Creating the `dds` object takes 12.5 hours with 20 workers at 3.1 GHz, and `resultsNames(dds)` contains 487 items.

> resultsNames(dds)[1:8]
  [1] "Intercept"                                             "group_003_CGO_1_vs_MethodBlank_0"                     
  [3] "group_003_CGO_10_vs_MethodBlank_0"                     "group_003_CGO_100_vs_MethodBlank_0"                   
  [5] "group_006_HFO_1_vs_MethodBlank_0"                      "group_006_HFO_10_vs_MethodBlank_0"                    
  [7] "group_006_HFO_100_vs_MethodBlank_0"                    "group_007_HFO_1_vs_MethodBlank_0"                     

 

This treatment set consists of 161 chemicals. For each chemical, we compare the dose = 100 group to the MethodBlank controls as follows: `results(dds, name = "group_006_HFO_100_vs_MethodBlank_0", parallel = TRUE)`. This takes about 5 minutes per contrast.
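
For readers who want to repeat the comparison for every chemical, here is a minimal sketch of picking out the dose = 100 coefficients by name. It uses only the eight coefficient names printed above; the regular expression and the names `coef_names`, `dose100`, and `res_list` are illustrative assumptions, not part of the original analysis. The commented-out loop assumes a fitted `dds` object and that `results()` receives the coefficient through its `name` argument.

```r
# Coefficient names copied from the resultsNames(dds)[1:8] output above.
coef_names <- c(
  "Intercept",
  "group_003_CGO_1_vs_MethodBlank_0",
  "group_003_CGO_10_vs_MethodBlank_0",
  "group_003_CGO_100_vs_MethodBlank_0",
  "group_006_HFO_1_vs_MethodBlank_0",
  "group_006_HFO_10_vs_MethodBlank_0",
  "group_006_HFO_100_vs_MethodBlank_0",
  "group_007_HFO_1_vs_MethodBlank_0"
)

# Keep only the dose = 100 comparisons against the MethodBlank controls.
# The pattern is an assumption based on the naming scheme visible above.
dose100 <- grep("_100_vs_MethodBlank_0$", coef_names, value = TRUE)
print(dose100)

# With a fitted dds object (not constructed here), the per-chemical result
# tables could then be collected along these lines:
# res_list <- lapply(setNames(dose100, dose100), function(nm) {
#   results(dds, name = nm, parallel = TRUE)
# })
```

In the real analysis, `coef_names` would be replaced by `resultsNames(dds)`.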

When I try to use `lfcShrink(dds, coef = "group_006_HFO_100_vs_MethodBlank_0", parallel = TRUE, BPPARAM = SnowParam(18))`, it runs for hours before I interrupt R. While it runs, it pegs all 20 workers at 100%.

Am I doing something wrong here? The vignette indicates this shouldn't take very long, especially with parallel workers.

Thanks for your time.

***System Info***

Microsoft R Open 3.4.2
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2017 Microsoft Corporation

Using the Intel MKL for parallel mathematical computing (using 10 cores).

Default CRAN mirror snapshot taken on 2017-10-15.
See: https://mran.microsoft.com/.

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_2.2.1              dplyr_0.7.4                BiocParallel_1.12.0        DESeq2_1.18.1              SummarizedExperiment_1.8.0 DelayedArray_0.4.1         matrixStats_0.52.2         Biobase_2.38.0            
 [9] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0        RevoUtils_10.0.6           BiocInstaller_1.28.0       RevoUtilsMath_10.0.1      

loaded via a namespace (and not attached):
 [1] tidyr_0.7.2             bit64_0.9-7             splines_3.4.2           Formula_1.2-2           assertthat_0.2.0        latticeExtra_0.6-28     blob_1.1.0              GenomeInfoDbData_0.99.1 yaml_2.1.16            
[10] RSQLite_2.0             backports_1.1.1         lattice_0.20-35         glue_1.2.0              digest_0.6.12           RColorBrewer_1.1-2      XVector_0.18.0          checkmate_1.8.5         colorspace_1.3-2       
[19] htmltools_0.3.6         Matrix_1.2-12           plyr_1.8.4              XML_3.98-1.9            pkgconfig_2.0.1         genefilter_1.60.0       zlibbioc_1.24.0         purrr_0.2.4             xtable_1.8-2           
[28] scales_0.5.0            htmlTable_1.11.0        tibble_1.3.4            annotate_1.56.1         nnet_7.3-12             lazyeval_0.2.1          survival_2.41-3         magrittr_1.5            memoise_1.1.0          
[37] foreign_0.8-69          tools_3.4.2             data.table_1.10.4-3     stringr_1.2.0           locfit_1.5-9.1          munsell_0.4.3           cluster_2.0.6           AnnotationDbi_1.40.0    bindrcpp_0.2           
[46] compiler_3.4.2          rlang_0.1.4             grid_3.4.2              RCurl_1.95-4.8          rstudioapi_0.7          htmlwidgets_0.9         bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0           
[55] DBI_0.7                 R6_2.2.2                gridExtra_2.3           knitr_1.17              bit_1.1-12              bindr_0.1               Hmisc_4.0-3             stringi_1.1.6           Rcpp_0.12.14           
[64] geneplotter_1.56.0      rpart_4.1-11            acepack_1.4.1

 

 

 

@mikelove
Last seen 1 hour ago
United States

I myself use limma-voom when there are hundreds of samples. With that many samples the negative binomial (NB) GLM is overkill, and the iterative steps needed to fit its parameters are unavoidable.
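
As a rough illustration of the limma-voom route mentioned here, the sketch below runs the usual `voom()` / `lmFit()` / `eBayes()` sequence on a small simulated count matrix. Everything about the data is an assumption made for the example: the matrix size, the three group labels, and the object names are hypothetical and only loosely mirror the design in the question. It is not the poster's own pipeline.

```r
library(limma)
library(edgeR)

set.seed(1)

# Simulated stand-in for the real data: 200 features, 3 groups of 10 samples.
n_features <- 200
group <- factor(
  rep(c("MethodBlank_0", "HFO_100", "CGO_100"), each = 10),
  levels = c("MethodBlank_0", "HFO_100", "CGO_100")  # controls as reference
)
counts <- matrix(
  rnbinom(n_features * length(group), mu = 100, size = 5),
  nrow = n_features,
  dimnames = list(
    paste0("feature", seq_len(n_features)),
    paste0("sample", seq_along(group))
  )
)

# Treatment-vs-control design; columns are "(Intercept)",
# "groupHFO_100" and "groupCGO_100".
design <- model.matrix(~ group)

# Normalize library sizes, then estimate the mean-variance trend with voom.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)
v <- voom(dge, design)

# Fit one linear model per feature and moderate the variances.
fit <- lmFit(v, design)
fit <- eBayes(fit)

# Log2 fold changes and adjusted p-values for one treatment vs the controls.
tt <- topTable(fit, coef = "groupHFO_100", number = Inf, sort.by = "none")
head(tt)
```

The linear model is fitted once for all coefficients, so extracting another comparison is only another `topTable()` call with a different `coef`.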

 
