Hi,
I'm running lfcShrink to shrink log2 fold-changes (produced by DESeq2) so I can then use them in a gene set enrichment analysis. The design is (39 samples) :-
> pheno.df Group Rate Quartile 4988-043 DN 37.89901 Q1 4988-016 DN 38.92224 Q1 4988-073 DN 26.35618 Q1 4988-062 DN 101.00866 Q4 4988-048 DN 28.87489 Q1 4988-046 DN 29.64660 Q1 4988-080 DN 39.42701 Q2 4988-027 DN 48.10482 Q2 4988-015 DN 35.50630 Q1 4988-076 DN 29.29898 Q1 4988-025 HC 98.10911 Q3 4988-012 HC 77.29580 Q2 4988-032 HC 99.41596 Q4 4988-058 HC 82.57723 Q3 4988-033 HC 99.63906 Q4 4988-030 HC 115.55412 Q4 4988-079 HC 69.73859 Q2 4988-078 HC 84.11147 Q3 4988-069 HC 91.72246 Q3 4988-010 HC 103.40455 Q4 4988-031 HC 79.19619 Q3 4988-066 HC 100.36594 Q4 4988-007 HC 80.73362 Q3 4988-050 HC 96.55857 Q3 4988-072 HC 75.23332 Q2 4988-026 HC 75.85582 Q2 4988-045 HC 99.00126 Q4 4988-063 HC 84.41203 Q3 4988-024 HC 101.97429 Q4 4988-023 DN 42.63891 Q2 4988-013 DN 29.06251 Q1 4988-051 DN 66.90664 Q2 4988-001 DN 77.62536 Q3 4988-038 DN 56.71115 Q2 4988-006 DN 29.15782 Q1 4988-052 DN 26.86037 Q1 4988-040 DN 57.97926 Q2 4988-037 DN 105.57987 Q4 4988-077 HC 119.42443 Q4
The contrast I'm interested in is DN vs. HC ('Group' variable), using either 'Quartile' as a categorical or 'Rate' as a continuous covariate.
> lapply(pheno.df, class)
$Group
[1] "factor"
$Rate
[1] "numeric"
$Quartile
[1] "factor"
dds.Discr <- DESeqDataSetFromMatrix(countData=raw.counts, colData=pheno.df, design= ~Quartile + Group) ref.Level <- "HC" dds.Discr$Group <- relevel(dds.Discr$Group, ref=ref.Level) dds.Discr <- DESeq(dds.Discr) Discr.coef.idx <- which( resultsNames(dds.Discr)=="Group_DN_vs_HC" ) dds.Cont <- DESeqDataSetFromMatrix(countData=raw.counts, colData=pheno.df, design= ~Rate + Group) dds.Cont$Group <- relevel(dds.Cont$Group, ref=ref.Level) dds.Cont <- DESeq(dds.Cont) Cont.coef.idx <- which( resultsNames(dds.Cont)=="Group_DN_vs_HC" )
This all appears to run without any problems, and both:
lfcShrink(dds.Discr, coef=Discr.coef.idx) lfcShrink(dds.Cont, coef=Cont.coef.idx)
using the default shrinkage method are fine.
However, with
lfcShrink(dds.Discr, coef=Discr.coef.idx, type="apeglm") lfcShrink(dds.Cont, coef=Cont.coef.idx, type="apeglm")
both fail immediately with the error:
# using 'apeglm' for LFC shrinkage
# Error in if (intercept == 0) { : missing value where TRUE/FALSE needed
I've taken a look at the 'apeglm' & 'apeglm:::apeglm.single' functions and it appears to suggest that there is a problem with the design matrices, but I can't figure out what it is! Please could anyone help?
Many thanks.
Cheers,
Richard Coulson.
> model.matrix(design(dds.Discr), colData(dds.Discr)) (Intercept) QuartileQ2 QuartileQ3 QuartileQ4 GroupDN 4988-043 1 0 0 0 1 4988-016 1 0 0 0 1 4988-073 1 0 0 0 1 4988-062 1 0 0 1 1 4988-048 1 0 0 0 1 4988-046 1 0 0 0 1 4988-080 1 1 0 0 1 4988-027 1 1 0 0 1 4988-015 1 0 0 0 1 4988-076 1 0 0 0 1 4988-025 1 0 1 0 0 4988-012 1 1 0 0 0 4988-032 1 0 0 1 0 4988-058 1 0 1 0 0 4988-033 1 0 0 1 0 4988-030 1 0 0 1 0 4988-079 1 1 0 0 0 4988-078 1 0 1 0 0 4988-069 1 0 1 0 0 4988-010 1 0 0 1 0 4988-031 1 0 1 0 0 4988-066 1 0 0 1 0 4988-007 1 0 1 0 0 4988-050 1 0 1 0 0 4988-072 1 1 0 0 0 4988-026 1 1 0 0 0 4988-045 1 0 0 1 0 4988-063 1 0 1 0 0 4988-024 1 0 0 1 0 4988-023 1 1 0 0 1 4988-013 1 0 0 0 1 4988-051 1 1 0 0 1 4988-001 1 0 1 0 1 4988-038 1 1 0 0 1 4988-006 1 0 0 0 1 4988-052 1 0 0 0 1 4988-040 1 1 0 0 1 4988-037 1 0 0 1 1 4988-077 1 0 0 1 0 attr(,"assign") [1] 0 1 1 1 2 attr(,"contrasts") attr(,"contrasts")$Quartile [1] "contr.treatment" attr(,"contrasts")$Group [1] "contr.treatment"
> model.matrix(design(dds.Cont), colData(dds.Cont)) (Intercept) Rate GroupDN 4988-043 1 37.89901 1 4988-016 1 38.92224 1 4988-073 1 26.35618 1 4988-062 1 101.00866 1 4988-048 1 28.87489 1 4988-046 1 29.64660 1 4988-080 1 39.42701 1 4988-027 1 48.10482 1 4988-015 1 35.50630 1 4988-076 1 29.29898 1 4988-025 1 98.10911 0 4988-012 1 77.29580 0 4988-032 1 99.41596 0 4988-058 1 82.57723 0 4988-033 1 99.63906 0 4988-030 1 115.55412 0 4988-079 1 69.73859 0 4988-078 1 84.11147 0 4988-069 1 91.72246 0 4988-010 1 103.40455 0 4988-031 1 79.19619 0 4988-066 1 100.36594 0 4988-007 1 80.73362 0 4988-050 1 96.55857 0 4988-072 1 75.23332 0 4988-026 1 75.85582 0 4988-045 1 99.00126 0 4988-063 1 84.41203 0 4988-024 1 101.97429 0 4988-023 1 42.63891 1 4988-013 1 29.06251 1 4988-051 1 66.90664 1 4988-001 1 77.62536 1 4988-038 1 56.71115 1 4988-006 1 29.15782 1 4988-052 1 26.86037 1 4988-040 1 57.97926 1 4988-037 1 105.57987 1 4988-077 1 119.42443 0 attr(,"assign") [1] 0 1 2 attr(,"contrasts") attr(,"contrasts")$Group [1] "contr.treatment"
> sessionInfo() R version 3.4.4 (2018-03-15) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.4 LTS Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.6.0 LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] apeglm_1.0.3 DESeq2_1.18.1 [3] SummarizedExperiment_1.8.1 DelayedArray_0.4.1 [5] matrixStats_0.53.1 Biobase_2.38.0 [7] GenomicRanges_1.30.3 GenomeInfoDb_1.14.0 [9] IRanges_2.12.0 S4Vectors_0.16.0 [11] BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] Rcpp_0.12.17 locfit_1.5-9.1 lattice_0.20-35 [4] digest_0.6.15 plyr_1.8.4 emdbook_1.3.10 [7] backports_1.1.2 acepack_1.4.1 coda_0.19-1 [10] RSQLite_2.1.1 ggplot2_2.2.1 pillar_1.2.3 [13] zlibbioc_1.24.0 rlang_0.2.1 lazyeval_0.2.1 [16] rstudioapi_0.7 data.table_1.11.4 annotate_1.56.2 [19] blob_1.1.1 rpart_4.1-13 Matrix_1.2-14 [22] bbmle_1.0.20 checkmate_1.8.5 splines_3.4.4 [25] BiocParallel_1.12.0 geneplotter_1.56.0 stringr_1.3.1 [28] foreign_0.8-70 htmlwidgets_1.2 RCurl_1.95-4.10 [31] bit_1.1-14 munsell_0.5.0 numDeriv_2016.8-1 [34] compiler_3.4.4 base64enc_0.1-3 htmltools_0.3.6 [37] tcltk_3.4.4 nnet_7.3-12 tibble_1.4.2 [40] gridExtra_2.3 htmlTable_1.12 GenomeInfoDbData_1.0.0 [43] Hmisc_4.1-1 XML_3.98-1.11 MASS_7.3-50 [46] bitops_1.0-6 grid_3.4.4 xtable_1.8-2 [49] gtable_0.2.0 DBI_1.0.0 magrittr_1.5 [52] scales_0.5.0 stringi_1.2.3 XVector_0.18.0 [55] genefilter_1.60.0 latticeExtra_0.6-28 Formula_1.2-3 [58] RColorBrewer_1.1-2 tools_3.4.4 bit64_0.9-7 [61] survival_2.42-4 AnnotationDbi_1.40.0 colorspace_1.3-2 [64] cluster_2.0.7-1 memoise_1.1.0 knitr_1.20
Many thanks Mike.
I tried setting dds$Rate to dds$Rate - dds$Rate[1], and this failed. However with dds$Rate[11] (the first row in the design matrix with GroupDN==0) it works. Though this produced:
Is this an issue and could it have something to do with the message I got with DESeq, please:
Longer term, I will update to the latest version of R & re-install Bioconductor in the next few days.
Cheers,
Richard.
As, I see, yes I see why my proposal didn't work.
The warnings are rows that had a hard time converging, and I bet that its similar to the ones that were difficult for DESeq2. Typically, these rows tend to be genes with mostly zeros and only a few non-zero counts (or I may be wrong, but this is my experience). One option is to remove these rows with a conservative filter:
...before running DESeq(dds).