I obtained negative svalues from the lfcShrink function of DESeq2 using the ashr shrinkage estimator.
The values are very small and close to 0: the smallest is -1.110223e-16
, the largest negative is -1.199592e-19
. Should I set them to zero?
Apart from the analysis that I performed (see code below), should svalue >= 0 in any case or is it expected that in some conditions it is < 0 ?
I used glmGamPoi to fit dispersion and then calculated surrogate variables that I added to the model.
Then, re-estimated the dispersion and performed Wald test and shrinkage with ashr.
dds <- DESeqDataSetFromMatrix(countData = mat, colData = coldata, design = ~ Donor + condition)
dds <- DESeq(dds, betaPrior = F, parallel = T, fitType = "glmGamPoi", sfType = "poscount", minmu = 1e-6)
## compute surrogate variables
rle <- rlog(dds, blind = T)
library(sva)
mod <- model.matrix( ~ Donor + condition, data = coldata)
mod0 <- model.matrix( ~ Donor, data = coldata)
svobj <- sva(dat = assay(rle), mod = mod, mod0 = mod0)
covariates <- svobj$sv
for (n.sv in 1:dim(svobj$sv)[2]) {
colData(dds)[[paste0("SV", n.sv)]] <- svobj$sv[, n.sv]
}
base.design.factors <- "Donor + condition"
design.formula <- paste0("~ ",
paste0(paste0("SV", 1:dim(svobj$sv)[2]),
collapse = " + "),
" + ",
base.design.factors)
## update design formula
design(dds) <- as.formula(design.formula)
## re-estimate disperion
dds <- estimateDispersions(dds, fitType = "glmGamPoi")
## perform test
dds <- nbinomWaldTest(dds, useT = T, minmu = 1e-6, betaPrior = F)
## shrink FC and get svalues
lfc.res <- lfcShrink(dds, contrast = c("condition", "a", "b"), type = "ashr", svalue = T)
dim(lfc.res)
# [1] 6071 4
dim(lfc.res[lfc.res$svalue < 0, ])
# [1] 1957 4
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] corrplot_0.84 DEGreport_1.26.0 dplyr_1.0.2 tximport_1.18.0 seriation_1.2-9
[6] ggpubr_0.4.0 ComplexHeatmap_2.6.2 sva_3.38.0 genefilter_1.72.0 mgcv_1.8-33
[11] nlme_3.1-151 BiocParallel_1.24.1 limma_3.46.0 DESeq2_1.30.0 SummarizedExperiment_1.20.0
[16] Biobase_2.50.0 MatrixGenerics_1.2.0 matrixStats_0.57.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.2
[21] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.0 scales_1.1.1 ggsci_2.9
[26] ggthemes_4.2.0 RColorBrewer_1.1-2 pheatmap_1.0.12 viridis_0.5.1 viridisLite_0.3.0
[31] ggrepel_0.9.0 ggplot2_3.3.3 DT_0.17 data.table_1.13.6 flexdashboard_0.5.2
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.2.1 circlize_0.4.12 plyr_1.8.6 ConsensusClusterPlus_1.54.0
[6] splines_4.0.3 digest_0.6.27 foreach_1.5.1 htmltools_0.5.0 magrittr_2.0.1
[11] memoise_1.1.0 cluster_2.1.0 openxlsx_4.2.3 annotate_1.68.0 Nozzle.R1_1.1-1
[16] colorspace_2.0-0 blob_1.2.1 haven_2.3.1 xfun_0.20 crayon_1.3.4
[21] RCurl_1.98-1.2 jsonlite_1.7.2 survival_3.2-7 iterators_1.0.13 glue_1.4.2
[26] registry_0.5-1 gtable_0.3.0 zlibbioc_1.36.0 XVector_0.30.0 GetoptLong_1.0.5
[31] DelayedArray_0.16.0 car_3.0-10 shape_1.4.5 abind_1.4-5 DBI_1.1.0
[36] edgeR_3.32.0 rstatix_0.6.0 Rcpp_1.0.5 xtable_1.8-4 lasso2_1.2-21.1
[41] tmvnsim_1.0-2 clue_0.3-58 foreign_0.8-81 bit_4.0.4 htmlwidgets_1.5.3
[46] httr_1.4.2 ellipsis_0.3.1 farver_2.0.3 pkgconfig_2.0.3 reshape_0.8.8
[51] XML_3.99-0.5 locfit_1.5-9.4 labeling_0.4.2 tidyselect_1.1.0 rlang_0.4.10
[56] AnnotationDbi_1.52.0 munsell_0.5.0 cellranger_1.1.0 tools_4.0.3 generics_0.1.0
[61] RSQLite_2.2.2 broom_0.7.3 evaluate_0.14 stringr_1.4.0 ggdendro_0.1.22
[66] yaml_2.2.1 knitr_1.30 bit64_4.0.5 zip_2.1.1 purrr_0.3.4
[71] compiler_4.0.3 rstudioapi_0.13 curl_4.3 png_0.1-7 ggsignif_0.6.0
[76] tibble_3.0.4 geneplotter_1.68.0 stringi_1.5.3 forcats_0.5.0 lattice_0.20-41
[81] Matrix_1.3-2 psych_2.0.12 vctrs_0.3.6 pillar_1.4.7 lifecycle_0.2.0
[86] GlobalOptions_0.1.2 cowplot_1.1.1 bitops_1.0-6 R6_2.5.0 TSP_1.1-10
[91] gridExtra_2.3 rio_0.5.16 codetools_0.2-18 MASS_7.3-53 rjson_0.2.20
[96] withr_2.3.0 mnormt_2.0.2 GenomeInfoDbData_1.2.4 hms_0.5.3 tidyr_1.1.2
[101] rmarkdown_2.6 carData_3.0-4 Cairo_1.5-12.2 logging_0.10-108
If I use "local" dispersion fit after adding the surrogate variable, then I obtain no more the negative values, so I am switching to this setting now.
Thank you, Michael!
Updating to the latest version (v2.2-51) from the GitHub repo using devtool fixed the problem.
For the record, I was using ashr v2.2-47.