I am trying to do a classic test for significant differential gene expression in edgeR 3.34.0, as implemented as a bioconda package that I am invoking in a versioned environment. I have two different versions of the R script that I am running in batch mode. The first version, which lacks "robust=TRUE" as the third argument of estimateDisp, works perfectly well. The second version, which I am appending below and which has "robust=TRUE" as the third argument of estimateDisp, fails with the following error message:
> geno_comp <- estimateDisp(geno_comp,geno_design,robust=TRUE)
Error in fitFDistRobustly(var, df1 = df, covariate = covariate, winsor.tail.p = winsor.tail.p) :
statmod package required but is not installed
Calls: estimateDisp ... estimateDisp.default -> squeezeVar -> fitFDistRobustly
Execution halted
What am I doing wrong?
Here is the full R script that doesn't quite work, because of that one "robust=TRUE" in an otherwise functional batch file:
# Load edgeR:
library(edgeR)
# Import data:
setwd("/ocean/projects/mcb190015p/schwarze/Acey/2021.07.24/edgeR_09")
edgeR_input_data <- read.delim("/ocean/projects/mcb190015p/schwarze/Acey/2021.07.24/annots/Acey_salmon_data_2021.07.28.01.reads.subset_03.min10.tsv.txt",row.names="Gene")
genotypes <- factor(c("12dpi_noDEX","12dpi_noDEX","12dpi_noDEX","12dpi_DEX","12dpi_DEX","12dpi_DEX","intestines_19dpi_noDEX","intestines_19dpi_noDEX","intestines_19dpi_noDEX","intestines_19dpi_DEX","intestines_19dpi_DEX","intestines_19dpi_DEX","non.intestines_19dpi_noDEX","non.intestines_19dpi_noDEX","non.intestines_19dpi_noDEX","non.intestines_19dpi_DEX","non.intestines_19dpi_DEX","non.intestines_19dpi_DEX"))
geno_comp <- DGEList(counts=edgeR_input_data,group=genotypes)
# Check initial results:
geno_comp$samples
# Filter out weakly expressed genes using the following commands:
keep <- filterByExpr(geno_comp)
geno_comp <- geno_comp[keep, , keep.lib.sizes=FALSE]
geno_comp$samples
# Set up statistical data set:
geno_comp <- calcNormFactors(geno_comp)
geno_design <- model.matrix(~genotypes)
geno_comp <- estimateDisp(geno_comp,geno_design,robust=TRUE)
# 12dpi_noDEX vs. 12dpi_DEX:
comp_12dpi_noDEX_vs_12dpi_DEX <- exactTest(geno_comp, pair=c("12dpi_DEX","12dpi_noDEX"))
deg_12dpi_noDEX_vs_12dpi_DEX <- topTags(comp_12dpi_noDEX_vs_12dpi_DEX, n=Inf, p.value=1)
write.table(as.data.frame(deg_12dpi_noDEX_vs_12dpi_DEX), file="12dpi_noDEX.vs.12dpi_DEX_edgeR_exactTest_all.data_2021.08.03.01.orig.tsv.txt", sep="\t", col.names=NA)
# intestines_19dpi_noDEX vs. non.intestines_19dpi_noDEX:
comp_intestines_19dpi_noDEX_vs_non.intestines_19dpi_noDEX <- exactTest(geno_comp, pair=c("non.intestines_19dpi_noDEX","intestines_19dpi_noDEX"))
deg_intestines_19dpi_noDEX_vs_non.intestines_19dpi_noDEX <- topTags(comp_intestines_19dpi_noDEX_vs_non.intestines_19dpi_noDEX, n=Inf, p.value=1)
write.table(as.data.frame(deg_intestines_19dpi_noDEX_vs_non.intestines_19dpi_noDEX), file="intestines_19dpi_noDEX.vs.non.intestines_19dpi_noDEX_edgeR_exactTest_all.data_2021.08.03.01.orig.tsv.txt", sep="\t", col.names=NA)
# intestines_19dpi_noDEX vs. intestines_19dpi_DEX:
comp_intestines_19dpi_noDEX_vs_intestines_19dpi_DEX <- exactTest(geno_comp, pair=c("intestines_19dpi_DEX","intestines_19dpi_noDEX"))
deg_intestines_19dpi_noDEX_vs_intestines_19dpi_DEX <- topTags(comp_intestines_19dpi_noDEX_vs_intestines_19dpi_DEX, n=Inf, p.value=1)
write.table(as.data.frame(deg_intestines_19dpi_noDEX_vs_intestines_19dpi_DEX), file="intestines_19dpi_noDEX.vs.intestines_19dpi_DEX_edgeR_exactTest_all.data_2021.08.03.01.orig.tsv.txt", sep="\t", col.names=NA)
# non.intestines_19dpi_noDEX vs. non.intestines_19dpi_DEX:
comp_non.intestines_19dpi_noDEX_vs_non.intestines_19dpi_DEX <- exactTest(geno_comp, pair=c("non.intestines_19dpi_DEX","non.intestines_19dpi_noDEX"))
deg_non.intestines_19dpi_noDEX_vs_non.intestines_19dpi_DEX <- topTags(comp_non.intestines_19dpi_noDEX_vs_non.intestines_19dpi_DEX, n=Inf, p.value=1)
write.table(as.data.frame(deg_non.intestines_19dpi_noDEX_vs_non.intestines_19dpi_DEX), file="non.intestines_19dpi_noDEX.vs.non.intestines_19dpi_DEX_edgeR_exactTest_all.data_2021.08.03.01.orig.tsv.txt", sep="\t", col.names=NA)
# Sum up:
sessionInfo()
# end it:
q()
The sessionInfo() output text for the non-failing version of this script (i.e., the one lacking "robust=TRUE" as the third argument to estimateDisp) is as follows:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 8 (Core)
Matrix products: default
BLAS/LAPACK: /ocean/projects/mcb190015p/schwarze/anaconda3/envs/edger_3.34.0/lib/libopenblasp-r0.3.17.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.34.0 limma_3.48.0
loaded via a namespace (and not attached):
[1] compiler_4.1.0 Rcpp_1.0.7 splines_4.1.0 grid_4.1.0
[5] locfit_1.5-9.4 lattice_0.20-44
> # end it:
> q()
> proc.time()
user system elapsed
12.502 0.115 12.670
Yes, installing statmod does solve it! However, for the benefit of anybody (like me) who is still figuring out R ... here's how I got statmod installed within the bioconda environment that I am running.
And, having done that, here's the line I added to my R batch job script to get the "robust=TRUE" argument to work:
The standard recommended procedure to install Bioconductor packages and their dependencies is
You can use the
dependencies
argument ofinstall()
to control whether you get just the specified package, or the package and its dependencies, or everything including optional suggested packages.