Dear Community,
i have just started edgeR properly today. I have a dataset of 59 spatial replicates of one soil type (plot scale). Analysis yielded that 9 of those location contained very differently assembled, but similar communities (i.e. i have two clusters of highly similar communities). Very few OTUs are truly unique, but about 200 are differently expressed between the two types (with many transient OTUs equally present in both).
Aside from the spatial autocorrelation that might play a role here (i deal with that with lme), i simply want to know first which OTUs were differently expressed in the two community types. I used several analyses (DESeq, ANCOM, some non-parametric t-test and anova variations), but i really want to get edgeR to run for comparison.
I followed tutorials in the vignette and here.
I assigned "groups" to the two sample groups:
groups <- as.factor(c(rep("A",50), rep("B", 9)))
and created an edgeR object:
data.tmm <- DGEList(counts=as.matrix(data), group=groups) data.tmm <- edgeR:::calcNormFactors(data.tmm, method="TMM")
My design matrix looked simply like:
des.mat <- model.matrix(~0+groups) str(des.mat)
Q1: Does this already make sense? I have two columns "groupsA" and "groupsB" with all rows being assigned to one of them.
Then i proceeded along the tutorials by identifying the dispersion within the dataset:
data.tmm <- estimateGLMCommonDisp(data.tmm, design=des.mat) data.tmm <- estimateGLMTrendedDisp(data.tmm, design=des.mat) data.tmm <- estimateGLMTagwiseDisp(data.tmm, design=des.mat) plotBCV(data.tmm)
Which i think resembles other plots i have seen. I proceed by fitting the models:
fit <- glmFit(data.tmm, des.mat, coef=groupsA) lrt <- glmLRT(fit) topTags(lrt)
Coefficient: groupsB logFC logCPM LR PValue FDR Ac_4265_2 -18.36728 5.788017 2428.471 0 0
Q2: Regardless of what coef= takes (groupsA or groupsB, or omitted), it is ignored, the topTags always refer to groupsB. What would be the proper input i want to see it sorted for groupsA?
It seems that i have very high logfolds. Q3: If i want to recreate the plot of page 46 in the vignette:
plotMD(lrt) abline(h=c(-1, 1), col="blue")
what would be the correct interval to visualize the non-differently expressed OTUs? How could i find that (the example plot assumes an interval of -1 to 1, is that true for any dataset, and i simply have to tinker with the axis limits?).
Finally:
summary(decideTests(lrt))
yields:
Error in dim(data) <- dim : dims [product 4452] do not match the length of object [16]
Similar errors have been reported here.
Q$: Possibly, and based on the previous explanations, my design matrix hampers the degree of freedoms available. Is that true? How to fix that?
Sorry for bothering with apparently pretty basic questions. Thank you in advance!
sessionInfo() R version 3.4.3 (2017-11-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices utils datasets methods [7] base other attached packages: [1] edgeR_3.16.5 limma_3.30.13 loaded via a namespace (and not attached): [1] compiler_3.4.3 tools_3.4.3 splines_3.4.3 grid_3.4.3 [5] locfit_1.5-9.1 lattice_0.20-35
Thank you very much.
Following Chen, Lun & Smyth, 2016 (a very good walkthrough!), i finally got results with the QL-Framework and a defined contrast:
With the glmFit/glmLRT-approach i get 368 upregulated OTUs in Set A. Judging from the plots, this make sense, however, this is more than double i get with default DESeq, ANCOM or t-test variants on the same dataset.