Question

edgeR - problems with coefficients and decideTests

0

Entering edit mode

trichter • 0

@trichter-12658

Last seen 5.6 years ago

Dear Community,

i have just started edgeR properly today. I have a dataset of 59 spatial replicates of one soil type (plot scale). Analysis yielded that 9 of those location contained very differently assembled, but similar communities (i.e. i have two clusters of highly similar communities). Very few OTUs are truly unique, but about 200 are differently expressed between the two types (with many transient OTUs equally present in both).

Aside from the spatial autocorrelation that might play a role here (i deal with that with lme), i simply want to know first which OTUs were differently expressed in the two community types. I used several analyses (DESeq, ANCOM, some non-parametric t-test and anova variations), but i really want to get edgeR to run for comparison.

I followed tutorials in the vignette and here.

I assigned "groups" to the two sample groups:

groups <- as.factor(c(rep("A",50), rep("B", 9)))

and created an edgeR object:

data.tmm <- DGEList(counts=as.matrix(data),
                    group=groups)

data.tmm <- edgeR:::calcNormFactors(data.tmm, method="TMM")

My design matrix looked simply like:

des.mat <- model.matrix(~0+groups)
str(des.mat)

Q1: Does this already make sense? I have two columns "groupsA" and "groupsB" with all rows being assigned to one of them.

Then i proceeded along the tutorials by identifying the dispersion within the dataset:

data.tmm <- estimateGLMCommonDisp(data.tmm, design=des.mat)
data.tmm <- estimateGLMTrendedDisp(data.tmm, design=des.mat)
data.tmm <- estimateGLMTagwiseDisp(data.tmm, design=des.mat)

plotBCV(data.tmm)

Which i think resembles other plots i have seen. I proceed by fitting the models:

fit <- glmFit(data.tmm, des.mat, coef=groupsA)
lrt <- glmLRT(fit)
topTags(lrt)

Coefficient:  groupsB
               logFC   logCPM       LR PValue FDR
Ac_4265_2  -18.36728 5.788017 2428.471      0   0

Q2: Regardless of what coef= takes (groupsA or groupsB, or omitted), it is ignored, the topTags always refer to groupsB. What would be the proper input i want to see it sorted for groupsA?

It seems that i have very high logfolds. Q3: If i want to recreate the plot of page 46 in the vignette:

plotMD(lrt)

abline(h=c(-1, 1), col="blue")

what would be the correct interval to visualize the non-differently expressed OTUs? How could i find that (the example plot assumes an interval of -1 to 1, is that true for any dataset, and i simply have to tinker with the axis limits?).

Finally:

summary(decideTests(lrt))

yields:

Error in dim(data) <- dim : 
  dims [product 4452] do not match the length of object [16]

Similar errors have been reported here.

Q$: Possibly, and based on the previous explanations, my design matrix hampers the degree of freedoms available. Is that true? How to fix that?

Sorry for bothering with apparently pretty basic questions. Thank you in advance!

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] edgeR_3.16.5  limma_3.30.13

loaded via a namespace (and not attached):
[1] compiler_3.4.3  tools_3.4.3     splines_3.4.3   grid_3.4.3     
[5] locfit_1.5-9.1  lattice_0.20-35

edger • 975 views

ADD COMMENT • link updated 6.8 years ago by Aaron Lun ★ 28k • written 6.8 years ago by trichter • 0

score 2 · Answer 1 · 2018-02-05

There are a number of errors with your code:

You are using an ancient version of edgeR, we are on version 3.20.8 now. Updating your installation will solve the problem with decideTests, which has been updated to work with GLMLRT objects.
glmFit doesn't take a coef= argument, as far as I remember.
You have not defined groupsA for us. I assume you meant "groupsA".
You have used an intercept-free model, but your glmLRT call will default to using coef=ncol(glmfit$design). This means that it will test whether the final coefficient is equal to zero. Do you really want to know whether groupB's average expression is equal to zero? I assume you actually want to know whether groupA-groupB is equal to zero, so use makeContrasts.
You should use the QL framework instead of these old estimateGLM*Disp functions, see EDGE-R exact test vs QL F-test.