Question: EdgeR RNAseq -- best way to deal with random effects
gravatar for emb13
21 days ago by
emb130 wrote:


I am working on an RNAseq project in EdgeR and I have a random effect variable that I am not sure how to best deal with since EdgeR doesn't use mixed models. I have three main questions: 1) how have people dealt with random effects in EdgeR, or do you use different programs to accomplish this? 2) How are blocking and batch effects different in EdgeR (the manual suggests using the same additive model for both), and 3) is it possible to block for a variable that is unreplicated (see below)?

Set up: I have a control and three dose groups (low, medium, high) each with five replicates. Within each treatment group, samples came from one of twelve different mesocosms, with each mesocosm representing a single dose. Thus, there are three mesocosms contributing between one and three individuals to a treatment group. For example, the five high dose individuals had three individuals from the S2h mesocosm and one each from the S3h and C1h mesocosms. After reading the EdgeR manual, I tried blocking for mesocosm, but because I don't have replicates for each mesocosm, I got an error message due to my design matrix not having a full rank:

> mfit <- glmQLFit(e2, design4)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
  Design matrix not of full rank.  The following coefficients not estimable:
 groupH groupL groupM

Here is the code that I was using to set this up:

group <- factor(c("C","H","L","M","H","C","L","M","C","M","H","M","L","H","C","L","H","C","M","L"))
mesocosm <- factor(c("C1c", "C1h", "S2l", "S2m", "S2h", "S2c", "S2l", "S2m", "S2c", "S2m", "S2h", "C1m", "S2l", "S2h", "S3c", "S3l", "S3h", "S3c", "S3m", "S3l")) 
design4 <- model.matrix(~mesocosm+group)
mfit <- glmQLFit(e2, design4)
mqlf <- glmQLFTest(fit, coef=4:6)

Thank you for your help!

ADD COMMENTlink modified 21 days ago by Aaron Lun21k • written 21 days ago by emb130

Would I be correct to guess that each "mesocosm" in your experiment is a separate physical field station with its own environmental setup? Does any physical station contribute individuals with more than one dose? In other words, do you really have 12 different physical environments or only 3? I would guess that it would usually be possible to apply more than one dose within the same physical environmental.

ADD REPLYlink modified 19 days ago • written 20 days ago by Gordon Smyth35k

Hi Gordon,

Thank you for your reply. All mesocosms are at the same physical site, but the site is broken down into fenced in "pads" which contain the mesocosms (which are individual cattle tanks). Thus, the S1 mesocosms (S1h, S1m, S1l, S1c) are within one fenced pad. The pads are very close to one another and any sort of environmental variation (i.e. shading from sun) is just as likely across mesocosms as it is across pads.

ADD REPLYlink written 15 days ago by emb130
gravatar for Aaron Lun
21 days ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

1. edgeR doesn't use GLMMs so there's no way to handle random effects directly. You can either switch to voom with duplicateCorrelation, or you can subset the data so that only one sample is present from each level of the random effect blocking factor. I don't think this is relevant here, though (see my response to 3).

2. I think of the batch effect as the change in expression between batches, while blocking terms are the analytical constructs (i.e., coefficients in the design matrix) that enable estimation of the batch effect. So when you "block on batch", you're adding a blocking term for each batch to your design matrix, which allows estimation of the batch effect as the fitted values for the corresponding GLM coefficients.

3. Not really. As it stands, mesocosm is fully nested within group, so you can't block on mesocosm if you have group in your model. But I would say that you've formulated mesocosm somewhat incorrectly, because it seems to combine things like "C1", "S2" and "S3" (presumably environmental levels?) with the group levels. This suggests two choices for further analysis. The first is to use the additive model like what you're already doing:

actual.mesocosm <- sub("[lhmc]$", "", mesocosm)
design5 <- model.matrix(~group + actual.mesocosm)

... which assumes that the group and actual mesocosm effects are additive. Alternatively, you could just use mesocosm alone:

# Using no-intercept for ease of interpretation.
design6 <- model.matrix(~ 0 + mesocosm)

... which is effectively an interaction model for group/mesocosm. For the latter, you can test for the average effect of each group by comparing the grand averages across mesocosm levels, e.g.:

con <- makeContrasts((C1h + S2h + S3h)/3 - (C1m + S2m + S3m)/3, levels=design6)
ADD COMMENTlink modified 21 days ago • written 21 days ago by Aaron Lun21k

Hi Aaron,

Thank you so much for your reply. It is correct that dose is completely nested in mesocosm (see my comment to Gordon above). Because of this I have decided to try the third option as this best represents the interaction between group and mesocosm which is apparent in my MDS plot. Is setting the contrasts in this way this the same as taking the average expression for each MSTRG for each mesocosm (average C1 individuals, S3 individuals, and S2 individuals for each dose) reducing my sample size to three instead of five?

ADD REPLYlink written 15 days ago by emb130

Given the clarification about your design, I suspect that the interaction model is no longer appropriate. It seems like the different mesocosms represent a level of replication rather than being factors of interest in their own right. To wit; if someone were to repeat the experiment, does it even make sense to talk about re-using the same mesocosms? Would their "S1" mesocosm correspond to your "S1" mesocosm, or are these just arbitrary labels you used to distinguish the different mesocosms? If the latter, there's not much point studying the interaction effect as it's not something that's reproducible. Only the effect of scientific interest (i.e., group) would be something that could be replicated, in which case you should use the additive design. Any variation between mesocosms will be absorbed into the dispersion, which is a good thing, as this accurately represents the variation that you'd expect to see if someone were to replicate your experiment. (Otherwise you'd get findings that were specific to your mesocosms.)

ADD REPLYlink modified 15 days ago • written 15 days ago by Aaron Lun21k

Thank you again for the feedback, Aaron!

ADD REPLYlink written 14 days ago by emb130
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 416 users visited in the last hour