Question

Is there any model in EdgeR that is close to nested ANOVA?

1

Entering edit mode

deeptiptl74 ▴ 10

@deeptiptl74-13880

Last seen 5.4 years ago

Hi,

I would appreciate some help to find out the correct design matrix for my experiment in EdgeR. The experiment is about thermal acclimation in fish, where I have two groups 1.Control 2.Treatment. From each group I have 2 tanks, so 4 tanks in total, 2 from control and 2 from treatment. I sampled fish tissue from 6 fish from each tank. I am interested in genes which are differentially expressed between the control and treatment. However I have 2 tanks from each group but I am not interested in the difference between the tanks of same group. At the same time I have to take into account the tank effect which I could do with a nested ANOVA. So my question here is Is there any model close to nested ANOVA that fits to my experiment in EdgeR? I looked for different possibility in the manual but most of them are crossed models which does not fit to my experiment. Also I am a newbie here so I might have overlooked on the possibilities. Any help will be greatly appreciated. Thank you in advance.

edger rnaseq • 1.9k views

ADD COMMENT • link updated 5.4 years ago by Gordon Smyth 53k • written 5.4 years ago by deeptiptl74 ▴ 10

score 2 · Answer 1 · 2020-08-06

You can't fit nested models with a blocking factor, because the tanks will be confounded with treatment. In other words, if you have fish from tank 1 and they were only treated one way, then there is no way to say if any differences are due to tank-specific differences or if it's due to the treatment. And edgeR won't let you do it, because the design matrix won't be full rank.

> trt <- gl(2,12)
> tank <- factor(rep(1:4, each = 6))
> library(limma)
> is.fullrank(model.matrix(~trt+tank))
[1] FALSE

On the other hand you could fit a linear mixed model using tank as a random effect. One option would be to use limma-voom and then estimate within-tank correlations using duplicateCorrelation. An alternative would be to use the variancePartition package which would allow you to fit random intercepts for each tank using lme4. I'm not sophisticated enough to say how the two compare, or if one should be preferred over the other. Perhaps Gordon Smyth or Aaron Lun will be along in a bit to say which is the better option.

score 1 · Answer 2 · 2020-08-06

1

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 5 hours ago

WEHI, Melbourne, Australia

You only have two possibilities. If the tank effect is negligible then you can ignore it and your experiment becomes a two-group comparision with n=12 fish in each group.

If the tank effect can't be ignored, then I suggest the following approach. It is somewhat ad hoc but also close to optimal.

Sum the genewise counts for all the fish in each tank (for example by using the edgeR function sumTechReps).
You now have a two-group comparison with n=2 in each group.

It is not possible to do much better than this with a nested anova or random effects model because there is no information about the treatment to recover from the tank to tank differences. A nested anova would be equivalent to averaging the values in each tank anyway.

ADD COMMENT • link 5.4 years ago Gordon Smyth 53k

1

Entering edit mode

Hi Gordon,

I'm not sure I understand. Will you explain how summing the counts within each tank is any different from just ignoring the tank effect?

ADD REPLY • link 5.4 years ago James W. MacDonald 68k

1

Entering edit mode

Summing the counts is, in effect, averaging the fish in each tank. You then get a separate value for each tank. The tank-to-tank variation then becomes the error term. The statistical significance of the treatment effect is judged relative to the tank-to-tank variation, so the latter is not being ignored.

Nested anova does the same thing in effect. Here's an example with normally distributed data. First do an nested anova with the tank effect as random:

> set.seed(229240357)
> y <- rnorm(24)
> Tank <- gl(4,6)
> Treatment <- gl(2,12)
> summary(aov(y ~ Treatment + Error(Tank)))

Error: Tank
          Df Sum Sq Mean Sq F value Pr(>F)  
Treatment  1 2.7387  2.7387   12.52 0.0714 .
Residuals  2 0.4376  0.2188                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 20  16.71  0.8356

Now, average the values in each tank to get just four values instead of 24. Doing an anova on just the four tank averages gives the same p-value:

> Tank.Means <- colMeans(matrix(y,6,4))
> Tank.Treatment <- gl(2,2)
> summary(aov(Tank.Means ~ Tank.Treatment))
               Df Sum Sq Mean Sq F value Pr(>F)  
Tank.Treatment  1 0.4564  0.4564   12.52 0.0714 .
Residuals       2 0.0729  0.0365                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

This averaging approach will work for any balanced repeated measures design.

ADD REPLY • link 5.4 years ago Gordon Smyth 53k

1

Entering edit mode

Ah, I get it. Basically set the within-tank variability to zero by summing. Thanks.

ADD REPLY • link 5.4 years ago James W. MacDonald 68k

0

Entering edit mode

Thank you very much Gordon and James for the valuable comments. It helped a lot. I understand it now :)

ADD REPLY • link 5.4 years ago deeptiptl74 ▴ 10