Question: design microarrays limma
5 weeks ago, roser.navarro wrote:

Dear all,

We are using limma to perform differential expression analysis among 3 different groups.

Our concern is that the design is unbalanced: the sample sizes per group are very different (7, 9 and 39 samples). So we decided to randomly select a subset of the largest group, both to balance the design and to avoid unequal variances (that is, so that we can assume homoscedasticity).

We don't know whether it is better to balance the design or not (in terms of statistical power).

Thanks in advance :-)

5 weeks ago, Aaron Lun (Cambridge, United Kingdom) wrote:

There is no reason to subset your big group. It won't help with avoiding differences in the variances; if the largest group has a larger/smaller variance, that will still be the case in any random subset of samples. You could argue that subsetting ensures that the overall variance estimate will not be dominated by the largest group, but this has no obvious benefits for type I error control when the equal-variance assumption is already violated. The only predictable effect is to artificially reduce the precision of your parameter estimates for the largest group, and for the overall variance, which will reduce your detection power.
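The point about subsets can be checked numerically. The following is a minimal sketch in Python rather than limma itself, assuming one hypothetical gene whose values in the largest group are normally distributed with a made-up standard deviation of 2. It draws many random subsets of 7 from the 39 samples and compares their variances with the variance of the full group.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical expression values for one gene in the largest group (n = 39).
# The true standard deviation (2.0) is an assumption made for illustration.
big_group = [random.gauss(0.0, 2.0) for _ in range(39)]
full_var = statistics.variance(big_group)

# Repeatedly draw random subsets of 7 samples (without replacement)
# and record the sample variance of each subset.
n_draws = 10000
subset_vars = [
    statistics.variance(random.sample(big_group, 7)) for _ in range(n_draws)
]
mean_subset_var = statistics.mean(subset_vars)
spread_subset_var = statistics.stdev(subset_vars)

print(f"variance, all 39 samples:          {full_var:.3f}")
print(f"average variance, subsets of 7:    {mean_subset_var:.3f}")
print(f"spread (sd) of subset variances:   {spread_subset_var:.3f}")
```

On average the subset variance matches the full-group variance, so subsetting does not make the group any more or less variable. What changes is that any single subset gives a much noisier estimate, which is the loss of precision described above.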


Thanks a lot for your help!

One more question: if we test the equal-variance hypothesis and accept that the variances are equal, is subsetting then justified as a way to stop the largest group from dominating?

We would have to compare the variances within each group (considering all the genes), wouldn't we?

Reply written 5 weeks ago by roser.navarro

If you accept that the variances are equal across groups, it doesn't matter which group dominates (i.e., contributes more residual d.f.) as all groups will be contributing to the estimation of the common variance. So subsetting is unnecessary.
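To make the residual d.f. point concrete, here is a minimal sketch in Python, assuming a simple one-way layout with the group sizes from the question. The group names and per-group variances are made-up numbers for one hypothetical gene. It only shows the ordinary pooled variance; limma additionally moderates the gene-wise variances by empirical Bayes, which this sketch leaves out.

```python
# Group sizes from the question; the group names and per-group variances
# are made-up values for one hypothetical gene, used only to show the weighting.
group_sizes = {"A": 7, "B": 9, "C": 39}
group_vars = {"A": 1.2, "B": 0.9, "C": 1.0}

# In a one-way layout each group contributes (n - 1) residual d.f.
resid_df = {g: n - 1 for g, n in group_sizes.items()}
total_df = sum(resid_df.values())

# Pooled variance: the d.f.-weighted average of the per-group variances.
pooled_var = sum(resid_df[g] * group_vars[g] for g in group_sizes) / total_df

for g in group_sizes:
    print(f"group {g}: {resid_df[g]} d.f., weight {resid_df[g] / total_df:.2f}")
print(f"total residual d.f.: {total_df}")
print(f"pooled variance: {pooled_var:.3f}")
```

If the variances really are equal, every group is estimating the same quantity, so the heavier weight on the largest group simply means more information goes into the common estimate.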

Also, I don't understand what you mean by comparing the variance within each group. You could compute an estimate of the variance for each group, but limma doesn't do that, and it doesn't compare the variance between groups either.

Check out Gordon's comments on the matter of unequal variances: "A: Correct assumptions of using limma moderated t-test".

Reply written 5 weeks ago (modified 5 weeks ago) by Aaron Lun

Aaron is right -- balance in the experimental design and homoscedasticity are different things. Throwing away data will just reduce statistical power for no reason.

limma is quite capable of figuring out what the sample sizes are in each group and properly accounting for them. You don't have to supervise limma by artificially balancing up the design yourself.

In short, the unequal group sizes are not a problem.
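As a rough illustration of how a linear model accounts for group sizes, here is a small sketch in Python of the underlying least-squares arithmetic. It is not limma's actual code, and the group names A, B and C are hypothetical. For a group-means design the matrix X'X is diagonal with the group sizes on the diagonal, so each group mean has unscaled variance 1/n, and the standard error of a comparison follows directly from the sizes of the two groups involved.

```python
import math

# Group labels for the 55 samples in the question (group names are hypothetical).
groups = ["A"] * 7 + ["B"] * 9 + ["C"] * 39
levels = ["A", "B", "C"]

# Group-means design matrix: one indicator column per group.
design = [[1 if g == lev else 0 for lev in levels] for g in groups]

# X'X for this design is diagonal, with the group sizes on the diagonal,
# so the unscaled variance of each group mean is 1 / n_g.
xtx = [
    [sum(row[i] * row[j] for row in design) for j in range(3)]
    for i in range(3)
]
unscaled_var = [1 / xtx[i][i] for i in range(3)]


def contrast_se_factor(n1, n2):
    """Standard error of a difference in group means, in residual-sd units."""
    return math.sqrt(1 / n1 + 1 / n2)


print("X'X:", xtx)
print("unscaled variances of group means:", [round(v, 4) for v in unscaled_var])
print(f"A vs C, all samples (7 vs 39): {contrast_se_factor(7, 39):.3f}")
print(f"A vs C, C cut down to 7:       {contrast_se_factor(7, 7):.3f}")
```

The comparison using all 39 samples has the smaller standard error, so cutting the largest group down to 7 can only make that comparison less precise.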

Reply written 5 weeks ago (modified 5 weeks ago) by Gordon Smyth