Question

design microarrays limma

roser.navarro • 0

@rosernavarro-9660

Dear all,

We are using limma to perform differential expression analysis among 3 different groups.

The possible problem is that the design is unbalanced: the group sizes are very different (7, 9 and 39 samples). We therefore decided to take a random subset of the largest group, both to balance the design and to avoid unequal variances between groups (so that we can assume homoscedasticity).

We don't know if it is better to balance the design or not (in terms of statistical power).

Thanks in advance :-)

limma statistical power design variability microarray • 1.7k views

ADD COMMENT • link updated 8.3 years ago by Aaron Lun ★ 29k • written 8.3 years ago by roser.navarro • 0

score 2 · Answer 1 · 2017-10-11

Aaron Lun ★ 29k

@alun

There is no reason to subset your big group. It won't help with avoiding differences in the variances; if the largest group has a larger/smaller variance, that will still be the case in any random subset of samples. You could argue that subsetting ensures that the overall variance estimate will not be dominated by the largest group, but this has no obvious benefits for type I error control when the equal-variance assumption is already violated. The only predictable effect is to artificially reduce the precision of your parameter estimates for the largest group, and for the overall variance, which will reduce your detection power.
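The loss of precision can be put in numbers with a little arithmetic. The following is only a sketch in Python, not limma code: it assumes a common standard deviation `sigma` across groups, and the subset size of 9 for the large group is a hypothetical choice, since the question does not say how many samples would be kept. The function names are made up for illustration.

```python
import math


def contrast_se(n_a, n_b, sigma=1.0):
    """Standard error of the difference between two group means,
    assuming both groups share the same standard deviation sigma."""
    return sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)


def residual_df(group_sizes):
    """Residual degrees of freedom for a one-way layout:
    total number of samples minus the number of groups."""
    return sum(group_sizes) - len(group_sizes)


full = (7, 9, 39)   # group sizes from the question
subset = (7, 9, 9)  # hypothetical: large group cut down to 9 samples

for label, sizes in (("full", full), ("subset", subset)):
    n_small, _, n_large = sizes
    print(
        label,
        "residual d.f.:", residual_df(sizes),
        "SE(large - small), in units of sigma:",
        round(contrast_se(n_large, n_small), 3),
    )
```

Under these assumptions, subsetting both widens the standard error of any comparison involving the large group and leaves fewer residual degrees of freedom for estimating the variance.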

ADD COMMENT • link 8.3 years ago Aaron Lun ★ 29k

Thanks a lot for your help!

One more question... If we test the equal-variance hypothesis and accept that the variances are equal, is it then right to subset in order to stop the largest group from dominating the variance estimate?

We would have to compare the variance within each group (considering all the genes), wouldn't we?

ADD REPLY • link 8.3 years ago roser.navarro • 0

If you accept that the variances are equal across groups, it doesn't matter which group dominates (i.e., contributes more residual d.f.) as all groups will be contributing to the estimation of the common variance. So subsetting is unnecessary.
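To make "contributes more residual d.f." concrete, here is a minimal Python sketch of the ordinary pooled variance for a single gene in a one-way layout. It is an illustration only: the function name and the expression values are made up, and it shows the plain residual variance, not limma's moderated estimate.

```python
import statistics


def pooled_variance(groups):
    """Pool per-group sample variances for one gene, weighting each
    group by its residual degrees of freedom (n_i - 1).
    Returns the pooled variance and the total residual d.f."""
    weighted_sum = 0.0
    total_df = 0
    for values in groups:
        df = len(values) - 1
        weighted_sum += df * statistics.variance(values)
        total_df += df
    return weighted_sum / total_df, total_df


# Made-up expression values for one gene in three groups of unequal size.
groups = [
    [5.1, 4.8, 5.3],
    [6.0, 6.4, 5.9, 6.2],
    [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 6.7],
]

variance, df = pooled_variance(groups)
print("pooled variance:", round(variance, 4), "on", df, "residual d.f.")
```

If the variances really are equal, every group is estimating the same quantity, so the larger weight of the big group simply reflects that it carries more information about it.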

Also, I don't understand what you mean by comparing the variance within each group. You could compute an estimate of the variance for each group, but limma doesn't do that, and it doesn't compare the variance between groups either.

Check out Gordon's comments on the matter of unequal variances: "A: Correct assumptions of using limma moderated t-test"

ADD REPLY • link 8.3 years ago Aaron Lun ★ 29k

Aaron is right -- balance in the experimental design and homoscedasticity are different things. Throwing away data will just reduce statistical power for no reason.

limma is quite capable of figuring out what the sample sizes are in each group and properly accounting for them. You don't have to supervise limma by artificially balancing up the design yourself.

In short, the unequal group sizes are not a problem.

ADD REPLY • link 8.3 years ago Gordon Smyth 53k