
My experimental design has two experimental factors, giving four groups (+/+, +/-, -/+ and -/-). Each group has four biological replicates: the first replicate of every group was collected at the same time, the second replicates at another time, and so on. There is a clear batch effect between replicates.

My design matrix looks like this:

    design <- model.matrix(~batch+factor.1+factor.2)

          (Intercept) batch2 batch3 batch4 factor.1 factor.2
    +/+.1           1      0      0      0        0        0
    -/+.1           1      0      0      0        1        0
    +/-.1           1      0      0      0        0        1
    -/-.1           1      0      0      0        1        1
    +/+.2           1      1      0      0        0        0
    -/+.2           1      1      0      0        1        0
    +/-.2           1      1      0      0        0        1
    ... ...
    attr(,"assign")
    [1] 0 1 1 1 2 3
    attr(,"contrasts")
    attr(,"contrasts")$batch
    [1] "contr.treatment"

    attr(,"contrasts")$factor.1
    [1] "contr.treatment"

    attr(,"contrasts")$factor.2
    [1] "contr.treatment"

The factor.1 column represents presence/absence of factor 1 and factor.2 column represents presence/absence of factor 2.
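For readers who want to reproduce the layout, here is a minimal sketch of how a design matrix with this structure could be built in base R. The sample annotation is hypothetical: the level names `"present"` and `"absent"` and the sample ordering are assumptions chosen to match the rows shown above, not taken from my actual data.

```r
# Hypothetical sample annotation: 4 batches x 4 groups = 16 samples.
# Level names ("present"/"absent") are assumptions for illustration only.
batch    <- factor(rep(1:4, each = 4))
factor.1 <- factor(rep(c("present", "absent", "present", "absent"), times = 4),
                   levels = c("present", "absent"))
factor.2 <- factor(rep(c("present", "present", "absent", "absent"), times = 4),
                   levels = c("present", "absent"))

# Additive model: batch as a blocking term plus the two factors
design <- model.matrix(~ batch + factor.1 + factor.2)

# Column positions are what glmLRT(fit, coef = ...) refers to
colnames(design)
idx1 <- which(attr(design, "assign") == 2)  # position of the factor.1 column
idx2 <- which(attr(design, "assign") == 3)  # position of the factor.2 column
```

With this layout the two factor columns land in positions 5 and 6, which is why `coef=5` and `coef=6` are used in the tests.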

So far, I have tested for differential expression between presence and absence of factor 1, and between presence and absence of factor 2, each in isolation:

    lrt.factor1 <- glmLRT(fit, coef=5)
    lrt.factor2 <- glmLRT(fit, coef=6)

However, this seems to mean that each test corrects for only x statistical tests (where x is the number of genes that survive filtering), while in total I am actually doing 2x tests spread over two separate calls to glmLRT().

Is this really appropriate? The multiple-testing correction looks artificially weak under this approach (since the first glmLRT() call obviously doesn't "know" that I am running a second one afterwards), so it would produce more false positives and fail to control the FDR at the nominal level.
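To make the two options concrete, here is a rough sketch in base R that compares adjusting each set of x p-values separately with adjusting all 2x p-values in one pool. It assumes the adjustment is Benjamini-Hochberg (the default used by `topTags()`), and the p-values are simulated stand-ins for `lrt.factor1$table$PValue` and `lrt.factor2$table$PValue`, not real results. It only shows the arithmetic of the two options, not which one is appropriate.

```r
set.seed(1)
x <- 1000  # hypothetical number of genes surviving filtering

# Simulated p-values (made-up numbers): mostly null genes plus some small p-values
p1 <- c(runif(900), rbeta(100, 1, 50))
p2 <- c(runif(950), rbeta(50, 1, 50))

# Option A: Benjamini-Hochberg applied to each set of x p-values separately
fdr1.sep <- p.adjust(p1, method = "BH")
fdr2.sep <- p.adjust(p2, method = "BH")

# Option B: Benjamini-Hochberg applied once to all 2x p-values pooled
fdr.pooled  <- p.adjust(c(p1, p2), method = "BH")
fdr1.pooled <- fdr.pooled[seq_len(x)]
fdr2.pooled <- fdr.pooled[x + seq_len(x)]

# Number of genes called at FDR < 0.05 under each option
c(separate = sum(fdr1.sep < 0.05), pooled = sum(fdr1.pooled < 0.05))
c(separate = sum(fdr2.sep < 0.05), pooled = sum(fdr2.pooled < 0.05))
```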

Is it more correct to do all the tests at once and correct for 2x tests instead?

    lrt <- glmLRT(fit, coef=5:6)

Does the above command test to see if there are genes that are differentially expressed in *any* of the factor 1 (+/-) or factor 2 (+/-) comparisons and correct for 2x tests?
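As far as the glmLRT() documentation describes it, passing several coefficients tests the null hypothesis that all of them are zero at once, which yields one likelihood ratio statistic and one p-value per gene. The sketch below illustrates that idea for a single hypothetical gene using base R's `glm()`. It uses a Poisson model purely for simplicity (edgeR fits negative binomial models), and the counts, level names, and sample layout are simulated assumptions.

```r
set.seed(42)

# Hypothetical annotation matching a 4-batch, 2-factor layout (16 samples)
batch    <- factor(rep(1:4, each = 4))
factor.1 <- factor(rep(c("present", "absent", "present", "absent"), times = 4),
                   levels = c("present", "absent"))
factor.2 <- factor(rep(c("present", "present", "absent", "absent"), times = 4),
                   levels = c("present", "absent"))

# Simulated counts for one hypothetical gene
y <- rpois(16, lambda = 50)

full    <- glm(y ~ batch + factor.1 + factor.2, family = poisson)
reduced <- glm(y ~ batch, family = poisson)  # both factor terms dropped

# Likelihood ratio test of "factor.1 and factor.2 coefficients are both zero"
lr <- anova(reduced, full, test = "Chisq")
joint.df <- lr$Df[2]          # degrees of freedom of the joint test
joint.p  <- lr$`Pr(>Chi)`[2]  # a single p-value for this gene
```

Here the joint test has two degrees of freedom but still produces only one p-value for the gene, so a test of this kind asks whether either factor has an effect rather than which one does.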

I have looked in the edgeR user manual, and this kind of logic seems to be applied in the Arabidopsis case study, where they test for differential expression between *any* of the batches (b1 vs. b2, b2 vs. b3, b1 vs. b3) with glmLRT(fit, coef=2:3) to see whether there is a batch effect. But I have only worked with R for a couple of weeks, so I am a bit plagued by self-doubt...