8 months ago by
Cambridge, United Kingdom
You don't mention the package you're using - edgeR?
voom with limma? DESeq2? The answer to your question is largely agnostic to the exact implementation, but still, it's good practice to say exactly what you're doing.
Anyway, there are two levels to your question. The first is how you parametrize the experimental design for variance modelling. Presumably, the subgroups (a-e, a1-e1) in each "main" group (A/B) have some biological differences. This suggests that you should model each subgroup as a separate entity; otherwise, differences between subgroups will inflate the variances if you treat samples from different subgroups as "replicates". With edgeR or limma, I would use a one-way layout with one coefficient per subgroup:
subgroups <- rep(c("a", "a1", "b", "b1", "c", "c1", "d", "d1", "e", "e1"),
each=2) # Assuming two replicates per group
subgroups <- factor(subgroups)
design <- model.matrix(~0 + subgroups)
colnames(design) <- levels(subgroups)
Now, the other problem is how you should perform the comparisons between your main groups. The simplest approach is to compute the average expression across all subgroups within each main group, and compare them between groups:
con <- makeContrasts((a+b+c+d+e)/5 - (a1+b1+c1+d1+e1)/5, levels=design)
This does exactly what it says, i.e., computes the grand mean of the average expression across subgroups, and tests for changes in the averages between main groups. However, this can be misleading, e.g., if strong upregulation is only present in one subgroup "e", this will drag up the average expression for main group A and make it seem like that there is upregulation in A compared to B. If you want to look for genes that are consistently changing in the same direction for all pairs of subgroups, you can do:
con <- makeContrasts(a - a1, b - b1, c - c1, d - d1, e - e1, levels=design)
This will perform an ANODEV, testing for whether there are any differences in the corresponding pairs of subgroups from each main group. You can then filter for genes with low ANODEV p-values where the signs of the log-fold changes for all pairwise comparisons are the same, e.g., for positive log-fold changes, you would have
a > a1,
b > b1,
c > c1,
d > d1 and
e > e1. This may be easier to interpret as the change in expression between the main groups is consistent across the paired subgroups. (In theory, you could be even more stringent and require significant DE in each pairwise comparison, but this is probably too conservative.)
If you want exact implementation details, see Section 3.2.3 of the edgeR user's guide for some code.