Hi,
I’m working with an RNAseq study design similar to the multi-level experiment example from section 9.7 of the limma user guide. I have a 2-level factor called “Genotype” and a factor called “Family,” which is nested within Genotype (3 families of Genotype “F” and 2 families of Genotype “O”). I also have a 2-level factor called “Treatment.” I’m mainly interested in testing the effects of Treatment within the 2 Genotypes, and the Treatment-by-Genotype interaction, which I’ve managed to do following the example and treating Family as a random effect:
> Targets
Individual Genotype Family Treatment Cage
1 1A_02 F F1 c 1A
2 1A_03 F F1 c 1A
3 1A_04 F F1 c 1A
4 1A_05 F F1 c 1A
5 1A_08 F F1 c 1A
6 1A_10 F F1 c 1A
7 1D_01 F F1 g 1D
8 1D_02 F F1 g 1D
9 1D_04 F F1 g 1D
10 1D_06 F F1 g 1D
11 1D_07 F F1 g 1D
12 1D_08 F F1 g 1D
13 3A_01 F F3 c 3A
14 3A_02 F F3 c 3A
15 3A_04 F F3 c 3A
16 3A_06 F F3 c 3A
17 3A_08 F F3 c 3A
18 3A_10 F F3 c 3A
19 3C_01 F F3 g 3C
20 3C_03 F F3 g 3C
21 3C_04 F F3 g 3C
22 3C_08 F F3 g 3C
23 3C_09 F F3 g 3C
24 3C_10 F F3 g 3C
25 4B_04 F F4 c 4B
26 4B_06 F F4 c 4B
27 4B_07 F F4 c 4B
28 4B_08 F F4 c 4B
29 4B_09 F F4 c 4B
30 4B_10 F F4 c 4B
31 4C_03 F F4 c 4C
32 4C_04 F F4 c 4C
33 4C_05 F F4 c 4C
34 4C_07 F F4 c 4C
35 4C_08 F F4 c 4C
36 4C_14 F F4 c 4C
37 4D_01 F F4 g 4D
38 4D_05 F F4 g 4D
39 4D_07 F F4 g 4D
40 4D_08 F F4 g 4D
41 4D_09 F F4 g 4D
42 4D_13 F F4 g 4D
43 4F_01 F F4 g 4F
44 4F_02 F F4 g 4F
45 4F_03 F F4 g 4F
46 4F_04 F F4 g 4F
47 4F_06 F F4 g 4F
48 4F_08 F F4 g 4F
49 5A_01 O O5 c 5A
50 5A_02 O O5 c 5A
51 5A_03 O O5 c 5A
52 5A_04 O O5 c 5A
53 5A_05 O O5 c 5A
54 5A_08 O O5 c 5A
55 5B_01 O O5 c 5B
56 5B_02 O O5 c 5B
57 5B_04 O O5 c 5B
58 5B_06 O O5 c 5B
59 5B_08 O O5 c 5B
60 5B_10 O O5 c 5B
61 5C_01 O O5 g 5C
62 5C_02 O O5 g 5C
63 5C_03 O O5 g 5C
64 5C_05 O O5 g 5C
65 5C_07 O O5 g 5C
66 5C_09 O O5 g 5C
67 5D_01 O O5 g 5D
68 5D_04 O O5 g 5D
69 5D_05 O O5 g 5D
70 5D_07 O O5 g 5D
71 5D_09 O O5 g 5D
72 5D_11 O O5 g 5D
73 6A_01 O O6 c 6A
74 6A_02 O O6 c 6A
75 6A_04 O O6 c 6A
76 6A_05 O O6 c 6A
77 6A_07 O O6 c 6A
78 6A_10 O O6 c 6A
79 6C_01 O O6 g 6C
80 6C_03 O O6 g 6C
81 6C_04 O O6 g 6C
82 6C_06 O O6 g 6C
83 6C_09 O O6 g 6C
84 6C_14 O O6 g 6C
I set up the design as follows:
> GenTrt <- factor(paste(Targets$Genotype,Targets$Treatment,sep="."))
> design <- model.matrix(~0+GenTrt)
> colnames(design) <- levels(GenTrt)
> v <- voom(dge, design, plot=TRUE)
> corfit <- duplicateCorrelation(v,design,block=Targets$Family)
> corfit$consensus
[1] 0.08579732
I set up and tested the desired contrasts for the model as follows:
> fit <- lmFit(v,design,block=Targets$Family,correlation=corfit$consensus)
> cm <- makeContrasts(
+ cVSg_in_F = F.c-F.g,
+ cVSg_in_O = O.c-O.g,
+ GbyTint = (O.c-O.g)-(F.c-F.g),
+ levels=design)
> cm
Contrasts
Levels cVSg_in_F cVSg_in_O GbyTint
F.c 1 0 -1
F.g -1 0 1
O.c 0 1 1
O.g 0 -1 -1
> fit2 <- contrasts.fit(fit, cm)
> fit2 <- eBayes(fit2)
> topTable(fit2, coef="cVSg_in_F")
> topTable(fit2, coef="cVSg_in_O")
> topTable(fit2, coef="GbyTint")
There is, however, an additional level of complexity. Groups of individuals were raised together in cages (note the “Cage” factor in the Targets object). Originally we intended to have 2 replicate cages for every Family-Treatment combination, but only accomplished this for Family F4 from Genotype F and Family O5 from Genotype O.
Given that we do have replicate cages for both treatments in 2 of the families, I was hoping to include “Cage” as a random effect in the model to account for this potentially important source of variation. Does it make sense to try and accomplish this with limma, and if so, how might I go about it? Or, should I just settle for the analysis I’ve done, in which between-cage effects for a given treatment in Family F4 and in Family O5 are ignored?
I’d be grateful to get feedback on this design, and I thank anyone in advance for her/his time and interest.
Best,
Clay
Keep in mind that it's not a perfect model. The
Family
terms will absorb any differences in total expression of samples for each family, but will not accommodate differences in the treatment effect for each family. If the treatment effect in each family varies, the fitted values for each family/treatment combination may be consistently lower or higher than the observed expression values for the corresponding samples, i.e., the residuals will be correlated across all samples in the same combination. UsingCage
induplicateCorrelation
will not properly account for this effect when multiple cages are present for each combination, as the residuals for each cage are treated as being independent (despite the cages being correlated within the same combination). I can't think of an obvious way to get around this, but hopefully, it shouldn't cause too much damage.Hi Aaron,
Thank you very much for the suggestion and comments. Your points are well taken. As it turns out, correlation of residuals across samples in the same combination may in fact be an issue, as we have other data that suggest a family-by-treatment interaction, which is potentially more pronounced for one of the genotypes.
Thanks again for the help,
Clay
I should add that
duplicateCorrelation
should be able to deal with the situation where there's only one cage for each combination, as correlations between residuals within a single cage will be properly accounted for. It's only when there's several cages that we run into the problem described above. Of course, theGenTrt
-based design in the original post does no better in this regard, so we're not losing anything by includingFamily
into the design.