I am going to analyse micro biome data with the design listed below.
It is an animal trial with pens (SampleGroup) where all the animals in a given pen receive the same treatment. I want to obtain the effect of treatment but need to block out the effect of pen (SampleGroup). How can I achieve this with deseq2 (or other packages) ? So far I have addressed the problem by using pens as the experimental unit. I.e. averaging over all animals in a pen with the use of phyloseq. But are there other and better ways to handle the situation ?
Best regards,
Bjarke
SampleNo Treatment SampleGroup
1 FT0024 T1 G41
2 FT0027 T1 G41
3 FT0026 T1 G41
4 FT0025 T1 G41
5 FT0028 T1 G41
6 FT0033 T1 G44
7 FT0032 T1 G44
8 FT0031 T1 G44
9 FT0029 T1 G44
10 FT0035 T4 G46
11 FT0001 T4 G46
12 FT0037 T4 G46
13 FT0002 T4 G46
14 FT0038 T4 G46
15 FT0034 T4 G46
16 FT0036 T4 G46
17 FT0004 T4 G49
18 FT0006 T4 G49
19 FT0039 T4 G49
20 FT0042 T4 G49
21 FT0040 T4 G49
22 FT0005 T4 G49
23 FT0041 T4 G49
24 FT0003 T4 G49
25 FT0008 T4 G62
26 FT0010 T4 G62
27 FT0044 T4 G62
28 FT0009 T4 G62
29 FT0007 T4 G62
30 FT0046 T4 G62
31 FT0047 T4 G62
32 FT0045 T4 G62
33 FT0043 T4 G62
34 FT0011 T4 G62
35 FT0048 T1 G63
36 FT0013 T1 G63
37 FT0014 T1 G63
38 FT0012 T1 G63
39 FT0015 T1 G67
40 FT0017 T1 G67
41 FT0018 T1 G67
42 FT0016 T1 G67
43 FT0019 T1 G67
44 FT0023 T4 G68
45 FT0021 T4 G68
46 FT0020 T4 G68
47 FT0030 T4 G68
48 FT0022 T4 G68
Thanks for the fast answer Steve. The scenario you describe in the vignette is different than my example. In my pens (types in the vignette) I have only one treatment (condition). When I do as you describe I get this:
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed
The problem is that the experimental design has confounded sample group and treatment. The sample groups associated with treatment T1 do not appear in treatment T4 and vice versa, so you cannot control for sample group and simultaneously estimate the treatment effect.
I recommend in cases like this that people use the duplicateCorrelation() approach within limma + voom, which let's you inform the analysis that certain samples are correlated, in the case that you cannot actually control for these using fixed effect terms due to the confounded design.
Indeed -- yikes, sorry. My brain crisscrossed what my eyes were seeing. Thanks for that.
Many thanks Michael. Really useful ! I will use limma.