We are soon going to be analysing the gut microbial communities of 80 pregnant women (Subjects). Each of those mothers gave 3 samples, one in each of the 3 trimesters of the pregnancy. ~30 of those mothers have T1D, while the others are healthy. I have been reading about performing time series differential abundance analysis using DESeq2. I would like to answer 2 main questions:
1.- Are there differentially abundant taxa (OTUs) in the different trimesters (time: T) in general (not taking into account if the mother has T1D)? Also specifically between T1 and T2 and between T2 and T3?
2.- Are there differentially abundant taxa between mothers with and without T1D over the three trimesters?
FACTORS:
Subjects = Different mothers (IDs are 1 to 80)
T1D = Has T1D or not (T1D and non-T1D)
Time = The different trimesters (T1, T2 and T3)
QUESTIONS:
1.- For number 1 I think I should use a simple design that includes only the factor Time, which is the factor I want to test for changes: ~ Time
Question: Should I control for the factor T1D (diseased or not) with a model like ~ T1D + Time, using the default parameter test="Wald"?
To answer whether there are differences between any pair of trimesters (e.g. T1 vs T2), I could use contrasts, right?
2.- Here I think it would be more complicated and I think that I could actually apply something similar to what you (DESeq2 team) wrote in: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments
Question: Should the design be something like ~ Time + T1D + Time:T1D with a reduced design ~ Time + T1D, using the parameter test="LRT"?
I could also just fit a simple model in which I don't consider Time and only test whether there are differentially abundant taxa in mothers with T1D compared with non-T1D:
Question: If I do this, should I control for the differences between Subjects, i.e. ~ Subjects + T1D (test="Wald")? I guess that in this case the potential problem is that I would be ignoring differences caused by the factor Time, right?
Thanks in advance, and hopefully you have the time to answer these questions.
Cheers
Alex
Hi Michael,
Thank you very much for your answer. I checked the section you pointed me to, "Model matrix not full rank". From what I understand, the problem with my design falls under "linear combinations", specifically the final case, in which the experiment has grouped individuals. In my case, the two groups are individuals with and without T1D, in which I'm looking for testing the group specific effect of T1D, while controlling for individual effects.
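To make the "linear combinations" problem concrete, here is a minimal sketch in plain Python (not DESeq2 code). It builds the design matrix that a formula like ~ Subjects + T1D would imply for a hypothetical toy layout of 4 subjects with 2 trimesters each, and checks its rank. The layout, the helper names `matrix_rank` and `design_row`, and the choice of subject 1 as reference level are all assumptions made for illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical toy layout: 4 subjects, 2 trimesters each; subjects 1 and 2 have T1D.
subjects = [1, 2, 3, 4]
t1d_subjects = {1, 2}
samples = [(s, t) for s in subjects for t in ("T1", "T2")]

def design_row(subject):
    # Columns: intercept, indicators for subjects 2..4 (subject 1 is the reference), T1D.
    intercept = [1]
    subject_cols = [1 if subject == s else 0 for s in subjects[1:]]
    t1d_col = [1 if subject in t1d_subjects else 0]
    return intercept + subject_cols + t1d_col

design = [design_row(s) for s, _ in samples]
n_columns = len(design[0])
rank = matrix_rank(design)
print(f"columns = {n_columns}, rank = {rank}")  # columns = 5, rank = 4
```

The rank is one less than the number of columns because the T1D column equals the intercept minus the indicators of the non-T1D subjects: once every subject has its own term, the group term carries no extra information.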
So, from what I understand I need to add a column that will distinguish the individuals "nested" within a group (i.e. T1D or non-T1D). And then I will be able to test if there are any significant differences at each one of the different trimesters (using contrasts) across T1D and non-T1D groups.
Example:
where:
D = Factor Disease with levels (T1D and nonT1D)
ind = 4 individuals
Time = levels (T1 and T2)
If there were more individuals, I would continue the numbering in column ind.n: 3, 4, …, n
The design would be ~T1D + T1D:ind.n + T1D:Time
Here, to answer the question "are there any differences between individuals with T1D and nonT1D in the second trimester (T2)?", I would contrast DT1D.T2 against DnonT1D.T2.
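As a rough sketch of what the design ~T1D + T1D:ind.n + T1D:Time expands to, the plain-Python snippet below writes out the design matrix for an assumed layout of 2 individuals per group and 2 trimesters each. The sample layout is a guess at the example described above, and the column names only imitate R's usual naming with nonT1D as the reference level; both are assumptions, not output from DESeq2.

```python
# Assumed (hypothetical) sample layout: (D, ind, ind.n, Time).
samples = [
    ("T1D", 1, 1, "T1"), ("T1D", 1, 1, "T2"),
    ("T1D", 2, 2, "T1"), ("T1D", 2, 2, "T2"),
    ("nonT1D", 3, 1, "T1"), ("nonT1D", 3, 1, "T2"),
    ("nonT1D", 4, 2, "T1"), ("nonT1D", 4, 2, "T2"),
]

# Column names imitate R's naming with nonT1D as reference level (an assumption).
columns = [
    "Intercept", "DT1D",
    "DnonT1D:ind.n2", "DT1D:ind.n2",
    "DnonT1D:TimeT2", "DT1D:TimeT2",
]

def design_row(d, ind_n, time):
    """One row of the design matrix for a sample with the given factor levels."""
    return [
        1,                                      # Intercept
        int(d == "T1D"),                        # DT1D
        int(d == "nonT1D" and ind_n == 2),      # DnonT1D:ind.n2
        int(d == "T1D" and ind_n == 2),         # DT1D:ind.n2
        int(d == "nonT1D" and time == "T2"),    # DnonT1D:TimeT2
        int(d == "T1D" and time == "T2"),       # DT1D:TimeT2
    ]

design = [design_row(d, ind_n, time) for d, _, ind_n, time in samples]

print(columns)
for sample, row in zip(samples, design):
    print(sample, row)
```

Each group gets its own individual term and its own T2 term, so the two interaction columns at the end are the ones a contrast between groups would compare.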
Hopefully I got this correctly.
My question would be:
What if I have more individuals with D = nonT1D than with T1D? Would this be similar to a level missing from a factor, in this case the newly created factor "ind.n"? And would I therefore have to apply what is in section 3.12.2 (Levels without samples), where the solution is to call 'droplevels', which would remove the levels that have no samples with T1D?
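The unbalanced case can be sketched in plain Python with a hypothetical layout of 2 T1D and 3 nonT1D individuals. The level ind.n = 3 does have samples (in the nonT1D group), so dropping unused levels would not remove it; what appears instead is an interaction column that is zero for every sample. As far as I understand the DESeq2 vignette, the approach it describes for this situation is to remove such all-zero columns from the model matrix and supply the matrix directly, but that is worth checking against the current vignette. All names and the layout below are assumptions.

```python
# Hypothetical unbalanced layout: ind.n values present in each group.
groups = {"T1D": [1, 2], "nonT1D": [1, 2, 3]}
times = ["T1", "T2"]
samples = [(d, n, t) for d, ns in groups.items() for n in ns for t in times]

# Column name -> function giving that column's value for a sample (D, ind.n, Time).
columns = {
    "Intercept": lambda d, n, t: 1,
    "DT1D": lambda d, n, t: int(d == "T1D"),
}
for level in (2, 3):
    for group in ("nonT1D", "T1D"):
        columns[f"D{group}:ind.n{level}"] = (
            lambda d, n, t, g=group, k=level: int(d == g and n == k)
        )
for group in ("nonT1D", "T1D"):
    columns[f"D{group}:TimeT2"] = (
        lambda d, n, t, g=group: int(d == g and t == "T2")
    )

names = list(columns)
design = [[f(*s) for f in columns.values()] for s in samples]

# Columns that are zero for every sample carry no information and break full rank.
all_zero = [
    name for j, name in enumerate(names)
    if all(row[j] == 0 for row in design)
]
kept = [name for name in names if name not in all_zero]

print("all-zero columns:", all_zero)  # all-zero columns: ['DT1D:ind.n3']
print("columns kept:", kept)
```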
Thanks again in advance
Alexandra
hi,
Sorry for the delay in answering this post.
"I'm looking for testing the group specific effect of T1D, while controlling for individual effects."
This is one of the comparisons you can't make with your experimental design and a fixed effects model while also controlling for individual.
You can, however, ask whether the T2 vs T1 effect differs between T1D and non-T1D. This is represented by a contrast of interaction terms (DT1D:T2 - DnonT1D:T2).
I'd recommend you consult with a local statistician if you have further questions on what comparisons are and aren't possible with the nested individuals and a fixed effects model.
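A small numeric sketch of why the interaction comparison above remains possible: it is a difference of within-individual changes, so each individual's own baseline cancels out. The log2 abundances below are invented toy values for a single taxon, and the function name is hypothetical; this is plain Python arithmetic, not what DESeq2 does internally.

```python
from statistics import mean

# Hypothetical log2 abundances of one taxon: individual -> (group, T1, T2).
# Baselines differ a lot between individuals on purpose.
log2_abundance = {
    1: ("T1D", 5.0, 6.5),
    2: ("T1D", 8.0, 9.5),
    3: ("nonT1D", 3.0, 3.5),
    4: ("nonT1D", 7.0, 7.5),
}

def mean_within_subject_change(group):
    """Average T2 - T1 change over the individuals of one group."""
    return mean(
        t2 - t1
        for g, t1, t2 in log2_abundance.values()
        if g == group
    )

change_t1d = mean_within_subject_change("T1D")        # 1.5
change_non_t1d = mean_within_subject_change("nonT1D")  # 0.5

# The interaction contrast: is the T2 vs T1 effect different between groups?
interaction = change_t1d - change_non_t1d
print(interaction)  # 1.0
```

Shifting any individual's T1 and T2 values by the same amount leaves `interaction` unchanged, whereas a direct T1D vs non-T1D comparison at a single trimester would move with those individual baselines.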
Hi Michael
Thanks so much for your reply :). I'll check with local statisticians as well, I just wanted to have an idea of how this type of analysis is done in DESeq2 before consulting them.
Cheers