Using DESeq2 for longitudinal analyses of the microbiome

schulze.a (Australia)

We are soon going to be analysing the gut microbial communities of 80 pregnant women (Subjects). Each mother gave 3 samples, one in each trimester of the pregnancy. About 30 of the mothers have type 1 diabetes (T1D), while the others are healthy. I have been reading about time-series differential abundance analysis with DESeq2, and I would like to answer 2 main questions:

1.- Are there taxa (OTUs) that are differentially abundant across the trimesters (Time) in general, regardless of whether the mother has T1D? And, specifically, between T1 and T2 and between T2 and T3?

2.- Are there differentially abundant taxa between mothers with and without T1D over the three trimesters?

FACTORS:

Subjects: the different mothers (IDs 1 to 80)
T1D: whether the mother has T1D (levels T1D and non-T1D)
Time: the trimester (levels T1, T2 and T3)

QUESTIONS:

1.- For question 1, I think I should use a simple design that includes only Time, the factor I want to test for changes: ~ Time

Question: Should I also control for T1D status, with a model like ~ T1D + Time, using the default test="Wald"?

To test for differences between a specific pair of trimesters (e.g. T1 vs T2), I could use contrasts, right?

2.- This seems more complicated. I think I could apply something similar to what you (the DESeq2 team) describe in: http://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments

Question: Should the full design be ~ Time + T1D + Time:T1D, with the reduced design ~ Time + T1D, using test="LRT"?
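
The full/reduced pair above can be sanity-checked by counting model-matrix columns. The LRT tests the terms that are in the full design but not in the reduced one, so its degrees of freedom equal the difference in rank. Below is a minimal sketch in Python/numpy, not DESeq2 itself. It assumes a hypothetical toy layout of 4 mothers and treatment coding with T1 and non-T1D as the reference levels:

```python
import numpy as np

# Hypothetical toy layout (not the real study): 4 mothers, the first 2 with T1D,
# each sampled at T1, T2 and T3.
t1d_status = [1, 1, 0, 0]  # 1 = T1D, 0 = non-T1D
times = ["T1", "T2", "T3"]
rows = [(t, d) for d in t1d_status for t in times]

def design(rows, interaction):
    """Treatment-coded model matrix for ~ Time + T1D, plus Time:T1D if interaction=True.

    Reference levels are T1 and non-T1D (an assumption; in R the reference
    is the first factor level).
    """
    X = []
    for t, d in rows:
        t2 = 1.0 if t == "T2" else 0.0
        t3 = 1.0 if t == "T3" else 0.0
        cols = [1.0, t2, t3, float(d)]
        if interaction:
            cols += [t2 * d, t3 * d]  # TimeT2:T1D, TimeT3:T1D
        X.append(cols)
    return np.array(X)

full = design(rows, interaction=True)      # ~ Time + T1D + Time:T1D
reduced = design(rows, interaction=False)  # ~ Time + T1D

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
lrt_df = rank_full - rank_reduced  # number of coefficients the LRT tests
print(full.shape, rank_full, rank_reduced, lrt_df)
```

With three trimesters the interaction contributes two coefficients, so this LRT has 2 degrees of freedom.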

Alternatively, I could use a simpler model that ignores Time and only tests whether any taxa are differentially abundant in mothers with T1D compared with non-T1D mothers:

Question: If I do this, should I control for differences between Subjects, i.e. ~ Subjects + T1D (test="Wald")? I guess the potential problem in this case is that I would be ignoring differences caused by Time, right?

Thanks in advance, and I hope you have time to answer these questions.

Cheers

Alex

Posted 7.6 years ago by schulze.a (updated 7.5 years ago)

Hi Michael,

Thank you very much for your answer. I checked the section you pointed me to, "Model matrix not full rank". As I understand it, the problem with my design falls under "linear combinations", specifically the final case, in which the experiment has grouped individuals. In my case the two groups are individuals with and without T1D, and I want to test the group-specific effect of T1D while controlling for individual effects.

So, as I understand it, I need to add a column that distinguishes the individuals "nested" within each group (i.e. T1D or non-T1D). I will then be able to test, using contrasts, whether there are significant differences between the T1D and non-T1D groups at each trimester.

Example:

Sample	D	ind	Time	ind.n
1	T1D	1	T1	1
2	T1D	1	T2	1
3	T1D	2	T1	2
4	T1D	2	T2	2
5	nonT1D	3	T1	1
6	nonT1D	3	T2	1
7	nonT1D	4	T1	2
8	nonT1D	4	T2	2

where:

D = disease factor with levels T1D and nonT1D
ind = individual ID (4 individuals)
Time = trimester, with levels T1 and T2
ind.n = individual number within each D group

With more individuals per group, ind.n would continue 3, 4, …, n within each group.

The design would be ~ D + D:ind.n + D:Time

	(intercept)	DnonT1D	DT1D:ind.n2	DnonT1D:ind.n2	DT1D:T2	DnonT1D:T2
1	1	0	0	0	0	0
2	1	0	0	0	1	0
3	1	0	1	0	0	0
4	1	0	1	0	1	0
5	1	1	0	0	0	0
6	1	1	0	0	0	1
7	1	1	0	1	0	0
8	1	1	0	1	0	1
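
The table can be reproduced and checked for full rank with a short sketch. It uses Python/numpy rather than R's model.matrix. The hand-written coding is an assumption meant to mirror the columns shown, with T1D, ind.n 1 and T1 as the reference levels:

```python
import numpy as np

# The 8 samples from the table above, as (D, ind.n, Time)
samples = [
    ("T1D", 1, "T1"), ("T1D", 1, "T2"),
    ("T1D", 2, "T1"), ("T1D", 2, "T2"),
    ("nonT1D", 1, "T1"), ("nonT1D", 1, "T2"),
    ("nonT1D", 2, "T1"), ("nonT1D", 2, "T2"),
]

# Column order as in the model-matrix table
columns = ["(intercept)", "DnonT1D", "DT1D:ind.n2", "DnonT1D:ind.n2", "DT1D:T2", "DnonT1D:T2"]

def row(d, ind_n, time):
    """One model-matrix row for ~ D + D:ind.n + D:Time."""
    non = 1.0 if d == "nonT1D" else 0.0
    t1d = 1.0 - non
    n2 = 1.0 if ind_n == 2 else 0.0
    t2 = 1.0 if time == "T2" else 0.0
    return [1.0, non, t1d * n2, non * n2, t1d * t2, non * t2]

X = np.array([row(*s) for s in samples])
rank = np.linalg.matrix_rank(X)
print(columns)
print(X.astype(int))
print("rank", rank, "of", X.shape[1], "columns")
```

Because the matrix has rank 6 with 6 columns, every coefficient in this nested design is estimable.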

To answer the question "are there any differences between individuals with T1D and nonT1D in the second trimester (T2)?", I would contrast DT1D.T2 with DnonT1D.T2.

Hopefully I got this correctly.

My question would be:

What if I have more individuals with D = nonT1D than with D = T1D? Would that be equivalent to a factor level having no samples, in this case in the newly created factor ind.n? If so, would I need to apply section 3.12.2 (Levels without samples), where the solution is to call droplevels to remove the levels that have no T1D samples?
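
The sketch below shows what happens with unbalanced groups. It assumes a hypothetical layout of 2 T1D and 3 non-T1D mothers, each sampled at T1 and T2. Here ind.n runs from 1 to 3, so the column DT1D:ind.n3 is all zeros and the matrix is no longer full rank. Note that droplevels would not help, because level 3 of ind.n still has non-T1D samples. As far as I understand it, the vignette's "Levels without samples" section instead removes the all-zero columns from the model matrix and passes the edited matrix to DESeq() as the full design. The numpy code below only illustrates that column-removal step:

```python
import numpy as np

# Hypothetical unbalanced layout: 2 T1D and 3 non-T1D mothers, each sampled at T1 and T2.
n_per_group = {"T1D": 2, "nonT1D": 3}
groups = ("T1D", "nonT1D")
samples = [(d, k, t) for d, n in n_per_group.items()
           for k in range(1, n + 1) for t in ("T1", "T2")]
max_n = max(n_per_group.values())  # ind.n takes levels 1..3 across the whole data set

# Columns of ~ D + D:ind.n + D:Time (T1D, ind.n 1 and T1 as reference levels)
names = ["(intercept)", "DnonT1D"]
names += [f"D{g}:ind.n{j}" for j in range(2, max_n + 1) for g in groups]
names += [f"D{g}:T2" for g in groups]

def row(d, k, t):
    """One model-matrix row, in the same column order as `names`."""
    r = [1.0, 1.0 if d == "nonT1D" else 0.0]
    r += [1.0 if (d == g and k == j) else 0.0 for j in range(2, max_n + 1) for g in groups]
    r += [1.0 if (d == g and t == "T2") else 0.0 for g in groups]
    return r

X = np.array([row(*s) for s in samples])
all_zero = np.all(X == 0, axis=0)  # columns that no sample ever sets
dropped = [name for name, z in zip(names, all_zero) if z]
X_kept = X[:, ~all_zero]

print("all-zero columns:", dropped)
print("rank before:", np.linalg.matrix_rank(X), "of", X.shape[1])
print("rank after:", np.linalg.matrix_rank(X_kept), "of", X_kept.shape[1])
```

Dropping the empty column restores full rank (7 of 7) without changing any fitted values, because that column was zero for every sample.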

Thanks again in advance

Alexandra

Reply by schulze.a, 7.5 years ago

hi,

Sorry for the delay on answering this post.

"I'm looking for testing the group specific effect of T1D, while controlling for individual effects."

This is one of the comparisons you can't make with your experimental design using a fixed effects model while controlling for individual.

You can, however, make this comparison: is the T2 vs T1 effect different between T1D and non-T1D? This is represented by a contrast of interaction terms (DT1D:T2 - DnonT1D:T2).
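
This can be checked numerically using the ~ D + D:ind.n + D:Time coding from the model-matrix table above. The contrast vector that takes DT1D:T2 minus DnonT1D:T2 equals the difference between the two groups' fitted T2 vs T1 changes, and it does so for every individual because the individual terms cancel. The sketch below uses made-up log2 coefficients (the beta values are hypothetical). In DESeq2, a contrast like this is passed to results() either as a list of two coefficient names or as a numeric vector. The exact names come from resultsNames(dds) and will include the factor names:

```python
import numpy as np

def x(d, ind_n, time):
    """Model-matrix row for ~ D + D:ind.n + D:Time, column order as in the table:
    (intercept), DnonT1D, DT1D:ind.n2, DnonT1D:ind.n2, DT1D:T2, DnonT1D:T2."""
    non = 1.0 if d == "nonT1D" else 0.0
    t1d = 1.0 - non
    n2 = 1.0 if ind_n == 2 else 0.0
    t2 = 1.0 if time == "T2" else 0.0
    return np.array([1.0, non, t1d * n2, non * n2, t1d * t2, non * t2])

# Hypothetical log2-scale coefficients, chosen only for illustration
beta = np.array([2.0, 0.5, -0.25, 0.75, 1.5, 0.25])

# Contrast of interaction terms: DT1D:T2 - DnonT1D:T2
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0, -1.0])

def change(d, ind_n):
    """Fitted T2 - T1 change (log2 scale) for one individual."""
    return x(d, ind_n, "T2") @ beta - x(d, ind_n, "T1") @ beta

for k in (1, 2):
    # The ind.n terms cancel, so each individual gives the same answer
    print(k, c @ beta, change("T1D", k) - change("nonT1D", k))
```

By contrast, no vector of these coefficients compares T1D with nonT1D at T2 on its own while also absorbing every individual's baseline. This is why that comparison is not available here.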

I'd recommend you consult with a local statistician if you have further questions on what comparisons are and aren't possible with the nested individuals and a fixed effects model.

Reply by Michael Love, 7.5 years ago

Hi Michael

Thanks so much for your reply :). I'll check with local statisticians as well, I just wanted to have an idea of how this type of analysis is done in DESeq2 before consulting them.

Cheers

Reply by schulze.a, 7.5 years ago

Answer (2016-10-30)

For microbiome data, I can't say whether DESeq2 is the best software. The methods for standard RNA-seq DE should work to find differences in mean, but I don't keep up with this literature and don't analyze microbiome data myself, so that's a caveat.

A problem that comes up often with users is that you want to control for the individual mothers and also compare across groups of mothers. You cannot make this comparison using a "fixed effects model", because the terms for the individual mothers are linearly dependent with the term for the difference across T1D. Each mother is either T1D or not, so the T1D indicator is just the sum of the indicators for the T1D mothers and adds nothing the mother terms don't already span. You can read about this in the DESeq2 vignette section "Model matrix not full rank". There is a case where you *can* make comparisons: if you are interested in contrasting, for example, the difference between times across T1D and non-T1D. This is also discussed in that section of the vignette.
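
The linear dependence is easy to see on a toy design. The sketch below is in Python/numpy and assumes a hypothetical set of 4 mothers (the first two with T1D), 3 samples each, and treatment coding for ~ subject + T1D. The T1D column equals the intercept minus the indicators of the two non-T1D mothers, so the matrix has 5 columns but only rank 4. This rank deficiency is what triggers DESeq2's not-full-rank error:

```python
import numpy as np

# Hypothetical toy layout: 4 mothers, the first two with T1D, 3 samples each
t1d_status = [1, 1, 0, 0]
n_subj = len(t1d_status)
sample_subject = [s for s in range(n_subj) for _ in range(3)]

# Treatment-coded ~ subject + T1D: intercept, indicators for subjects 1..3
# (subject 0 is the reference), then the T1D indicator
X = np.array([
    [1.0]
    + [1.0 if s == j else 0.0 for j in range(1, n_subj)]
    + [float(t1d_status[s])]
    for s in sample_subject
])

print(X.shape[1], "columns, rank", np.linalg.matrix_rank(X))

# The T1D column is intercept - subject2 - subject3 (the two non-T1D mothers)
print(np.array_equal(X[:, 4], X[:, 0] - X[:, 2] - X[:, 3]))
```

The same dependence holds for any number of mothers, because T1D status is constant within each mother.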

For your first goal, I would use a design that takes into account subject and time, i.e. ~subject + time. You can then call DESeq() with test="LRT" and a reduced design of ~subject to find taxa that differ at any time point. You can also test individual effects by specifying a contrast when you call results(), e.g. contrast=c("time","T3","T1") and test="Wald".
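
Here is a sketch of the ~subject + time design for question 1, again in Python/numpy, assuming a hypothetical 4 subjects with T1 as the reference level. Unlike ~subject + T1D, this design is full rank. Dropping the two time columns gives the reduced ~subject design, which leaves 2 degrees of freedom for the LRT. The sketch also shows that a comparison between two non-reference levels, such as T3 vs T2, is the difference of two coefficients. That is the comparison contrast=c("time","T3","T2") requests from results():

```python
import numpy as np

n_subj = 4  # hypothetical; the real study has 80 mothers
times = ["T1", "T2", "T3"]
rows = [(s, t) for s in range(n_subj) for t in times]

def design(rows, include_time):
    """Treatment-coded ~ subject (+ time); subject 0 and T1 are the reference levels."""
    X = []
    for s, t in rows:
        r = [1.0] + [1.0 if s == j else 0.0 for j in range(1, n_subj)]
        if include_time:
            r += [1.0 if t == "T2" else 0.0, 1.0 if t == "T3" else 0.0]
        X.append(r)
    return np.array(X)

full = design(rows, include_time=True)      # ~ subject + time
reduced = design(rows, include_time=False)  # ~ subject
lrt_df = np.linalg.matrix_rank(full) - np.linalg.matrix_rank(reduced)

# Contrast vectors over [intercept, subject1..3, timeT2, timeT3]
t3_vs_t1 = np.zeros(full.shape[1])
t3_vs_t1[-1] = 1.0
t3_vs_t2 = np.zeros(full.shape[1])
t3_vs_t2[-1] = 1.0
t3_vs_t2[-2] = -1.0

# Hypothetical log2 coefficients, for illustration only
beta = np.array([3.0, 0.5, -1.0, 0.25, 0.75, 2.0])

# Fitted T3 - T2 difference for subject 2, computed from two model-matrix rows
x_t3 = full[rows.index((2, "T3"))]
x_t2 = full[rows.index((2, "T2"))]
print(full.shape, lrt_df, t3_vs_t2 @ beta, x_t3 @ beta - x_t2 @ beta)
```

The subject terms cancel in the within-subject difference, which is why controlling for subject does not interfere with the time comparisons.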

For comparing T1D with non-T1D, you cannot make a general comparison, but you can look for differences between the groups in the change across time points. Check the vignette section describing how to do this, and add a follow-up post if you have more questions.