Question

DESeq2 multifactor and paired sample analysis

0

Entering edit mode

amy • 0

@amy-10733

Last seen 7.9 years ago

I am using DESeq2_1.12.0 on R (version 3.3.0) to analyze RNASeq. I have some questions on how to set up the design formula and have looked at a number of multifactor analysis but can't quite find what I am looking for.

I have a total of 24 samples from twelve subjects.

These twelve subjects are divided into two groups (group A and group B).
In group A, samples were taken at day 0 and day 1.
In group B, samples were taken at day 0 and day 5.
The different sampling time between the groups is due to the destructive sampling procedure
Within each group, half of the subject received treatment, while the other half didn't.

Here is how I set up the metadata table, with an added column for paired analysis (paired.subject)

Group	treatment	time	subject	paired.subject
A	control	day0	one	one
A	control	day1	one	one
A	control	day0	two	two
A	control	day1	two	two
A	control	day0	three	three
A	control	day1	three	three
A	treated	day0	four	one
A	treated	day1	four	one
A	treated	day0	five	two
A	treated	day1	five	two
A	treated	day0	six	three
A	treated	day1	six	three
B	control	day0	seven	four
B	control	day5	seven	four
B	control	day0	eight	five
B	control	day5	eight	five
B	control	day0	nine	six
B	control	day5	nine	six
B	treated	day0	ten	four
B	treated	day5	ten	four
B	treated	day0	eleven	five
B	treated	day5	eleven	five
B	treated	day0	twelve	six
B	treated	day5	twelve	six

To figure out the differences between treated vs control within each group between the different time points, I subset the data based on groups and analyzed them separately.

dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ treatment)

dds$combined <- factor(paste0(dds$treatment, dds$time))

m1 <- model.matrix(~paired.subject + combined, colData(dds))

For example, for group A, difference between treated and control on day 1 was extracted by:

resdds <- results(dds, contrast = list("combined.treated.day1", "combined.control.day1")

My questions are:

Is it possible to compare the differences between (combined.treated.day1-combined.treated.day0) to (combined.control.day1-combined.control.day0)?

I know in the vignette and also in other posts that adding extra samples is typically better for the dispersion estimate DESeq2 design matrix for 6 groups.
However, for this dataset, I don't get any DE genes when I combined groupA and groupB together, which is why I subset them. Any suggestion on what to do if I want to compare between day 1 and day 5?

Thanks for your feedback.

deseq2 • 1.7k views

ADD COMMENT • link 7.9 years ago amy • 0

0

Entering edit mode

amy • 0

@amy-10733

Last seen 7.9 years ago

Hi Michael,

Thank you for the clear explanation.

I tried out the design as you specified. However, I encountered the error: "model matrix is not full rank, one or more variables in the design formula are linear combinations of the others".

I didn't think that the nested.subj was the a linear combination of the others?

Thanks,

Amy

ADD COMMENT • link 7.9 years ago amy • 0

0

Entering edit mode

Can you post colData, so I can see if it matches what I was suggesting?

ADD REPLY • link 7.9 years ago Michael Love 41k

0

Entering edit mode

amy • 0

@amy-10733

Last seen 7.9 years ago

Group	treatment	time	nested.subject
A	control	day0	1
A	control	dayX	1
A	control	day0	2
A	control	dayX	2
A	control	day0	3
A	control	dayX	3
A	treated	day0	1
A	treated	dayX	1
A	treated	day0	2
A	treated	dayX	2
A	treated	day0	3
A	treated	dayX	3
B	control	day0	1
B	control	dayX	1
B	control	day0	2
B	control	dayX	2
B	control	day0	3
B	control	dayX	3
B	treated	day0	1
B	treated	dayX	1
B	treated	day0	2
B	treated	dayX	2
B	treated	day0	3
B	treated	dayX	3

Thank you Mike. Sorry for not providing enough information - just edited this post.

coldata = with(samples, data.frame(shortname = I(shortname), Group = Group, treatment = treatment, time = time, nested.subject = nested.subject))

countdata = counts

dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ treatment)

dds$treatment <- relevel(dds$treatment, "control")
dds$time <- relevel(dds$time, "day0")

m1 <- model.matrix(~ Group + Group:treatment + Group:treatment:nested.subject + Group:time + Group:treatment:time, colData(dds))

dds = DESeq(dds, full = m1, betaPrior = FALSE, parallel = TRUE)

Let me know if it would be easier to send you the script via email.

What I ended up doing was to remove the nested.subject and really simplify my design matrix to:

m1 <- model.matrix( ~Group + Group:treatment + Group:treatment:Time, colData(dds))

With this new design matrix, I did not observe the model matrix error.

Thanks for your time!

ADD COMMENT • link 7.9 years ago amy • 0

0

Entering edit mode

And what design are you using where you get the "model matrix is not full rank" error? Can you post the code?

ADD REPLY • link 7.9 years ago Michael Love 41k

score 3 · Accepted Answer · 2016-05-19

hi Amy,

The "factor(paste0(" approach is easier when you only want to compare individual pairs of groups (defined by unique combinations of variables), but if you want to compare fold changes, it makes more sense to use a design with interaction terms. You can also extract pairwise comparisons with the interaction design, it's just that typically users find it a bit more challenging to know which terms to extract, and this is why I have the simpler approach described in the vignette. If you have difficulty interpreting the terms in my following suggestion, I'd recommend you try to find a local statistician or person with experience in linear modeling who can help explain the interpretation of these.

You should change the day1 and day5 levels to just dayX, which will represent the post-day0 time point, and is just for convenience in building the design and model matrix. You will nevertheless get separate results for groups A and B.

levels(coldata$time)[2] <- "dayX"
levels(coldata$time)[3] <- "dayX"

Next you will add a column, "nested.subj" which will help us deal with the correlation of samples from the same subject within unique Group x Treatment combinations.

coldata$nested.subj <- factor(rep(rep(1:3, each=2),4))

Finally, you will use a design which controls for the subject differences within Group x Treatment, and let's you pull out the interaction of treatment and time for each group:

~ group + group:treatment + group:treatment:nested.subj + group:time + group:treatment:time

The two terms "groupA:treatmenttreated:timedayX" and "groupB:treatmenttreated:timedayX" are for testing whether the LFC across day is different for treatment vs control, and this is split for group A and group B.