Multifactorial design in DESeq2 - how to answer biological questions
@magdalenaz-9756

Hi,

This question is inspired by several others that have been posted before, asking how to think about multifactorial design ("Multifactorial experimental design in DESeq2" and "DESeq2: with multiple factors and interaction terms won't show all effects").

A few things have been added to DESeq2 recently that are not yet very well documented, like the grouping function, and I sense some confusion about how to use them.

I've written this because it would be good to get more general advice about how to think about these problems, and to get biologically relevant results.

Here is an example design:

Genotype  Treatment3  Treatment5  Replicate
LL        3           5           1
LL        3           5           2
LL        3           ctrl        1
LL        3           ctrl        2
LL        control     5           1
LL        control     5           2
LL        control     ctrl        1
LL        control     ctrl        2
M         3           5           1
M         3           5           2
M         control     5           1
M         control     5           2
M         3           ctrl        1
M         3           ctrl        2
M         control     ctrl        1
M         control     ctrl        2
OP        3           5           1
OP        3           5           2
OP        3           ctrl        1
OP        3           ctrl        2
OP        control     5           1
OP        control     5           2
OP        control     ctrl        1
OP        control     ctrl        2


Level orders:
Genotype: LL, M, OP
Treatment3: control, 3
Treatment5: ctrl, 5
Replicate: 1, 2
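A minimal base R sketch of the sample table above as a data frame that could serve as colData. Column names and level orders come from the post; the combined `group` column (one level per genotype/treatment cell, i.e. the "grouping" idea) is an illustrative addition, not something from the original data.

```r
# Sample table from the post; factor levels in the stated order,
# so the first level of each factor acts as the reference level.
samples <- expand.grid(
  Genotype   = factor(c("LL", "M", "OP"), levels = c("LL", "M", "OP")),
  Treatment3 = factor(c("control", "3"), levels = c("control", "3")),
  Treatment5 = factor(c("ctrl", "5"), levels = c("ctrl", "5")),
  Replicate  = factor(c("1", "2"), levels = c("1", "2"))
)
# (Row order differs from the listing above; the set of samples is the same.)

# Illustrative combined factor: one level per genotype/treatment cell.
samples$group <- factor(paste(samples$Genotype, samples$Treatment3,
                              samples$Treatment5, sep = "_"))

# 24 samples; every Genotype x Treatment3 x Treatment5 cell has 2 replicates.
nrow(samples)
table(samples$group)
```

Assuming a DESeqDataSet `dds` built with this table as colData, the grouping approach would then use `design(dds) <- ~ group`, and a single cell comparison such as M(control,ctrl) vs OP(control,ctrl) would be `results(dds, contrast = c("group", "M_control_ctrl", "OP_control_ctrl"))`.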

It would be great to have some general advice on how to work with a dataset like this. For instance: should I use droplevels() or not? Should size factors be re-estimated between queries? Should you use different groups and designs for each query, or fit a maximal model which contains all the resultsNames() you would need for all your desired comparisons? I appreciate these are more philosophical than practical questions, but it would be really helpful to know, especially for biologists who may not be super-familiar with GLM, and don't intuitively know what is meant by things like "main effect", "effect for Celltype" or "Intercept".
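On the droplevels() question, here is a small base R sketch of why it matters after subsetting. Keeping only the M and OP samples leaves LL as an empty reference level, so the intercept column equals the sum of the GenotypeM and GenotypeOP columns and the design matrix is not full rank. As far as I know, DESeq2 checks the model matrix for full rank and will refuse to fit such a design; droplevels() removes the empty level and makes M the reference.

```r
samples <- expand.grid(
  Genotype   = factor(c("LL", "M", "OP"), levels = c("LL", "M", "OP")),
  Treatment3 = factor(c("control", "3"), levels = c("control", "3")),
  Treatment5 = factor(c("ctrl", "5"), levels = c("ctrl", "5")),
  Replicate  = factor(c("1", "2"))
)

# Subset to M and OP only: the LL level survives, with zero samples.
sub_raw <- samples[samples$Genotype %in% c("M", "OP"), ]
table(sub_raw$Genotype)

# With LL still the reference, the intercept equals GenotypeM + GenotypeOP,
# so the matrix has 3 columns but only rank 2.
mm_raw <- model.matrix(~ Genotype, data = sub_raw)
c(columns = ncol(mm_raw), rank = qr(mm_raw)$rank)

# droplevels() removes the unused level; M becomes the reference level.
sub <- droplevels(sub_raw)
mm <- model.matrix(~ Genotype, data = sub)
colnames(mm)
c(columns = ncol(mm), rank = qr(mm)$rank)
```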

For each of the questions below (I know some of them are quite "simple"), I really think answers would help people understand better what to do with their data. I've tried to formulate the questions in "biological" rather than mathematical terms: the type of questions my favourite biologists would ask me.

For each one, I'd like to know:

1. What is the best design to choose, and why? E.g. ~Genotype+Treatment3+Genotype:Treatment3 or ~Genotype+Genotype:Treatment3

2. How do I extract the result I want using the results() function?
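A sketch, in base R on simulated values for one hypothetical gene, comparing the two designs named in question 1. Treatment5 is left out to keep it short. Both designs span the same column space, so they give identical fitted values; they differ only in which comparisons are single coefficients. In design B, "treatment 3 in M" is one coefficient, while in design A it is the LL main effect plus the M-specific interaction. The M vs OP difference in treatment effect comes out the same in both.

```r
samples <- expand.grid(
  Genotype   = factor(c("LL", "M", "OP"), levels = c("LL", "M", "OP")),
  Treatment3 = factor(c("control", "3"), levels = c("control", "3")),
  Treatment5 = factor(c("ctrl", "5"), levels = c("ctrl", "5")),
  Replicate  = factor(c("1", "2"))
)

# Design A: LL treatment-3 effect plus M- and OP-specific differences from it.
mmA <- model.matrix(~ Genotype + Treatment3 + Genotype:Treatment3, data = samples)
# Design B: a separate treatment-3 effect for each genotype.
mmB <- model.matrix(~ Genotype + Genotype:Treatment3, data = samples)
colnames(mmA)
colnames(mmB)

# Simulated log-scale values for one hypothetical gene (no real data here).
set.seed(1)
y <- rnorm(nrow(samples), mean = 5)
fitA <- lm(y ~ Genotype + Treatment3 + Genotype:Treatment3, data = samples)
fitB <- lm(y ~ Genotype + Genotype:Treatment3, data = samples)

# Identical fitted values: the designs are reparameterizations of each other.
all.equal(fitted(fitA), fitted(fitB))

# "Treatment 3 in M": one coefficient in B, a sum of two in A.
effM_B <- coef(fitB)[["GenotypeM:Treatment33"]]
effM_A <- coef(fitA)[["Treatment33"]] + coef(fitA)[["GenotypeM:Treatment33"]]

# "Treatment 3 effect differs between M and OP": a difference in both designs.
diff_B <- coef(fitB)[["GenotypeM:Treatment33"]] - coef(fitB)[["GenotypeOP:Treatment33"]]
diff_A <- coef(fitA)[["GenotypeM:Treatment33"]] - coef(fitA)[["GenotypeOP:Treatment33"]]
c(effM_A = effM_A, effM_B = effM_B, diff_A = diff_A, diff_B = diff_B)
```

In DESeq2's resultsNames() the colon typically appears as a dot (check resultsNames(dds) for the exact names). With design B, treatment 3 in M would then be `results(dds, name = "GenotypeM.Treatment33")`, and the M vs OP difference `results(dds, contrast = list("GenotypeM.Treatment33", "GenotypeOP.Treatment33"))`. Note that the latter tests whether the effect differs between M and OP, which is not the same as "significant in M and not in OP".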


What genes are DE between LL and M?

What genes are DE between LL and (M and OP)?

What genes are DE between M and (LL and OP)?

What genes are DE between M and the main effect (LL,M,OP)?

What genes are DE between M and OP? (When I do this comparison should I drop LL samples and recalculate or not?)

What genes are DE between M(control,ctrl) and OP(control,ctrl)? (Should I use groups for this?)

What genes are DE between LL(control,5) and OP(control,5)?

What genes are DE between control and treatment 3, controlling for the effect of Genotype and Treatment 5?

What genes are DE between control and treatment 3, not controlling for the effect of Genotype and Treatment 5?

What genes are DE between control and treatment 3 in M?

What genes are DE between control and treatment 3 in M, but not in OP?

What genes are DE between control and treatment 3 in M, and in OP, but have the opposite effect (i.e. up in M, and down in OP)?

What genes are DE between control and treatment 5 in M, but not in LL or OP?

Are there genes showing a synergistic effect of combining treatment 3 and treatment 5?

Are there genes showing a synergistic effect of combining treatment 3 and treatment 5, which is different between LL and OP (seen in one but not the others)?
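For the two synergy questions, a base R sketch on simulated values for one hypothetical gene. In an interaction model, the Treatment3:Treatment5 coefficient is the effect of treatment 5 in the presence of treatment 3 minus its effect without treatment 3, i.e. a difference of differences. On DESeq2's log2 scale that means departure from additive log fold changes (multiplicative on the count scale). In the full three-way design, the Genotype:Treatment3:Treatment5 terms compare M's and OP's synergy with LL's.

```r
samples <- expand.grid(
  Genotype   = factor(c("LL", "M", "OP"), levels = c("LL", "M", "OP")),
  Treatment3 = factor(c("control", "3"), levels = c("control", "3")),
  Treatment5 = factor(c("ctrl", "5"), levels = c("ctrl", "5")),
  Replicate  = factor(c("1", "2"))
)

set.seed(2)
y <- rnorm(nrow(samples), mean = 5)   # hypothetical log-scale values, one gene

# Interaction model for the two treatments (genotype left out for brevity).
fit <- lm(y ~ Treatment3 * Treatment5, data = samples)

# 2 x 2 table of cell means: rows are Treatment3, columns are Treatment5.
cell <- tapply(y, list(samples$Treatment3, samples$Treatment5), mean)

# Synergy: effect of treatment 5 with treatment 3, minus its effect without.
synergy <- (cell["3", "5"] - cell["3", "ctrl"]) -
  (cell["control", "5"] - cell["control", "ctrl"])
c(interaction_coef = coef(fit)[["Treatment33:Treatment55"]], synergy = synergy)

# Genotype-specific synergy: three-way terms compare M and OP with LL.
mm3 <- model.matrix(~ Genotype * Treatment3 * Treatment5, data = samples)
grep("Treatment33:Treatment55", colnames(mm3), value = TRUE)
```

In the three-way design, the plain Treatment33:Treatment55 term refers to the reference genotype (LL), and GenotypeOP:Treatment33:Treatment55 is the OP synergy minus the LL synergy.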


Lastly, it is possible to do comparisons like:

results(dds, contrast=list("GenotypeM.Treatment3cntl","GenotypeO.Treatment55"))

...but I very much doubt that it gives a meaningful biological result. What comparisons should be avoided?

And what do you actually get from asking for results like this (weird):

results(dds, contrast=list(
        c("Treatment3cntl","GenotypeM.Treatment3cntl"),
        c("Treatment3cntl","GenotypeO.Treatment3cntl")))


or this (possibly relevant)?

results(dds, contrast=list(
        c("Treatment3cntl","GenotypeO.Treatment3cntl"),
        c("Treatment33","GenotypeO.Treatment33")))


I know these are a great many questions (feel free to answer just a subset), but I feel that many people work with this type of dataset (I've got 3 going at the moment). The current manual doesn't cover it very well, and the results man page (?results) covers it in mathematical terms, but not really in biological terms.

For example:

# the set Z effect compared to the average of set X and Y
# here we use 'listValues' to multiply the effect sizes for
# set X and set Y by -1/2
results(dds, contrast=list("setZ",c("setX","setY")), listValues=c(1,-1/2))

- Okay, so if I do that, what do I actually get as results? Genes which are differentially expressed between Z and the average of X and Y? Or "the set Z effect compared to the average of set X and Y": what does that mean?
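A base R sketch of what the listValues example computes, assuming a design in which setX, setY and setZ each have their own coefficient (here a cell-means fit with `~ 0 + set` on simulated log-scale values). The list contrast with listValues = c(1, -1/2) amounts to weights +1 on setZ and -1/2 on each of setX and setY, i.e. Z minus the average of X and Y. Because DESeq2's coefficients are log2 fold changes, that average is taken on the log2 scale, which corresponds to the geometric mean of X and Y on the count scale.

```r
set.seed(3)
set <- factor(rep(c("X", "Y", "Z"), each = 4))
true_means <- c(X = 4, Y = 6, Z = 8)   # hypothetical log2-scale means
y <- rnorm(length(set), mean = true_means[as.character(set)], sd = 0.2)

# Cell-means parameterization: one coefficient per set (setX, setY, setZ).
fit <- lm(y ~ 0 + set)

# contrast = list("setZ", c("setX", "setY")) with listValues = c(1, -1/2)
# corresponds to these weights on the coefficients:
w <- c(setX = -1/2, setY = -1/2, setZ = 1)
est <- sum(w * coef(fit)[names(w)])

# The same number computed directly from the group means.
m <- tapply(y, set, mean)
manual <- m[["Z"]] - (m[["X"]] + m[["Y"]]) / 2
c(contrast = est, manual = manual)
```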

?results is not necessarily the best place to elaborate further, but perhaps this is a good place?

Cheers!!!

@mikelove

hi,

Regarding:

"it would be really helpful to know, especially for biologists who may not be super-familiar with GLM ,and don't intuitively know what is meant by things like "main effect", "effect for Celltype" or "Intercept"?"
...
"the current manual doesn't cover it very well"

I've worked a lot on the vignette over the past year, and I feel that the section on interactions in version 1.10 is now sufficient for any user with a quantitative background to figure out how to approach and extract contrasts using DESeq2.

As a developer of statistical software, I do make time to answer software-related questions on the support site.

However, I can't actually write out all the possible ways to make comparisons for a complex dataset like yours, e.g. the 15 questions you have listed. I wouldn't have time to maintain DESeq2, to do any research myself, or to create new software.

To answer these questions, I recommend that you partner with someone who has a quantitative background and experience with linear modeling, and who would deserve authorship for the work of implementing your questions. Note that there is nothing special about the way DESeq2 implements the model for an experimental design like this, compared to the linear model functions in base R.
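Since the design formulas behave as they do in base R, lm() on simulated log-scale values is a cheap way to build intuition. This sketch contrasts "not controlling" (~ Treatment3) with "controlling" (~ Genotype + Treatment5 + Treatment3) for Genotype and Treatment5. In this balanced layout the treatment-3 estimate is identical either way; what changes is the residual variance and hence the standard error. DESeq2 fits a negative binomial GLM per gene, so this exact equality is not guaranteed there; treat it as an illustration only.

```r
samples <- expand.grid(
  Genotype   = factor(c("LL", "M", "OP"), levels = c("LL", "M", "OP")),
  Treatment3 = factor(c("control", "3"), levels = c("control", "3")),
  Treatment5 = factor(c("ctrl", "5"), levels = c("ctrl", "5")),
  Replicate  = factor(c("1", "2"))
)

# Hypothetical log-scale expression for one gene: genotype and treatment 5
# shift the level, treatment 3 adds 0.5, plus a little noise.
set.seed(4)
geno_shift <- c(LL = 0, M = 1, OP = 2)[as.character(samples$Genotype)]
y <- 5 + geno_shift + 0.5 * (samples$Treatment3 == "3") +
  1.5 * (samples$Treatment5 == "5") + rnorm(nrow(samples), sd = 0.3)

# Not controlling for Genotype and Treatment5 ...
fit_crude <- lm(y ~ Treatment3, data = samples)
# ... versus controlling for them by including them in the design.
fit_adj <- lm(y ~ Genotype + Treatment5 + Treatment3, data = samples)

# Estimate, standard error, t value and p-value for the treatment-3 term.
tab <- rbind(crude    = coef(summary(fit_crude))["Treatment33", ],
             adjusted = coef(summary(fit_adj))["Treatment33", ])
tab
```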
