Question

How to understand DESeq2 results comparisons

Kai_Qi ▴ 20

I am going through the DESeq2 manual, and there are some places I still don't understand:

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ cell + dex)

In the design, cell is the cell type and dex has two levels, untreat and treat. In the results extraction part, the example in the manual is:

results(dds, contrast=c("cell", "N061011", "N61311"))

My understanding: this gives the differential expression between the 2 cell types, N061011 and N61311, regardless of whether the samples are treated or untreated?

Puzzle 1: If I want to compare treat and untreat within a certain cell type, say N061011, what should I do?

With this, I was guided to the detailed help page for the results function, but it raised more questions:

What is the difference between design = ~ genotype + condition + genotype:condition and design = ~ genotype + condition?

Take 2 genotypes and 2 conditions, for example. In the manual it is:

dds <- makeExampleDESeqDataSet(n=100,m=12)
dds$genotype <- factor(rep(rep(c("I","II"),each=3),2))

design(dds) <- ~ genotype + condition + genotype:condition
dds <- DESeq(dds) 
resultsNames(dds)

# the condition effect for genotype I (the main effect)
results(dds, contrast=c("condition","B","A"))

# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

# the interaction term, answering: is the condition effect *different* across genotypes?
results(dds, name="genotypeII.conditionB")

After reading this, I still have some questions:

2. How can I know which is the main effect in my real experiments?

3. Why are the commands for the condition effect in genotype I and genotype II so different?

The last command, results(dds, name="genotypeII.conditionB"), only mentions genotypeII and conditionB, but in the comments it is explained as the condition effect "differences" across genotypes. I am a little confused. So my next question is:

4. How do I interpret the output of resultsNames(dds)?

Too many questions. Thanks for your time! Any advice on these questions or how I can go through it better will be greatly appreciated.

updated 3.7 years ago by swbarnes2 • written 3.7 years ago by Kai_Qi

Michael Love 41k

Unfortunately I don’t have sufficient time to explain linear models and experimental designs on the support site these days, but have to limit myself to specific questions about DESeq2 software.

If after reading the vignette section on interactions, you are still not sure of what the terms mean, I would strongly recommend collaborating with a statistician for your data analysis.

Posted 3.7 years ago by Michael Love

OK. Thank you for the advice.

Reply posted 3.7 years ago by Kai_Qi

score 4 · Accepted Answer · 2020-08-23

My understanding: this gives the differential expression between the 2 cell types, N061011 and N61311, regardless of whether the samples are treated or untreated?

Yes. But because you put dex in the design as well, the software will understand that some of the variance within each cell type is caused by the different dex conditions. So cell + dex is probably a better design than cell alone if you are asking how the cells differ globally.

Puzzle 1: If I want to compare treat and untreat within a certain cell type, say N061011, what should I do?

As it says in the vignette, you could use interactions in your design, but that is more difficult to understand. Instead, add a third column to colData that combines cell type and dex condition, use that column as your design, and make a contrast between two of its groups. You could also theoretically subset your data so that it only contains samples of the one cell type, but it's preferable to keep all the samples together to get better library normalization and dispersion estimates.
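The combined-column approach can be sketched on the simulated data from the manual example quoted in the question. This is only an illustration: the column name group and the result names res_I and res_II are chosen here, and the seed is arbitrary. For the cell and dex data, the same idea would combine those two columns instead of genotype and condition.

```r
library(DESeq2)

set.seed(1)  # arbitrary seed so the simulated counts are reproducible

# Simulated data as in the manual example: 100 genes, 12 samples,
# with a condition factor (A/B) already present in colData.
dds <- makeExampleDESeqDataSet(n = 100, m = 12)
dds$genotype <- factor(rep(rep(c("I", "II"), each = 3), 2))

# Combine the two factors into one grouping factor: IA, IB, IIA, IIB.
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)

# Condition effect (B vs A) within genotype I only.
res_I <- results(dds, contrast = c("group", "IB", "IA"))

# Condition effect (B vs A) within genotype II only.
res_II <- results(dds, contrast = c("group", "IIB", "IIA"))
```

With this design both within-genotype comparisons use the same kind of command, and all samples still contribute to normalization and dispersion estimation.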

What is the difference between design = ~ genotype + condition + genotype:condition and design = ~ genotype + condition?

A pretty big difference, though it depends on what command you use for results.

Let's say you were looking for genes where the change due to dex was different between the cell lines. You could work out fold changes for treated vs untreated in one cell line, fold changes for treated versus untreated in the second cell line, and subtract one fold change from the other (on the log2 scale) and look for the genes with the biggest differences. But you'd have no p-values, so it would be hard to assess which changes were significant. This is the case where you use interactions.
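The "subtract one fold change from the other" idea can be checked with plain R on made-up numbers. The sketch below uses hypothetical log2 expression values for a single gene and base R's model.matrix rather than DESeq2 itself, so it only shows what the coefficients of an interaction design mean, not how DESeq2 estimates them.

```r
# Hypothetical log2 expression of one gene, one value per genotype/condition cell.
genotype  <- factor(c("I", "I", "II", "II"))
condition <- factor(c("A", "B", "A", "B"))
log2_expr <- c(5, 6, 7, 10)

# Design matrix for ~ genotype + condition + genotype:condition.
X <- model.matrix(~ genotype + condition + genotype:condition)

# Four cells and four coefficients, so the linear system can be solved exactly.
beta <- setNames(as.numeric(solve(X, log2_expr)), colnames(X))

# Log2 fold changes computed by hand, per genotype.
lfc_I  <- log2_expr[2] - log2_expr[1]  # B vs A in genotype I
lfc_II <- log2_expr[4] - log2_expr[3]  # B vs A in genotype II

beta["conditionB"]             # same value as lfc_I
beta["genotypeII:conditionB"]  # same value as lfc_II - lfc_I
```

Base R names the interaction column genotypeII:conditionB, while resultsNames(dds) in the manual example shows it as genotypeII.conditionB. The interaction coefficient is the difference of the two fold changes, and the model supplies a p-value for it.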

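2. How can I know which is the main effect in my real experiments?

The "main effect" is the effect of one variable at the reference level of the other. In the manual example, the condition main effect is the B vs A effect in genotype I, because I is the reference level of genotype.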
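Which level counts as the reference is decided by the factor itself. A minimal base R sketch, using a hypothetical dex vector with the level names from the question:

```r
# Hypothetical sample annotation; factor levels default to alphabetical order.
dex <- factor(c("untreat", "treat", "untreat", "treat"))
levels(dex)[1]   # "treat" comes first, so it is the reference by default

# Make the untreated samples the reference level instead.
dex <- relevel(dex, ref = "untreat")
levels(dex)[1]   # now "untreat"
```

The first level is the reference, so checking levels() on each column of colData before running DESeq shows which group the main effects and the resultsNames(dds) entries are measured against.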

3. Why are the commands for the condition effect in genotype I and genotype II so different?

Because one is the reference level, and one is not. Make a new colData column as I said above, instead of doing it like this.

The last command, results(dds, name="genotypeII.conditionB"), only mentions genotypeII and conditionB, but in the comments it is explained as the condition effect "differences" across genotypes.

This is the scenario I described above, where you want to know which genes react to treatment differently between the cell lines (for example, in cell line 1 the fold change caused by treatment is big, and in cell line 2 the fold change is small).

I strongly recommend that you take your play data set, put the normalized counts into Excel, get the averages of the different groups, and look at the different ratios you would want. The ratios that you can calculate by hand should be quite close to what DESeq2 will give you when you are asking it for the right thing (except you probably can't replicate ~ cell + dex in Excel very well).
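The same by-hand check can be done in R instead of a spreadsheet. The matrix below is made up for illustration. With real data, the normalized counts would come from counts(dds, normalized = TRUE), and the hand-computed ratios should be close to, not identical to, the log2 fold changes DESeq2 reports.

```r
# Hypothetical normalized counts: 2 genes x 4 samples (made-up numbers).
norm_counts <- matrix(
  c(100, 120, 400, 480,
     50,  50,  50,  50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("s", 1:4))
)
group <- factor(c("untreat", "untreat", "treat", "treat"))

# Average each gene within each group, then take the log2 ratio.
group_means <- sapply(levels(group), function(g) {
  rowMeans(norm_counts[, group == g, drop = FALSE])
})
log2_ratio <- log2(group_means[, "treat"] / group_means[, "untreat"])
log2_ratio   # geneA: 2, geneB: 0
```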