Question

Question about statistical model and experiment design in DEseq2

0

Entering edit mode

hdf87ery • 0

@hdf87ery-8330

Last seen 4.5 years ago

Taiwan

Hi, I like to confirm if the statistical model of Deseq2 is appropriate for my experiment design. Here’s my code.

> dds<-DESeqDataSetFromMatrix(countData=countsTable,colData,formula(~ Transfection*Enrichment ))
> colData(dds)$Transfection<-factor(colData(dds)$Transfection,levels=c("gfp","kd"))
> colData(dds)$Enrichment<-factor(colData(dds)$Enrichment,levels=c("total","nascent"))
> dds$group<-factor(paste0(dds$Transfection, dds$Enrichment))
> design(dds)<-~group
> dds<-DESeq(dds,test = "Wald")
> resultsNames(dds)
[1] "Intercept"        "groupgfpnascent"  "groupgfptotal"    "groupkdnascent" "groupkdtotal"

I generated the following results.

> res_KDvsGFPinNascent<-results(dds, contrast=list(c("groupkdnascent"),c("groupgfpnascent")))
> res_KDvsGFPinTotal<-results(dds, contrast=list(c("groupkdtotal"),c("groupgfptotal")))

But I also want to calculate the fold change and the pvalue of (KDvsGFPinNascent)/(KDvsGFPinTotal). I use the following codes.

res_KDnGFPt_GFPnKDt<-results(dds, contrast=list(c("groupkdnascent","groupgfptotal"),c("groupgfpnascent","groupkdtotal")))

I have confirmed that the fold change of “res_KDnGFPt_GFPnKDt” is equal to (“res_KDvsGFPinNascent”)/ (“res_KDvsGFPinTotal”).

But I got confused, because I found that moslty p<0.05 genes of “res_KDnGFPt_GFPnKDt” were not overlapping with p<0.05 genes in "res_KDvsGFPinNascent" and "res_KDvsGFPinTotal".

I am not sure which one is correct. Please enlighten me how do I calculated the pvalue based on (KDvsGFPinNascent)/(KDvsGFPinTotal) in DEseq2 correctly.
Thank you very much.

deseq2 statistical learning • 1.8k views

ADD COMMENT • link 8.6 years ago • updated 8.5 years ago hdf87ery • 0

score 0 · Answer 1 · 2015-10-05

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

"But I got confused, because I found that moslty p<0.05 genes of “res_KDnGFPt_GFPnKDt” were not overlapping with p<0.05 genes in "res_KDvsGFPinNascent" and "res_KDvsGFPinTotal"."

This is expected. The test of (KDvsGFPinNascent)/(KDvsGFPinTotal) is a test if the fold change is different in Nascent and Total. You can have upregulation in Nascent and upregulation in Total (so DE in both with adjusted p-value < alpha), but if the upregulation is of the same size, then the test if the fold change is different will not be significant.

ADD COMMENT • link 8.6 years ago Michael Love 41k

0

Entering edit mode

And the opposite can be true as well: you can have weak, positive LFC in Nascent and weak, negative LFC in Total, where neither passes your FDR threshold, although the test of whether the LFC is different could pass your FDR threshold.

ADD REPLY • link 8.6 years ago Michael Love 41k

score 0 · Answer 2 · 2015-10-05

0

Entering edit mode

hdf87ery • 0

@hdf87ery-8330

Last seen 4.5 years ago

Taiwan

Hi Michael,

Thank you for your reply.

I still have some questions about statistical model.

The equation (KDvsGFPinNascent)/(KDvsGFPinTotal) is equal to ("groupkdnascent"/"groupgfpnascent")/ ("groupkdtotal"/"groupgfptotal") [E1] , but it can only be divided one time in DEseq2, so my work around is "res_KDnGFPt_GFPnKDt" which becomes ("groupkdnascent" * "groupgfptotal")/("groupgfpnascent" * "groupkdtotal") [E2] .

I know that E1 and E2 are equal for the fold change calculation. But for the statistical model, are they still equal?

Thank you.

ADD COMMENT • link 8.6 years ago hdf87ery • 0

0

Entering edit mode

It's good to know what's going on when you say contrast=list(c(A,B), c(C,D)).

This is simply taking the estimated coefficients from the fitted model (these are beta in our paper), and taking a linear combination of them:

(A+B) - (C+D).

The Wald statistic is just this linear combination of estimated coefficients divided by its standard error (see paper).

A second note: the ~group design, which I've mentioned in the vignette and on the forum, is for simplifying contrasts for users who do not want to test the difference of differences. In your case, if I were you, I would just use the interaction formula, because then the interaction term gives you the test you are after, without having to worry about this contrast=list(...) formulation. If you include the interaction term, you should set betaPrior=FALSE (this will happen automatically in v1.10 released this week).

ADD REPLY • link 8.6 years ago Michael Love 41k

score 0 · Answer 3 · 2015-10-19

0

Entering edit mode

hdf87ery • 0

@hdf87ery-8330

Last seen 4.5 years ago

Taiwan

Hi,

I like to test genes that upregulated in both KDvsGFPinTotal and KDvsGFPinNascent. In other words, I want to test genes that have the same size of difference. According to this thread C: Confused in the usage of formula in DESeq or DESeq2,

"(1) interaction term, (2) treatment effect in W, (3) treatment effect in M

if a gene is DE in W and M, and the same size of difference: (2) and (3)"

My KDvsGFPinTotal would be

KDvsGFPinTotal_MLE<- results(dds, contrast=c("Transfection","kd","gfp"))

and the KDvsGFPinNascent is

KDvsGFPinNascent_MLE<- results(dds, contrast=list(c("Transfection_kd_vs_gfp","Transfectionkd.Enrichmentnascent")))

How do combine the KDvsGFPinTotal and KDvsGFPinNascent for testing the same size of difference?

Thank you.

ADD COMMENT • link 8.5 years ago hdf87ery • 0

0

Entering edit mode

"How do combine the KDvsGFPinTotal and KDvsGFPinNascent for testing the same size of difference?"

You can do this simply with:

results(dds, name="Transfectionkd.Enrichmentnascent")

The interaction term, "Transfectionkd.Enrichmentnascent", is the additional log fold change in Nascent. If this equals 0, then the effect is the same in Total and Nascent. You can test this term using the 'name' argument. I'd recommend you install DESeq2 v1.10 (the latest version), and check the new section in the vignette on interactions terms which includes a diagram. This should help with the understanding.

ADD REPLY • link 8.5 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

Thank you for your reply.

According to this thread C: Confused in the usage of formula in DESeq or DESeq2
There are
"if a gene is DE in W and M, but a different size of difference: (1), (2), (3)"
"if a gene is DE in W and M, and the same size of difference: (2) and (3)"

So, no matter I want to test different size of difference or the same size of the difference. I just use the same result table of Transfectionkd.Enrichmentnascent, except the range of LFC. Is that right?
However, I found that most of genes with the same size of difference are less significant in my Transfectionkd.Enrichmentnascent result. As you mentioned, it's testing the significant of difference. So I think the less significant would be expected. But how can I make sure that genes with the same size of the difference are significantly to be the same size of the difference. Thank you very much.

ADD REPLY • link 8.5 years ago hdf87ery • 0

0

Entering edit mode

It's best if you ignore that thread, because there are a lot of confusing questions there, which I tried to answer nonetheless, but which might make things even more confused here.

"So, no matter I want to test different size of difference or the same size of the difference. I just use the same result table of Transfectionkd.Enrichmentnascent, except the range of LFC. Is that right?"

The results table tests the null hypothesis that the effect is the same in Total and Nascent. If the effect is very different, you will get a small p-value.

"However, I found that most of genes with the same size of difference are less significant in my Transfectionkd.Enrichmentnascent result. As you mentioned, it's testing the significant of difference. So I think the less significant would be expected. But how can I make sure that genes with the same size of the difference are significantly to be the same size of the difference. "

If you want to test the null hypothesis that the difference across Total and Nascent is large (so the alternative hypothesis is that the difference is small), you need to use the lfcThreshold argument and altHypothesis="lessAbs".

results(dds, name="Transfectionkd.Enrichmentnascent", lfcThreshold=x, altHypothesis="lessAbs")

You need to define an 'x' value that makes sense for your experiment. You can look at an MA-plot for the 'res' object (see vignette).

Regarding the use of lfcThreshold and altHypothesis, you can search the vignette for the description of these arguments.

You'll probably also benefit from reading the section of the DESeq2 paper which covers lfc thresholds:

http://www.genomebiology.com/2014/15/12/550

ADD REPLY • link 8.5 years ago Michael Love 41k