Prefering Grouping variables or Interaction terms in Experimental Design
1
0
Entering edit mode
katilase • 0
@katilase-13785
Last seen 5.2 years ago

Hi,

I am using DEseq2 to analyse RNAsequencing data in a timecourse setup. I am however unsure about setting up the correct experimental design. In particular, I struggle to decide when to combine factors and use the group to extract results and when to use interaction terms. I have tried to find answers by reading extensively on experimental design, however it only made me less confident that I have the correct experimental design for my analyses. I would be grateful for advice on how to properly setup the experimental design in the following case:

I am interested in differences of a particular gene knockout in comparison to wild type controls at different timepoints after injury. For this I have generated RNA sequencing data in biological triplicates. At the moment, I am combining factors running design = ~ condition such as I would have for the gene knockout: KO_timepoint1, KO_timepoint_2, KO_timepoint_3 and equivalently for the wild type control: wt_timepoint1, wt_timepoint_2, wt_timepoint_3

With this I think I can get all kinds of needed results by using contrasts e.g.

res=results(dds,contrast = c("condition","KO_timepoint1","wt_timepoint1")))

or

res=results(dds,contrast = c("condition","KO_timepoint2","KO_timepoint1")))

and so on. However, I have read in posts and vignette that an experimental design such as ~ genotype+time+genotype:time is used. Sometimes also using LRT. I am now unsure if I need to change my experimental design.

I would be grateful for advice on which strategy is most suitable in my case and how the two different designs would influence results. Thanks a lot in advance!

Best, Katja

 

deseq2 • 1.5k views
ADD COMMENT
0
Entering edit mode
@ryan-c-thompson-5618
Last seen 9 weeks ago
Icahn School of Medicine at Mount Sinai…

The one thing I've learned that's not obvious when you're first learning about design matrices is that two different design matrices can represent the same design. When using the likelihood ratio test and similar tests, combining genotype and time into a single "group" variable is equivalent to using ~genotype + time + genotype:time.  Both designs will give you the exact same (un-shrunken) log-fold changes and exact same p-values for the same tests, so it doesn't matter which one you use as long as you form the correct contrasts in each case. We call these different ways of parametrizing the same design. You can also think of them as the same solution space but with 2 different coordinate systems.

However, when log-fold change shrinkage comes in to the picture, these designs may no longer be equivalent, because the direction of the shrinkage can depend on the choice of parametrization. I don't have much experience with the implementation of log-fold change shrinkage in DESeq2, so you should wait for Mike Love's comments on that subject.

ADD COMMENT
0
Entering edit mode

Now that DESeq() doesn't do shrinkage, it will be equivalent with DESeq() and results().

It's true that shrinkage involves an interdependency between coefficients (as you can see in the LASSO trace plots). The way we are moving with lfcShrink is to focus on individual coefficients at a time with new shrinkage estimators, using the 'coef' argument. We will continue to support the old-DESeq2-style of shrinkage with the 'contrast' argument.

ADD REPLY
0
Entering edit mode

Thanks for the very fast reply and clarifying that both ways are producing the same design. To make sure I got it right, you also say that in this case choosing Wald's or LRT gives the same results?

Regarding the shrinkage, I read in a recent post that it is recommended to use lfcShrink for bulk RNA sequencing data. I am currently using it for my data as well again in combination with contrasts.

LFCres=lfcShrink(dds,contrast = c("condition","KO_timepoint2","KO_timepoint1"), res=res)

Would you not advise to do that? and would I then have to switch to the interaction design since designs are not equivalent anymore?

ADD REPLY
0
Entering edit mode

At the moment you can only use the group design and 'contrast' with lfcShrink, because I haven't implemented it for interaction terms, but in the next release you can use 'coef'. We will have recommendations for which to use in the next release notes.

ADD REPLY
0
Entering edit mode

Sorry but I still have two follow up questions:

1.) I am still a bit confused if I should now apply shrinking of foldchanges in my example or not? If designs are not equivalent anymore when foldchanges are shrunk (e.g. by using betaPrior=TRUE or lfcShrink), do I then need to change my design or should I simply not shrink foldchanges?

2.) In my previous analyses using the "old" DEseq2 versions, I kept the defaults (betaPrior=TRUE and test="Wald") using the same design as described above. Would I then here need to use the LRT- test instead or turn off shrinking of foldchanges?

 

 

ADD REPLY
0
Entering edit mode

Hmm, maybe this is getting too complicated. You can do either design and use DESeq() and results(). This will give you maximum likelihood estimates of LFC and adjusted p-values.

Then, if you use the ~group design, you can additionally shrink the LFC for visualization. This doesn't change the p-values or adjusted p-values. This last step is up to you. Some users find the moderated LFCs useful for visualization.

(2) There is no "need" to do anything in particular. My general advice is to use the latest version of DESeq2 with the defaults. The defaults for a software are basically the suggestions of the authors put into code. This would mean, version 1.16 of DESeq2 (current release as of August 2017) with DESeq() and results(). You can choose to do whatever you like really.

ADD REPLY

Login before adding your answer.

Traffic: 884 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6