Question: DESeq2: Is it necessary to include all terms and interactions in LRT tests?
9 months ago by chimeric:

Hi there, I am wondering whether I am setting up and interpreting the LRT correctly. Is it necessary to include all terms and interactions, or can I include only the terms I'm interested in?

For example, I have a time course experiment where I have 2 tissue types and 2 treatments sampled over time.

~ time + tissue_type + treatment

I would like to find genes that are differentially expressed in response to the treatment, regardless of tissue type or time point.

To find these, can I simply reduce to the following:

~ time + tissue_type

On the other hand, if I wanted to identify the genes that respond differently to the treatment depending on tissue type, can I just add the interaction term of interest to the full model and then drop it in the reduced model, e.g.

Full: ~ time + tissue_type + treatment + tissue_type:treatment
Reduced: ~ time + tissue_type + treatment

Thank you for your advice!

9 months ago by Gavin Kelly (United Kingdom / London / Francis Crick Institute):

It looks to me like you've got the correct model specification. Your reduced model should include all the nuisance terms that you want to adjust for; the full model should include those, plus the (usually one) term, either a main effect or an interaction, that you hypothesize might be having an effect. So the two sets of full and reduced models you give correctly answer the two questions you pose.
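As a rough illustration of the two comparisons, here is a minimal sketch on simulated counts. The sample sheet, the level names ("leaf", "root", "control", "treated", "t1", "t2") and the object names are assumptions made up for this example; with real data you would start from your own DESeqDataSet.

```r
library(DESeq2)

set.seed(1)

# Hypothetical sample sheet: 2 time points x 2 tissues x 2 treatments,
# with 2 replicates each (16 samples). Level names are invented.
samples <- expand.grid(
  time        = factor(c("t1", "t2")),
  tissue_type = factor(c("leaf", "root")),
  treatment   = factor(c("control", "treated")),
  replicate   = 1:2
)

# Simulated counts with no true effects, only to make the sketch runnable.
dds <- makeExampleDESeqDataSet(n = 200, m = nrow(samples))
dds$time        <- samples$time
dds$tissue_type <- samples$tissue_type
dds$treatment   <- samples$treatment

# Question 1: treatment effect, adjusting for time and tissue type.
design(dds) <- ~ time + tissue_type + treatment
dds_trt <- DESeq(dds, test = "LRT", reduced = ~ time + tissue_type)
res_trt <- results(dds_trt)

# Question 2: does the treatment effect differ between tissue types?
dds_int <- dds
design(dds_int) <- ~ time + tissue_type + treatment + tissue_type:treatment
dds_int <- DESeq(dds_int, test = "LRT",
                 reduced = ~ time + tissue_type + treatment)
res_int <- results(dds_int)
```

In both cases the p-values in the results table come from comparing the full design against the reduced one, so each table answers only the question encoded by the term that was dropped.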

As both tissue and treatment are limited to two levels here, you should get essentially equivalent results if you instead fit only the full model and do a Wald test on its final coefficient, as that coefficient gives the effect size of the treatment, or the difference in treatment effect between tissues, respectively. This wouldn't hold if the factor had more than two levels, because the LRT would then be testing several coefficients at once. To clarify, you'd just supply one model (the one you identify as 'full') for each question, and have no need of the 'reduced' model. Which approach you use is personal preference; some people find one easier to understand than the other.
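One way to see why the number of levels matters is to compare the columns of the full and reduced design matrices. The sketch below uses only base R; the sample sheets and level names are hypothetical.

```r
# Hypothetical two-level design.
samples <- expand.grid(
  time        = factor(c("t1", "t2")),
  tissue_type = factor(c("leaf", "root")),
  treatment   = factor(c("control", "treated"))
)

full    <- model.matrix(~ time + tissue_type + treatment, samples)
reduced <- model.matrix(~ time + tissue_type, samples)

# Columns present in the full design but not in the reduced one.
extra_two_level <- setdiff(colnames(full), colnames(reduced))
extra_two_level
# "treatmenttreated"

# Same design, but with a hypothetical three-level treatment.
samples3 <- expand.grid(
  time        = factor(c("t1", "t2")),
  tissue_type = factor(c("leaf", "root")),
  treatment   = factor(c("control", "low", "high"),
                       levels = c("control", "low", "high"))
)

full3    <- model.matrix(~ time + tissue_type + treatment, samples3)
reduced3 <- model.matrix(~ time + tissue_type, samples3)

extra_three_level <- setdiff(colnames(full3), colnames(reduced3))
extra_three_level
# "treatmentlow" "treatmenthigh"
```

With two levels the models differ by a single coefficient, so a Wald test on that coefficient addresses the same question as the LRT. In DESeq2 the Wald route would be the default `DESeq(dds)` followed by `results(dds, name = ...)`, with `resultsNames(dds)` listing the available coefficient names. With three levels the LRT tests both extra coefficients jointly, which no single Wald test reproduces.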

The mandatory advice whenever I see 'time' used as a term in a model: check whether 'time' is a factor or a number in your data, as the interpretation of the results differs between the two (removing individual timepoint-specific effects, or removing a linear trend of log-expression on time, respectively).
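The difference shows up directly in the design matrix. Here is a small base R sketch, assuming four hypothetical time points recorded in hours:

```r
# Hypothetical sampling times in hours.
time_points <- c(0, 2, 6, 24)

# Numeric time: one slope coefficient, i.e. a linear trend over time.
numeric_design <- model.matrix(~ time, data.frame(time = time_points))
colnames(numeric_design)
# "(Intercept)" "time"

# Factor time: one coefficient per non-reference time point.
factor_design <- model.matrix(~ time, data.frame(time = factor(time_points)))
colnames(factor_design)
# "(Intercept)" "time2" "time6" "time24"
```

If the column turns out to be numeric and timepoint-specific effects are what you want, converting it with `factor()` before fitting the model should give the second form.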

