Question

Two factor design including interaction effect with DESeq2 in R

Sam • 0

Hello!

I am working on a project where I have a fully crossed two-factor design, with two levels per factor and four biological replicates per treatment combination (16 biological replicates total across 4 treatments). The two factors are rearing temperature of the samples (30C or 16C exposure for most of development) and testing temperature of the samples (30C or 16C for just the 6 hours leading up to RNA sampling).

I am interested in understanding which genes are differentially expressed as a result of differences in rearing temperature, which genes are differentially expressed as a result of testing temperature, and which genes have expression patterns that depend on the interaction of those two factors (i.e., for which genes does the effect of testing temperature depend on prior rearing temperature or vice versa). I have never run a two-factor differential expression analysis, so I have a few questions about how to set up this design and how to interpret the results tables.

1) When setting this design up in the creation of the DESeq object, does the order of the two treatments matter? I care equally about each of these two factors, so I don't want to bias my results in a way that favors one of them. For example, right now my code is:

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ RearingTemp + TestingTemp + RearingTemp:TestingTemp)

Then I run this dds object in the DESeq function (data <- DESeq(dds)). Then running resultsNames(data) gets me the names of "Intercept", "RearingTemp_30_vs_16", "TestingTemp_30_vs_16", and "RearingTemp30.TestingTemp30"

2) To pull out the relevant data for the effect of each of my two factors (rearing temperature and testing temperature), do I then just run the results() function with each of those names? So, specifically, res_rearingtemp <- results(data,name="RearingTemp_30_vs_16") and res_testingtemp <- results(data,name="TestingTemp_30_vs_16"). So, is the res_rearingtemp table just showing how each gene is affected just by rearing temperature, then? Does this effect account for differences in testing temperature in any way?

3) For the interaction term, I pulled out the results table using the code res_interaction <- results(data,name="RearingTemp30.TestingTemp30"). Is this the correct way to determine for which genes the effect of rearing temperature depends on the testing temperature and vice versa? How do I interpret positive vs negative log2FC values for this interaction results table? What does it mean if a gene shows up as DE in both the interaction results table and either the rearing temperature or testing temperature results tables?

I know this question has probably been asked before (see post from ~6 years ago at DESeq2 interaction term in two-factor design, which contrasts?), but I'm having trouble applying the responses for similar posts to my specific experimental design and motivating questions.

Thank you very much for any advice you have.

DESeq2 RNASeqData deseq2 ExperimentalDesign • 778 views

Written 6 months ago by Sam • updated 6 months ago by swbarnes2

score 1 · Answer 1 · 2023-10-23

swbarnes2 ★ 1.3k

So, is the res_rearingtemp table just showing how each gene is affected just by rearing temperature, then? Does this effect account for differences in testing temperature in any way?

From the vignette:

The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, by adding genotype:condition, the main condition effect only represents the effect of condition for the reference level of genotype

So results(data, name="RearingTemp_30_vs_16") will return the differences caused by rearing temp at the reference level of testing temp only (16, going by your resultsNames output).

That is useful, but not necessarily what you want.
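To make the "reference level only" point concrete, here is a minimal sketch of pulling the rearing-temp effect at each testing temp from the interaction design. It uses simulated counts from makeExampleDESeqDataSet() as a stand-in for the real HTSeq data (an assumption, so no real temperature effects are present), and the object name dds and the sample ordering are illustrative only. The list-style contrast that adds the main effect and the interaction term follows the pattern shown in the vignette's interactions section.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real data (assumption): 1000 genes, 16 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Fully crossed 2 x 2 layout, 4 replicates per cell, with 16 as the reference level
dds$RearingTemp <- factor(rep(c("16", "30"), each = 8), levels = c("16", "30"))
dds$TestingTemp <- factor(rep(rep(c("16", "30"), each = 4), times = 2),
                          levels = c("16", "30"))

design(dds) <- ~ RearingTemp + TestingTemp + RearingTemp:TestingTemp
dds <- DESeq(dds)
resultsNames(dds)

# Rearing effect (30 vs 16) for samples TESTED at 16, the reference level
res_rear_at_test16 <- results(dds, name = "RearingTemp_30_vs_16")

# Rearing effect (30 vs 16) for samples TESTED at 30:
# main effect plus the interaction term
res_rear_at_test30 <- results(
  dds,
  contrast = list(c("RearingTemp_30_vs_16", "RearingTemp30.TestingTemp30"))
)
```

The same pattern with "TestingTemp_30_vs_16" in place of "RearingTemp_30_vs_16" should give the testing-temp effect at each rearing temp.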

How do I interpret positive vs negative log2FC values for this interaction results table?

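When using the interaction name from resultsNames(), think of the log2FoldChange column as really being the ratio of ratios, or the difference in log fold changes. For example, say that at rearing temp 30, changing the testing temp causes a two-fold change in a gene, but at rearing temp 16, changing the testing temp causes a four-fold change. The interaction log2FC for that gene is then log2(2) - log2(4) = -1. A positive value means the testing-temp effect (30 vs 16) is more positive at rearing temp 30 than at rearing temp 16; a negative value means it is more negative.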
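The arithmetic behind that example can be checked in a few lines of base R. This is only a worked illustration of the two-fold versus four-fold case; the variable names are made up, and it assumes 16 is the reference rearing level, as in the resultsNames output from the question.

```r
# Fold change caused by testing temp (30 vs 16) within each rearing temp
fc_testing_at_rearing30 <- 2
fc_testing_at_rearing16 <- 4   # rearing 16 is the reference level

# Ratio of ratios on the linear scale
ratio_of_ratios <- fc_testing_at_rearing30 / fc_testing_at_rearing16

# Difference of log2 fold changes, which is what the interaction term reports
interaction_lfc <- log2(fc_testing_at_rearing30) - log2(fc_testing_at_rearing16)

# The two views agree: log2 of the ratio of ratios is the interaction log2FC
print(ratio_of_ratios)
print(interaction_lfc)
```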

Thank you very much for your response. So in a design of ~ RearingTemp + TestingTemp + RearingTemp:TestingTemp, is there a way to pull out a results table that gives information about how gene expression changes in response to rearing temperature (while controlling for the effects of testing temperature) and in response to testing temperature (while controlling for the effects of rearing temperature)? Or, if I am interested in getting a list of genes that are affected by rearing temperature and a list that are affected by testing temperature, do I need to rerun the design to be ~ RearingTemp + TestingTemp (i.e., remove the interaction)?

Reply by Sam • 6 months ago

Read the vignette. Read the part I quoted.

You can use just ~rearing to compare all the rearing temp 30's to all the rearing temp 16's. There might be some big variability caused by testing temp, but with this design the algorithm has nothing to attribute it to, so it will just see a lot of unexplained variability.

With ~rearing + testing you can still compare all the rearing temp 30's to all the rearing temp 16's, but the software will understand that some of the variability comes from the grouping by testing temp, and it will attempt to model that. This same method is used for correcting for batch. You just add + batch to your grouping of interest, and the software will try to model away differences caused by batch. So you should get stronger results, because it's correcting for the different testing temps. This is probably better than ~rearing alone if your question of the moment is just about changes caused by rearing temp.
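A rough sketch of the two designs side by side, again on simulated counts from makeExampleDESeqDataSet() rather than the real data (an assumption; the simulated genes carry no true temperature effects, so this only shows the mechanics). The object names are illustrative. In the additive design the order of the terms should not change the estimates for a named coefficient; it only decides which variable results() reports when no name or contrast is given, which is the last one in the formula.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count data (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)
dds$RearingTemp <- factor(rep(c("16", "30"), each = 8), levels = c("16", "30"))
dds$TestingTemp <- factor(rep(rep(c("16", "30"), each = 4), times = 2),
                          levels = c("16", "30"))

# Design 1: rearing temp only; testing-temp differences end up as noise
dds_rear_only <- dds
design(dds_rear_only) <- ~ RearingTemp
dds_rear_only <- DESeq(dds_rear_only)

# Design 2: additive model; testing temp is modelled like a batch term
dds_additive <- dds
design(dds_additive) <- ~ TestingTemp + RearingTemp
dds_additive <- DESeq(dds_additive)

# Same coefficient name in both fits, so the two tables are directly comparable
res_rear_only <- results(dds_rear_only, name = "RearingTemp_30_vs_16")
res_additive  <- results(dds_additive,  name = "RearingTemp_30_vs_16")

# Testing-temp effect, controlling for rearing temp, from the same additive fit
res_testing <- results(dds_additive, name = "TestingTemp_30_vs_16")
```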

You could also do like the vignette says, and make a new column of rearing_testing, and use that to compare just the two testing temps that were reared at 30, for example, but you might lose power by looking at only half your samples at a time.
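A minimal sketch of that combined-column approach, once more on simulated counts (an assumption). The column name group and the level labels such as R30_T16 are made up for illustration; the letter prefixes just keep the labels easy to read.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count data (assumption)
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)
dds$RearingTemp <- factor(rep(c("16", "30"), each = 8), levels = c("16", "30"))
dds$TestingTemp <- factor(rep(rep(c("16", "30"), each = 4), times = 2),
                          levels = c("16", "30"))

# One factor with a level per treatment combination, e.g. "R30_T16"
dds$group <- factor(paste0("R", dds$RearingTemp, "_T", dds$TestingTemp))
design(dds) <- ~ group
dds <- DESeq(dds)

# Testing-temp effect (30 vs 16) among samples reared at 30
res_test_in_rear30 <- results(dds, contrast = c("group", "R30_T30", "R30_T16"))

# Testing-temp effect (30 vs 16) among samples reared at 16
res_test_in_rear16 <- results(dds, contrast = c("group", "R16_T30", "R16_T16"))
```

Any pair of the four group levels can be compared the same way by changing the last two entries of the contrast.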

The best use for interactions, IMO, is as I described above, to get the differences of log fold changes between comparisons.

Reply by swbarnes2 • 6 months ago