Question

Differential expression analysis with interaction terms

Assa Yeroslaviz ★ 1.5k

Germany

Hi, I'm working with a data set with three time points (1d, 5d, 10d) and four treatments plus a control (ctrl, treat1, treat2, treat3, treat4). For each of the 15 combinations I have triplicates (45 samples in total).
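A minimal base-R sketch of the sample information table described above, assuming the hypothetical level names ctrl, treat1 to treat4 and 1d, 5d, 10d (the control corresponds to the group called WT in the contrasts further down). Setting the day levels explicitly matters: left to alphabetical sorting, factor() would usually put 10d before 1d.

```r
# Hypothetical sample sheet: 5 treatment levels x 3 days x 3 replicates = 45 samples
treat_levels <- c("ctrl", "treat1", "treat2", "treat3", "treat4")
day_levels   <- c("1d", "5d", "10d")  # explicit order; alphabetical would give "10d" first

sampleInfo <- expand.grid(
  rep   = 1:3,
  day   = factor(day_levels, levels = day_levels),
  Treat = factor(treat_levels, levels = treat_levels)
)

# Combined treatment/day label per sample, e.g. "treat1.5d" (one level per cell)
sampleInfo$group <- factor(
  paste(sampleInfo$Treat, sampleInfo$day, sep = "."),
  levels = as.vector(outer(treat_levels, day_levels, paste, sep = "."))
)

nrow(sampleInfo)         # 45 samples
table(sampleInfo$group)  # 15 groups with 3 samples each
```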

If I understand correctly, both edgeR and DESeq2 use interaction terms to combine multiple factors (the commands differ, but the interactions are similar). In this case the full model would be ~Treat + day + Treat:day and the reduced model ~Treat + day.
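A base-R sketch of the two designs, assuming the same hypothetical level names. The full model has 15 columns and the reduced one 7, so a likelihood-ratio (or quasi-likelihood F) test between them has 8 degrees of freedom, one per dropped interaction column. In edgeR the same comparison would be made by fitting the full design and testing those 8 columns, e.g. glmQLFTest(fit, coef = 8:15); in DESeq2 by DESeq(dds, test = "LRT", reduced = ~ Treat + day).

```r
Treat <- factor(rep(c("ctrl", "treat1", "treat2", "treat3", "treat4"), each = 9),
                levels = c("ctrl", "treat1", "treat2", "treat3", "treat4"))
day   <- factor(rep(rep(c("1d", "5d", "10d"), each = 3), times = 5),
                levels = c("1d", "5d", "10d"))

full    <- model.matrix(~ Treat + day + Treat:day)  # equivalent to ~ Treat * day
reduced <- model.matrix(~ Treat + day)

ncol(full)     # 15: intercept + 4 Treat + 2 day + 8 interaction columns
ncol(reduced)  # 7
dropped <- setdiff(colnames(full), colnames(reduced))
dropped        # the 8 Treat:day interaction columns, i.e. full-design columns 8 to 15
```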

To take the example from the contrast matrix in the edgeR manual: what is the difference between these two contrasts?

DrugvsPlacebo.0h = Drug.0h-Placebo.0h,
DrugvsPlacebo.1h = (Drug.1h-Drug.0h)-(Placebo.1h-Placebo.0h),
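To make the difference concrete, here is a sketch with made-up log2 group means for a single gene (the numbers are purely illustrative). The contrast weights are what limma's makeContrasts() would produce for a cell-means (~ 0 + group) design. The first contrast is the drug effect at 0h. The second is an interaction: it asks whether the change from 0h to 1h differs between Drug and Placebo, which is not the same as Drug vs Placebo at 1h.

```r
# Hypothetical log2 mean expression of one gene in each group (illustrative only)
mu <- c(Placebo.0h = 5, Drug.0h = 7, Placebo.1h = 6, Drug.1h = 8)

# Contrast weights, in the same order as mu
DrugvsPlacebo.0h  <- c(Placebo.0h = -1, Drug.0h =  1, Placebo.1h =  0, Drug.1h = 1 * 0)
DrugvsPlacebo.1h  <- c(Placebo.0h =  1, Drug.0h = -1, Placebo.1h = -1, Drug.1h = 1)  # interaction
Drug1hvsPlacebo1h <- c(Placebo.0h =  0, Drug.0h =  0, Placebo.1h = -1, Drug.1h = 1)  # simple 1h difference

sum(DrugvsPlacebo.0h * mu)   # 2: Drug is 2 log2 units above Placebo at 0h
sum(DrugvsPlacebo.1h * mu)   # 0: both groups rise by 1 from 0h to 1h, so no interaction
sum(Drug1hvsPlacebo1h * mu)  # 2: yet Drug is still 2 units above Placebo at 1h
```

So a gene can differ strongly between Drug and Placebo at 1h and still show no signal for the second contrast, because that contrast only picks up a drug-specific change over time.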

If I want to test for changes between each treatment and the control at each time point (TP), should I use the first type of contrast and do the following (after combining the treatment and day columns of the sample information table)?

treat1vsWT.1d = treat1.1d - WT.1d
...
treat2vsWT.1d = treat2.1d - WT.1d
...
treat3vsWT.10d = treat3.10d - WT.10d
...

This would give me 12 different pairwise comparisons (4 treatments × 3 time points).
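A base-R sketch of the 12-column contrast matrix for a cell-means design (~ 0 + group), assuming the hypothetical level names ctrl, treat1 to treat4 and 1d, 5d, 10d (so WT in the contrasts above corresponds to ctrl here). Each column could then be tested on its own, e.g. glmQLFTest(fit, contrast = contr[, j]) in edgeR; writing the same comparisons out by hand in makeContrasts() should give an equivalent matrix.

```r
treat_levels <- c("ctrl", "treat1", "treat2", "treat3", "treat4")
day_levels   <- c("1d", "5d", "10d")
group <- factor(paste(rep(treat_levels, each = 9), rep(rep(day_levels, each = 3), 5), sep = "."),
                levels = as.vector(outer(treat_levels, day_levels, paste, sep = ".")))

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)  # drop the "group" prefix so names match the contrasts

# One column per treatment-vs-control comparison on each day: 4 x 3 = 12
pairs <- expand.grid(treat = treat_levels[-1], day = day_levels, stringsAsFactors = FALSE)
contr <- matrix(0, nrow = ncol(design), ncol = nrow(pairs),
                dimnames = list(colnames(design), paste0(pairs$treat, "vsCtrl.", pairs$day)))
for (j in seq_len(nrow(pairs))) {
  contr[paste(pairs$treat[j], pairs$day[j], sep = "."), j] <- 1   # treatment on that day
  contr[paste("ctrl", pairs$day[j], sep = "."), j]        <- -1  # control on the same day
}
dim(contr)  # 15 rows (groups) x 12 columns (comparisons)
```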

But what is different about the second contrast in the example above?

Another question is what would happen if I used the full and reduced models given above to get this:

design <- model.matrix(~Treat + Treat:day, data=sampleInfo)
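A sketch for checking which coefficient is which in this nested design, assuming the same hypothetical levels (ctrl as the reference treatment, 1d as the reference day). With five treatments and three days it has 15 columns. The Treat:day columns are treatment-specific day 5 and day 10 effects relative to day 1, and for treat1 they come out as columns 7 and 12. Printing colnames(design) before choosing coef= values avoids relying on remembered positions.

```r
Treat <- factor(rep(c("ctrl", "treat1", "treat2", "treat3", "treat4"), each = 9),
                levels = c("ctrl", "treat1", "treat2", "treat3", "treat4"))
day   <- factor(rep(rep(c("1d", "5d", "10d"), each = 3), times = 5),
                levels = c("1d", "5d", "10d"))
design <- model.matrix(~ Treat + Treat:day)

ncol(design)  # 15
# Index next to each column name, to pick coef= values by name rather than by guess
data.frame(index = seq_len(ncol(design)), coef = colnames(design))

# Columns holding treat1's own day effects (day 5 and day 10 vs day 1 within treat1)
treat1_time <- grep("^Treattreat1:", colnames(design))
treat1_time   # 7 12
```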

Now I will have many coefficients. If I'm looking for genes that change over all time points, I would combine the relevant coefficients into one vector. Let's say I want to find all genes that changed significantly between the control and treat1 on all days. Would this be the correct syntax?

qlf <- glmQLFTest(fit, coef=c(7,12,17))

Would this give me the genes that changed over all time points? Does it mean these genes are significantly changed at every time point independently?
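A sketch of how "treat1 vs ctrl on each day" maps onto the ~ Treat + Treat:day coefficients, assuming the same hypothetical levels. In this parameterisation the treat1:day columns measure treat1's change over time relative to day 1, not its difference from ctrl. Passing several coefficients or contrast columns at once to glmQLFTest() gives a single F-test that they are all zero, so a significant gene differs on at least one day, not necessarily on all of them. One option for the "all days" question is to test each column separately and intersect the gene lists.

```r
Treat <- factor(rep(c("ctrl", "treat1", "treat2", "treat3", "treat4"), each = 9),
                levels = c("ctrl", "treat1", "treat2", "treat3", "treat4"))
day   <- factor(rep(rep(c("1d", "5d", "10d"), each = 3), times = 5),
                levels = c("1d", "5d", "10d"))
design <- model.matrix(~ Treat + Treat:day)
cn <- colnames(design)

# treat1 - ctrl on each day, written in terms of this design's coefficients:
#   1d : Treattreat1
#   5d : Treattreat1 + Treattreat1:day5d  - Treatctrl:day5d
#   10d: Treattreat1 + Treattreat1:day10d - Treatctrl:day10d
con <- matrix(0, nrow = length(cn), ncol = 3, dimnames = list(cn, c("d1", "d5", "d10")))
con["Treattreat1", ] <- 1
con[c("Treattreat1:day5d", "Treatctrl:day5d"), "d5"]    <- c(1, -1)
con[c("Treattreat1:day10d", "Treatctrl:day10d"), "d10"] <- c(1, -1)

# Sanity check: each column equals (treat1 row) - (ctrl row) of the design on that day
row_of <- function(tr, d) which(Treat == tr & day == d)[1]
all.equal(con[, "d5"], design[row_of("treat1", "5d"), ] - design[row_of("ctrl", "5d"), ])

# With edgeR (not run here):
#   glmQLFTest(fit, contrast = con)          one F-test: differs on at least one day
#   glmQLFTest(fit, contrast = con[, "d5"])  treat1 vs ctrl on day 5 only
```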

Thanks


Written 7.0 years ago by Assa Yeroslaviz; last updated 7.0 years ago by James W. MacDonald.

score 0 · Answer 1 · 2019-02-15

With that many different combinations, the number of interactions gets so large that trying to get a high-level assessment of what is going on becomes (IMO) almost impossible. You would probably be better off using a spline fit and testing for the interaction that way. If there are lots of genes, you can then use something like k-means to cluster the genes that have a significant interaction between the spline and treatment. You can then present the results as sets of genes that react similarly over time to a given treatment, which is much easier to explain than trying to get a cohesive picture from 15 different interactions.
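A minimal base-R sketch of the spline-plus-clustering approach, assuming the same hypothetical layout with time as numeric days (1, 5, 10). It uses splines::ns() with df = 2 (with only three distinct time points, two spline degrees of freedom per treatment is the most that can be estimated), builds treatment-specific time curves with ~ Treat * X, and locates the 8 Treat-by-spline interaction columns, which in edgeR would be the coef= of a glmQLFTest(). The k-means step runs on a random placeholder matrix standing in for per-gene time profiles; it only shows the mechanics and produces no meaningful clusters.

```r
library(splines)  # ships with base R

Treat <- factor(rep(c("ctrl", "treat1", "treat2", "treat3", "treat4"), each = 9),
                levels = c("ctrl", "treat1", "treat2", "treat3", "treat4"))
days  <- rep(rep(c(1, 5, 10), each = 3), times = 5)  # numeric time in days

X <- ns(days, df = 2)                 # natural-spline basis for time
design <- model.matrix(~ Treat * X)   # one time curve per treatment
inter  <- grep(":", colnames(design)) # Treat x spline interaction columns
length(inter)                         # 8
# In edgeR (not run here): glmQLFTest(fit, coef = inter) asks whether any
# treatment's time curve departs from the control curve.

# Placeholder profiles: in practice, e.g., fitted log-scale expression per
# significant gene at days 1, 5 and 10 for one treatment.
set.seed(1)
profiles <- matrix(rnorm(300), nrow = 100, ncol = 3,
                   dimnames = list(paste0("gene", 1:100), c("1d", "5d", "10d")))
# Standardise each gene's profile so clusters reflect shape, not overall level
km <- kmeans(t(scale(t(profiles))), centers = 4, nstart = 10)
table(km$cluster)  # genes per cluster
```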