I've got an expression data set with a wild type and a knock-down genotype both treated with two different dosages of a drug.

I'd like to get the genes that are more effected by treatment (ideally across dosages) in one genotype compared to the other. Since both the genotype and the treatment contribute to differences in expression I'd have to "correct" for the effect of the genotype.

Conceptually, does it come down to a variable intercept model where I look for a significant difference in slopes or how do I best model this? (should be possile in limma or would I need some kind of mixed-effects model using lme4 etc. ?)

How would I build the formula for the model in limma and how would I then extract the relevant p-values for the contrast mentioned above? (a code example with the variable names genotype & dosage would be awesome).

Hello James and thanks for your input! I had had a look into this example in the limma userguide and calculated the contrasts as difference of differences as suggested in section 9.5.1 for the two dosage-levels separately. However my example has two (potentially more) levels of rising dosage of the drug, so I was wondering if there is a way to model those together in oder to gain power ...

If you just have two levels, it doesn't really make sense to do anything but a two-factor ANOVA. If you have multiple doses, then things get more complicated. You can assume that increasing levels of drug result in linear changes in gene expression, in which case you would fit a linear model with drug dose as a continuous variable.

But maybe increasing levels of drug cause the gene expression to plateau, or maybe even cause the expression to go back down. In that case a straight line doesn't make sense, and you would want to account for curvature. You could hypothetically use a quadratic or cubic polynomial to account for that, but given enough doses you could just fit a spline. That's 9.6.2 in the limma User's Guide. The example uses time rather than dose, but it doesn't really matter - both time and dose are just continuous covariates.

Another reason why I posted the question originally was that the result of the calculation for the difference of differences didn't look right. There were a lot of genes reported significantly changed which are usually pretty constant in expression across experiments. So perhaps just as a summary of what I did - for only one treatment-level to keep it simple:

With fac being a factor with one level for every combination of genotype and treatment:

design=model.matrix(~0 + fac)

colnames(design) = levels(fac)

fit = lmFit( exressionset, design )

contrastMatrix = makeContrasts(diff=treatment_genotypeB-nontreatment_genotypeB)-(treatment_genotypeA-nontreatment_genotypeA),levels=design)

fitc = contrasts.fit(fit,contrasts=contrastMatrix)

fitc = eBayes(fitc)

tbl = topTable(fitc,coef='diff',number=Inf)

Does that look OK to you? As I said, the results look a bit odd ...

Seems OK to me. But there is a difference between thinking the results are off because biologically they don't make sense and thinking the model specification is wrong somehow. I don't see a problem with the model specification, so you should probably be doing some exploratory data analysis. It's usually instructive to plot the gene expression for those genes that are differentially expressed, which you need to do anyway if you are to interpret the interaction (the fold change doesn't really tell you anything).

Thanks again for checking, much appreciated!

One thing I still don't get my head around is the relationship between the contrasts defined above in limma and an interaction term in a standard R linear model. I have an idea what an interaction is in a standard R model where you have two independent variables like in the model

lm(~ genotype + treatment + genotype:treatment,data=d)

All the beneficial niceties that limma adds aside - are the p-values of the interaction term of the standard R model output the same as the ones I'd calculate via the contrast defined above?

How are they conceptually different or the same?

... and sorry for my ignorance *LoL*

They are identical. You can do something like

And then spend the time to figure out what each of the coefficients in that model represent. In the end you will find that trt2:dose2 is (trt2_dose2 - trt2_dose1) - (trt1_dose2 - trt1_dose1). So the conventional model that R will specify gives you the 'correct' interaction term for free.

But the other two terms are trt2_dose1 - trt1_dose1 and trt1_dose2 - trt1_dose1, and you then have to do the algebra to figure out the contrasts you need to do in order to get something like trt2_dose2 - trt2_dose1, which would be trivial to specify using the parameterization that you generated using your 'fac' variable.

I can't thank you enough for your input, James ... that clears up things A LOT for me !!!