Hi everyone,
I have RNA- Seq data from 7 samples with 4 different treatments(A,B,C,D) and one control(Z) (7samples X 5conditions). Now I want to use limma- voom to compute differentially expressed genes(up- regulated and down- regulated) across all four conditions. My question is how do I construct the model.matrix so that I can query genes that are up- regulated and down- regulated in all four conditions. I am not interested in pairwise comparisons, I am interested in genes that are differentially expressed in all four conditions(A,B,C,D vs Z).
Please throw some light on how to construct the model.matrix and contrast.fit.
Thanks you.
Hi James,
Thanks for your response. So you are saying I should use model.matrix(~treatments) and use contrast.fit(A-Z, B-Z, C-Z, D-Z) to compute pairwise comparisons between control and the other four conditions individually and then use decideTests to query differentially expressed genes across all four conditions?
If you use a treatments contrast design (which is what
model.matrix(~treatments)
will do), then if you have already set Z as the baseline, the coefficients will be, by default, A-Z, B-Z, etc, and you don't need a contrasts matrix.Thanks a lot for your input James. I have one last question- I also want to adjust for age variable. Would model.matrix(~treatments+age) work to remove the effect of age from the results?
That's a slightly complex question. One way to interpret that model is that you are now fitting four parallel lines, where the slope of the lines tell you how much the gene expression changes for each unit increase in age, and the difference in intercepts is the difference between treatments. So you have now incorporated extra information about the age of your subjects, but you assume that the effect of age is the same, regardless of treatment.
But what if it's not? You could imagine a gene where age has very little effect on expression for untreated subjects, but where a particular treatment makes the gene expression highly dependent on age. In that case you wouldn't want to assume that the slopes for all the treatments are identical (because they aren't), and you would want to include an interaction term (treatment:age), which allows each treatment to have a separate slope.
This gets sort of tricky. You can fit an over-specified model that includes the interaction terms and then do an F-test to see if any of the genes have a significant interaction term. If not, you can then drop the interaction and assume that you can constrain to equal slopes. If some genes have a significant interaction term you can just go with the over-specified model for all genes, or you could hypothetically fit different models for those genes with a significant interaction and for those without.
There are competing interests here. The over-specified model is using more degrees of freedom than necessary, so you lose power. But fitting separate models may affect the empirical Bayes hyperparameter you are estimating, which may cause you to lose power as well.
Thanks James. I am thinking of applying regression equation lm(exp.mat~Age) and use the residual values for differential expression analysis. Do you think it is a correct approach to adjust for age?