Question

Differentially expressed genes across multiple conditions

0

Entering edit mode

vam3002 • 0

@vam3002-15067

Last seen 5.8 years ago

Hi everyone,

I have RNA- Seq data from 7 samples with 4 different treatments(A,B,C,D) and one control(Z) (7samples X 5conditions). Now I want to use limma- voom to compute differentially expressed genes(up- regulated and down- regulated) across all four conditions. My question is how do I construct the model.matrix so that I can query genes that are up- regulated and down- regulated in all four conditions. I am not interested in pairwise comparisons, I am interested in genes that are differentially expressed in all four conditions(A,B,C,D vs Z).

Please throw some light on how to construct the model.matrix and contrast.fit.

Thanks you.

differential gene expression limma voom multiple treatments • 2.0k views

ADD COMMENT • link updated 5.8 years ago by James W. MacDonald 65k • written 5.8 years ago by vam3002 • 0

score 0 · Answer 1 · 2018-07-18

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 1 day ago

United States

If I understand your question correctly, you can't. Instead you should use a model matrix that makes all the comparisons you care about and then you can use decideTests to select the genes that are significant in all four comparisons.

ADD COMMENT • link 5.8 years ago James W. MacDonald 65k

0

Entering edit mode

Hi James,

Thanks for your response. So you are saying I should use model.matrix(~treatments) and use contrast.fit(A-Z, B-Z, C-Z, D-Z) to compute pairwise comparisons between control and the other four conditions individually and then use decideTests to query differentially expressed genes across all four conditions?

ADD REPLY • link 5.8 years ago vam3002 • 0

0

Entering edit mode

If you use a treatments contrast design (which is what model.matrix(~treatments) will do), then if you have already set Z as the baseline, the coefficients will be, by default, A-Z, B-Z, etc, and you don't need a contrasts matrix.

ADD REPLY • link 5.8 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks a lot for your input James. I have one last question- I also want to adjust for age variable. Would model.matrix(~treatments+age) work to remove the effect of age from the results?

ADD REPLY • link 5.8 years ago vam3002 • 0

1

Entering edit mode

That's a slightly complex question. One way to interpret that model is that you are now fitting four parallel lines, where the slope of the lines tell you how much the gene expression changes for each unit increase in age, and the difference in intercepts is the difference between treatments. So you have now incorporated extra information about the age of your subjects, but you assume that the effect of age is the same, regardless of treatment.

But what if it's not? You could imagine a gene where age has very little effect on expression for untreated subjects, but where a particular treatment makes the gene expression highly dependent on age. In that case you wouldn't want to assume that the slopes for all the treatments are identical (because they aren't), and you would want to include an interaction term (treatment:age), which allows each treatment to have a separate slope.

This gets sort of tricky. You can fit an over-specified model that includes the interaction terms and then do an F-test to see if any of the genes have a significant interaction term. If not, you can then drop the interaction and assume that you can constrain to equal slopes. If some genes have a significant interaction term you can just go with the over-specified model for all genes, or you could hypothetically fit different models for those genes with a significant interaction and for those without.

There are competing interests here. The over-specified model is using more degrees of freedom than necessary, so you lose power. But fitting separate models may affect the empirical Bayes hyperparameter you are estimating, which may cause you to lose power as well.

ADD REPLY • link 5.8 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks James. I am thinking of applying regression equation lm(exp.mat~Age) and use the residual values for differential expression analysis. Do you think it is a correct approach to adjust for age?

ADD REPLY • link 5.8 years ago vam3002 • 0