Question

Thoughts on how to model gene expression data from two groups in a clinical trial

S • 0

@0b4d0d5b

Norway

Hi,

I am analyzing microarray data from a clinical trial in which two groups were randomized to two different diets (control and experimental). I want to determine whether the experimental group shows significantly different gene expression changes from the control group between before and after the diet. So here, instead of log-normalized intensities, I am using the log2 ratio of end-of-study intensity to baseline intensity as the dependent variable. Can this be modeled using limma? To the best of my understanding, limma expects log-normalized gene expression values, not the ratio of expression values at two different time points. If not, should I just use a simple linear model for this?

Microarray DifferentialExpression • 1.6k views

ADD COMMENT • link 3.6 years ago S • 0

score 0 · Answer 1 · 2022-06-07

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

limma was originally written for two-color microarrays, so it is perfectly capable of analysing log-ratios. However, I don't think you should be forming log-ratios in the first place. Unless you have two-color microarrays, you should keep all the control and experimental microarrays separate and analyse the data using a linear model that keeps track of the experimental vs control comparisons. There is no need to form log-ratios manually.
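As a rough illustration of why manual log-ratios are unnecessary, the base R sketch below fits a linear model with a subject blocking term to one gene and compares the resulting contrast with the difference in mean log-ratios between groups. The layout (6 subjects, 3 per diet, each profiled twice) and the expression values are assumptions made up for the example, not data from the trial.

```r
# Hypothetical layout: 6 subjects, 3 per diet, each measured at baseline and after the diet
Subject <- gl(6, 2, 12)
Time <- gl(2, 1, 12, labels = c("Baseline", "Diet"))
Diet <- gl(2, 6, 12, labels = c("Control", "Experimental"))

# Indicator columns for the within-subject change in each diet group
ControlvsBaseline <- as.numeric(Time == "Diet" & Diet == "Control")
ExperimentalvsBaseline <- as.numeric(Time == "Diet" & Diet == "Experimental")

# Simulated log2 expression for a single gene (illustration only)
set.seed(42)
y <- rnorm(12)

# Linear model on the separate arrays, blocking on subject
fit <- lm(y ~ Subject + ControlvsBaseline + ExperimentalvsBaseline)
model_estimate <- unname(coef(fit)["ExperimentalvsBaseline"] -
                         coef(fit)["ControlvsBaseline"])

# The same quantity computed from manually formed log-ratios (after minus before)
log_ratio <- y[Time == "Diet"] - y[Time == "Baseline"]
group <- Diet[Time == "Diet"]
manual_estimate <- mean(log_ratio[group == "Experimental"]) -
                   mean(log_ratio[group == "Control"])

# TRUE when every subject has both time points
print(all.equal(model_estimate, manual_estimate))
```

With complete pairs, the model on the separate arrays recovers the same estimate, so nothing is lost by leaving the arrays as they are.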

Your question sounds similar to the study that I advised you on some months ago: "Adjust for base line gene expression in limma model". You haven't described your experimental design in detail, but it seems to me that you might be seeking a bespoke solution for an experiment that could be analysed more easily by standard methods.

ADD COMMENT • link 3.6 years ago Gordon Smyth 53k

Hi,

Thank you again for your reply. In my experimental design, I have two groups of individuals (human samples) fed two separate diets over a period of 18 weeks. One is a control diet and the other is an experimental diet. Both groups were profiled for gene expression before (week 0) and after the diet (week 18). Now I want to find which genes are differentially induced or suppressed in the experimental group compared to the control group. Would an appropriate model in this case be to compare the two groups once at baseline and once at the end of the study, and then see whether any genes are differentially expressed at the end of the study that were not at the beginning?

ADD REPLY • link 3.6 years ago S • 0

No, that's not an appropriate approach. The correct approach is as I indicated to you 5 months ago. I'll spell it out more explicitly here.

Let's suppose your data is like this:

> Subject <- gl(6,2,12)
> Time <- gl(2,1,12)
> Diet <- gl(2,6,12)
> levels(Time) <- c("Baseline","Diet")
> levels(Diet) <- c("Control","Experimental")
> data.frame(Subject,Time,Diet)
   Subject     Time         Diet
1        1 Baseline      Control
2        1     Diet      Control
3        2 Baseline      Control
4        2     Diet      Control
5        3 Baseline      Control
6        3     Diet      Control
7        4 Baseline Experimental
8        4     Diet Experimental
9        5 Baseline Experimental
10       5     Diet Experimental
11       6 Baseline Experimental
12       6     Diet Experimental

Form the design matrix like this:

> ControlvsBaseline <- (Time=="Diet" & Diet=="Control")
> ExperimentalvsBaseline <- (Time=="Diet" & Diet=="Experimental")
> design <- model.matrix(~Subject)
> design <- cbind(design,ControlvsBaseline,ExperimentalvsBaseline)

Then use a limma analysis to find genes DE for the contrast: ExperimentalvsBaseline - ControlvsBaseline.
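That last step might look like the following minimal sketch. It assumes the normalized log2 expression values are in a matrix `y` with one column per array, ordered like the rows of the data frame above; here `y` is simulated noise for 100 hypothetical genes so that the code runs on its own. The contrast is given as a numeric vector because the "(Intercept)" column name is not a syntactically valid name for `makeContrasts`.

```r
library(limma)

# Design as constructed above
Subject <- gl(6, 2, 12)
Time <- gl(2, 1, 12, labels = c("Baseline", "Diet"))
Diet <- gl(2, 6, 12, labels = c("Control", "Experimental"))
ControlvsBaseline <- as.numeric(Time == "Diet" & Diet == "Control")
ExperimentalvsBaseline <- as.numeric(Time == "Diet" & Diet == "Experimental")
design <- cbind(model.matrix(~Subject), ControlvsBaseline, ExperimentalvsBaseline)

# Placeholder expression matrix (assumption: replace with real normalized log2 values)
set.seed(1)
y <- matrix(rnorm(100 * 12), nrow = 100, ncol = 12)
rownames(y) <- paste0("gene", 1:100)

# Fit the gene-wise linear models
fit <- lmFit(y, design)

# Contrast vector: +1 for ExperimentalvsBaseline, -1 for ControlvsBaseline, 0 elsewhere
contrast <- as.numeric(colnames(design) == "ExperimentalvsBaseline") -
            as.numeric(colnames(design) == "ControlvsBaseline")

# Estimate the contrast and compute moderated statistics
fit2 <- contrasts.fit(fit, contrasts = contrast)
fit2 <- eBayes(fit2)

# Genes ranked by evidence that the diet response differs between groups
top <- topTable(fit2, coef = 1, number = 10)
print(top)
```

In the resulting table, `logFC` is the estimated difference between the experimental and control groups in their log2 change from baseline.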