Question: limma with paired samples and a continuous variable
gravatar for gil.hornung
9 days ago by
gil.hornung0 wrote:


I have expression data from patients that underwent treatment with a drug, before and after treatment. I also have a measurement of something in their blood, which I call "score". The samples metadata looks something like this:

patient . treatment. score
A before 0.31
A after 0.44
B before 0.39
B after 0.28

I expect that gene expression will be related to the "score", i.e. for some genes higher scores mean higher expression and vice versa. Nonetheless, I am interested in those cases where this relationship changes before in after treatment, i.e. before treatment higher scores implies higher expression but after treatment there is no such relationship. To my understanding I need the interaction term between treatment and score.

If score would be a categorical variable, then I guess (but not sure) that I would try the steps below, based on the limma manual:

design <- model.matrix(~0+treatment+score+treatment:score, data=sample_metadata)
corfit <- duplicateCorrelation(eset, design, block=sample_metadata$patient)
limma_fit <- lmFit(eset, design, block=sample_metadata$patient, correlation=corfit$consensus)
fit <- eBayes(limma_fit)

Does this sound reasonable? Is it the same for the continuous case as well? Finally, if I would like to look at the results from a single gene, how should I visualize them?

Thank you,


limma • 76 views
ADD COMMENTlink modified 9 days ago by Aaron Lun23k • written 9 days ago by gil.hornung0
Answer: limma with paired samples and a continuous variable
gravatar for Aaron Lun
9 days ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

Let's set up an example to illustrate:

score <- runif(10)
patient <- gl(5, 2)
treatment <- factor(rep(c("B", "A"), 5), c("B", "A")) 
# ... to set 'before' as reference

I would do:

design <- model.matrix(~0 + patient + treatment + treatment:score)

The first 5 coefficients are the patient-specific blocking factors and represent the average expression in each patient before treatment for a hypothetical case where the score is zero. The treatmentA coefficient represents the average log-fold change upon treatment, again in the hypothetical case of score zero. The final two terms represent the log-fold change per unit increase in the score (whatever that might mean) within the before-treatment or after-treatment groups, respectively.

Your comparison of interest would then simply be:

colnames(design) <- make.names(colnames(design))
con <- makeContrasts(treatmentA.score - treatmentB.score, levels=design)

Visualization is straightforward:

  • Fit a linear model to the expression profile for a gene.
  • Set all patient blocking factors to zero. Possibly also set treatmentA to zero, if you really only care about the trend within each treatment condition.
  • Multiply the coefficients by the design matrix and add on the residuals to get "de-patient-ified" observations for plotting with respect to score.
ADD COMMENTlink modified 9 days ago • written 9 days ago by Aaron Lun23k

Thank you Aaron for the fast and helpful reply! One question: I thought that the "patients" is more of "random effects" than "fixed effect" (although, to be honest, I never really understood the difference between the two). This is why I though of using duplicateCorrelation and block. What do you suppose is better?

ADD REPLYlink modified 9 days ago • written 9 days ago by gil.hornung0

I would treat patient as a fixed effect here - see the first point in this thread.

ADD REPLYlink written 9 days ago by Aaron Lun23k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 262 users visited in the last hour