Question: Limma design/contrast correct?
0
gravatar for rasmus.agren
5 months ago by
rasmus.agren0 wrote:

Hi,

I wonder if the design/contrast I'm using in Limma is the best one for my experiment. In particular, I'm not sure of how duplicateCorrelation works here. I though I'd post the design here and hopefully someone can give me some hints.

The research question is quite straight forward, but there are some complications with the design. The experiment studies treated/untreated (male and female) with a drug and then before/after an intervention. The questions we'd like to answer are:

  • How does the transcriptional profile differ between treated/untreated?

  • Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.

  • Are there sex differences in the response to the intervention?

Then there are two complications:

  • The samples were taken in 5 batches, and those are confounded with sex (3 for male and two for female). The female samples were also generated by another hospital and much later, so the batch effect between (1/2/3) and (4/5) is larger than between 1/2/3 or 4/5.

  • For male, both samples for a subject (i.e. the before/after intervention samples) are analysed in the same batch. For female, the samples have been randomized to batch, so there are some subjects which have their "before" sample in one batch and their "after" sample in another.

  • We need to account for a number of covariates; age, BMI and so on. Importantly, we'd like to account for a type of medication that's only being used by a subset of the treated subjects (not the same treatment that the study is about). All these factors are properties of subjects rather than samples.

My current reasoning is that we cannot say anything about sex differences, since it's confounded by batch. Blocking by subject would be fine if we were only interested in the effect of the intervention, but then we cannot say anything about the interaction effect with treated/untreated (since they would be colinear). It would also fail to handle the batch effect for those subjects where the before/after samples were in different batches. Also, we'd like to quantify the effect of the covariates as a sanity check. Based on this, the only solution I could see was to view subject as a random effect. The model I'm using is:

design <- model.matrix(~ 0 + factor(intervention) + factor(treated_untreated) + factor(batch) + bmi + factor(other_medication))
corfit <- duplicateCorrelation(exprs(es), design, block = pData(es)$Subject)
fit <- lmFit(exprs(es), design, block = pData(es)$Subject, correlation = corfit$consensus)

Note that other_medication is a confounding medication that I'd like to remove the effect of. It's FALSE for all untreated and some of the treated. I then test for differentially expressed genes with contrast, e.g.

contrast <- makeContrasts(interventionAfter - interventionBefore, levels=design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2, trend = TRUE)

Does this make sense? Will other_medication be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"? Are there any complications with using duplicateCorrelaction like this when some paired samples are split between batches? Thanks a lot!

ADD COMMENTlink modified 5 months ago by Aaron Lun24k • written 5 months ago by rasmus.agren0

It really helps if you post some example code to generate something close to your design. Otherwise it's hard to tell from words. Did each patient receive the drug and the intervention, and samples were collected before and after each one? Were all patients treated with the drug, but only some of them got the intervention?

Help us out and do something like:

batch <- rep(1:5, each=10) 
sex <- rep(c("M", "M", "M", "F", "F"), each=10)
drug <- ???
intervention <- ???
patient_id <- ???

... to give us an idea of what you're really talking about. Words are cheap, but code is real.

ADD REPLYlink modified 5 months ago • written 5 months ago by Aaron Lun24k

Sure, it would look something like this (skipping all confounding variables other than batch). patient_id would be the random factor that is called pData(es)$Subject in my initial post.

# 5 batches, 4 subjects per batch, 2 samples per subject
batch <- rep(1:5, each = 8)

# First 3 batches are male, last 2 are female
sex <- c(rep("M", 8*3), rep("F", 8*2))

# Half of the subjects are on the drug
drug <- rep(c("Yes", "Yes", "No", "No"), 8*5/4)

# All subjects have measurements before and after an intervention
intervention <- rep(c("Before", "After"), 8*5/2)

patient_id <- c(rep(paste0("pat", 1:12), each = 2), # All males (12 subjects) have paired samples in the same batch
  rep(paste0("pat", 13:14), each = 2), # The same is true for 2 of the females in batch 4
  paste0("pat", 15:18), paste0("pat", 18:15), # 4 female subjects are split between batch 4 and 5.
  rep(paste0("pat", 19:20), each = 2)) # 2 the females in batch 5 have paired samples in the same batch

# Note that sex isn't used here
design <- model.matrix(~ 0 + factor(intervention) + factor(drug) + factor(batch))
ADD REPLYlink modified 5 months ago • written 5 months ago by rasmus.agren0
Answer: Limma design/contrast correct?
3
gravatar for Aaron Lun
5 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

I would recommend doing something like this:

g <- factor(paste0(sex, drug, intervention))
b <- factor(batch)
design <- model.matrix(~0 + g + b) # plus other confounding covariates
design <- design[,setdiff(colnames(design), "b4")] # get to full rank

... and block on patient_id in duplicateCorrelation(). The first 8 coefficients represent the average log-expression of each combination of sex/drug/intervention, while the remaining coefficients represent the uninteresting batch effects. This approach avoids making additive assumptions about how the sex/drug/intervention factors interact with each other.

How does the transcriptional profile differ between treated/untreated?

If you want to test for treated/untreated differences in females before interventions, you would do:

con <- makeContrasts(gFYesBefore -  gFNoBefore, levels=design)

If you want to test for treated/untreated differences in females, averaging across their intervention status, you would do:

con <- makeContrasts((gFYesAfter + gFYesBefore)/2 -  
    (gFNoAfter + gFNoBefore)/2, levels=design)

There's at least a few other ways to cut this particular pie. You could do an ANOVA with:

con <- makeContrasts(
    gFYesAfter - gFNoAfter, # post-intervention differences
    gFYesBefore - gFNoBefore, # pre-intervention differences
    levels=design)

... and look for significant genes where the log-fold change was in the same direction in both contrasts. Or you could do separate tests for each contrast and intersect the resulting lists of DE genes that change in the same direction. This logic can be extended to the males as well, giving you a total of 4 contrasts to intersect.

The choice of testing strategy depends on how stringent you want to be. A simple treated versus untreated comparison no longer exists if there are interactions between drug treatment, the intervention and sex. (And if there aren't... well, your remaining two questions won't have very interesting answers.)

In all of these contrasts, we're always comparing within each sex. Thus, there are no problems with the fact that batch is fully confounded with sex.

Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.

Easy, two-way interactions. Here we're looking for differences in the intervention response with/without treatment in females:

con <- makeContrasts((gFYesAfter -  gFYesBefore) - (gFNoAfter -  gFNoBefore), levels=design)

The same applies for males, I won't bother writing it. Again, if you want to combine these results across males and females, you'll have to decide whether you want to average the effects, use an ANOVA, or intersect the DE lists.

As before, we're comparing within each sex, so the confounding batch is not a problem.

Are there sex differences in the response to the intervention?

Yet another two-way interaction. For differences in intervention response between sexes for the drug-treated group:

con <- makeContrasts((gFYesAfter -  gFYesBefore) - (gMYesAfter -  gMYesBefore), levels=design)

... and same again for the untreated group. Now, this does involve comparing between sexes, but actually, we don't have to worry about the fact that batch and sex are confounded. This is because we are computing the intervention response within each sex, and then comparing those responses between sexes. Any batch effect cancels out when we compute the response, making this contrast valid.

Will other_medication be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"?

This should be okay, provided the subjects on this other medication are not all of the male subjects (or all of the female subjects).

Are there any complications with using duplicateCorrelaction like this when some paired samples are split between batches?

No, this is fine.

ADD COMMENTlink written 5 months ago by Aaron Lun24k

Thanks a lot for such an exhaustive answer! I will need some time to digest this, but I wonder if you could explain what you mean with "This approach avoids making additive assumptions about how the sex/drug/intervention factors interact with each other.". I actually already use your way of formulating the model and setting up contrasts for drug and intervention, but I thought it would be easier to explain with the model I used here. I was under the impression that the two formulations would be identical, but that isn't the case then?

ADD REPLYlink written 5 months ago by rasmus.agren0

No, those two formulations are not identical. The two design matrices have different numbers of coefficients:

f1 <- factor(rep(c("A", "B"), each=4))
f2 <- factor(rep(1:2, 4))
model.matrix(~f1 + f2) # (1) additive

g <- paste0(f1, f2)
model.matrix(~0 + g) # (2) cell-means layout

If you wanted an equivalent model to 2 in terms of f1 and f2, you should do:

model.matrix(~0 + f1*f2) # (3) interaction

Here, you can go from 3 to 2 (and vice versa) with linear combinations of the column vectors.

ADD REPLYlink written 5 months ago by Aaron Lun24k

Thanks! I answered too quickly and forgot that the example model I posted here didn't include an interaction term. Now I understand your point. My initial question was badly formulated because the model didn't answer the list of "questions we'd like to answer". That's because the purpose of the post was to ask about duplicateCorrelation so I used a minimal example for that, but you were kind enough to also address those other issues. Anyways, thanks again!

ADD REPLYlink written 5 months ago by rasmus.agren0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 179 users visited in the last hour