Search
Question: limma: how to use pairing and adjust for covariates
1
18 months ago by
gregory.l.stone10 wrote:

I am interested in looking at the differences in the effect of a condition on each sex. The contrast I am interested in is (Female.post - Female.pre) - (Male.post - Male.pre). I am uncertain as to how best incorporate pairing into my study and adjust for covariates.

One way I have tried to incorporate pairing is to randomize the patient effect by using duplicateCorrelation(). My design is therefore:

f = paste(dge$samples$sex, dge$samples$condition, sep = ".")
f = factor(f, levels = c("Male.pre", "Male.post", "Female.pre", "Female.post"))

d = model.matrix(~ 0 + f + age + dm + axc, colData)

cont = makeContrasts(M.PvP="fMale.post-fMale.pre", F.PvP="fFemale.post-fFemale.pre", sexDiff="(fFemale.post-fFemale.pre) - (fMale.post-fMale.pre)", levels = d)

where colData is my list of covariates, dge&samples is the colData object in my dge object containing all sample an count info, M.PvP is the effect of condition on males, F.PvP is the effect of condition of females, and sexDiff is the contrast of interest (the difference in the effect of condition on each sex). I get around 274 genes for the contrast of interest.

However, I am concerned that using duplicateCorrelation() in addition to adjusting for covariates in the design matrix is over-adjusting and assumes a linear relationship between my outcome and my covariates.

So another strategy I have tried is to block for the pairing effect in the design matrix as such:

design <- model.matrix(~ patient + sex:condition, colData)
design <- design[,!grepl("pre", colnames(design))]
colnames(design) <- sub(":", "_", colnames(design))
con <- makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)

However, when I go to do my fit, I get the error "In contrasts.fit(fit, con) : row names of contrasts don't match col names of coefficients". Additionally, I only get 6 genes in my contrast of interest.

What strategy should I be going with? Thank you so much for taking the time to help!

modified 18 months ago by Aaron Lun21k • written 18 months ago by gregory.l.stone10

If you got an error in your second approach, how did you still manage to test for DE genes with that contrast? In any case, provide a minimum working example that can recapitulate the error.

When I make my contrast matrix (following con <- makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)), I get the following message: Warning message: In makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost",  : Renaming (Intercept) to Intercept

When I make my fit based on my contrast matrix (fit = lmfit(voom, design) fit2 = contrasts.fit(fit, con)) I get the following:

Warning message:
In contrasts.fit(fit, con) :
row names of contrasts don't match col names of coefficients

Limma still allows me to create these objects

That's a warning, not an error.

1
18 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

I would go with your second approach, i.e., block on patient in the design matrix. duplicateCorrelation is somewhat anticonservative so the greater number of DE genes in your first approach is not a surprise. There are also a number of assumptions (e.g., homogeneity of patient effects) and compromises (e.g., using the same consensus correlation for all genes) involved in using duplicateCorrelation, in addition to the linearity issues you've already mentioned. All of this is avoided with the second approach, albeit with some loss of power as information is spent on estimating the patient effects. Fewer DE genes, perhaps, but better safe than sorry.

When I block on patient in the design I now find I get the following message when normalizing with voom:

> v = voomWithQualityWeights(dge, design, plot=FALSE)
Coefficients not estimable: sexMale.conditionpost sexFemale.conditionpost

The same message prints repeatedly. I have samples for each condition, though. How can I correct this?

It seems like you haven't removed the pre* coefficients in the second design matrix. If that's not the case, please post a minimum working example that reproduces the warning.

You're right! I apologize for my carelessness. However, if I am not specifying an intercept, don't I need the pre terms in my design so that I can make a contrast like the following: con <- makeContrasts(sexDiff = "(sexFemale_conditionpost - sexFemale_conditionpre) - (sexMale_conditionpost - sexMale_conditionpre)", levels=design_noInt)?

No. Once you've pruned the design matrix, the interaction terms are the post/pre log-fold changes.

So in the case when my design matrix is ~ patient + sex:condition, I should remove the pre terms from the design and then construct my contrast matrix: makeContrasts(sexDiff = "sexFemale_conditionpost - sexMale_conditionpost", levels=design)

Now, in the case when my design matrix is ~ 0 + patient + sex:condition, I also should remove the pre terms and construct the same contrast matrix.

I'm now confused as to what the difference is between the designs ~ patient + sex:condition and ~ 0 + patient + sex:condition. I'm sure I have misunderstood something you've said along the way. I apologize for my confusion, and thank you so much for being so generous with your time!

The only difference (in the pruned designed matrices) is that, if you have an intercept, it represents the average log-expression in the first patient, and all other patient terms represent the log-fold change relative to the first patient. Without an intercept, each patient term directly represents the average log-expression in that patient. In both cases, the interpretation of the (interesting) interaction terms is unchanged.

Ok thank you, that makes a lot of sense. So just that I am clear, in both cases, sexMale_conditionpost represents the log-fold change from pre to post in males?

I would assume so. Check the design matrix to be sure; you should see entries of 1 for post-treatment male samples, and zero for everything else.

Ok great. Thank you very much for all of your help!

I am implementing the design you've recommended and am concerned that I am not getting the contrast I am interested in. The columns Male.post and Female.post in my design matrix have a 1 for all male and female samples, respectively, taken post condition. When I contrast these, Female.post-Male.post, I initially thought that the contrast I was doing was (Female.post-Female.pre)-(Male.post-Male.pre), but the interaction between sex and pre samples were removed to make the design full-rank, making me believe that I am getting the change between the sexes at time point post, not the change between the sexes between time points pre->post, which is what I'm interested in. Am I interpreting this incorrectly? My design is ~ patient + sex:condition

Is this a situation where I would use a nested factorial design? If so how would I go about doing that?

Male.post represents the log-fold change of post over pre in males. Female.post represents the log-fold change of post over pre in females. The contrast Female.post - Male.post tests the null hypothesis that the post/pre log-fold changes are the same between males and females. If this is not clear, I can only suggest that you consult a local statistician to help with your understanding.