Multi-level experiment with continuous varibles and confounders in limma
1
0
Entering edit mode
kdiamanti • 0
@kdiamanti-16999
Last seen 11 weeks ago
Sweden

Dear limma team,

To a large extent this question is a continuation of this post.

The main question concerns an experimental design that involves the same samples across multiple tissues (e.g. T1, T2 and T3), confounding factors (e.g. bmi and age) and a condition that can be both continuous and/or discrete with more than 2 levels (e.g. L1, L2, L3).

Question 1: When the condition is discrete and one wants to account for confounding factors then how correct would be the following design:

Treat = factor(paste(Condition, Tissue, sep = "."))

design <- model.matrix(~ 0 + Treat + bmi + age, data = covars)

corfit <- duplicateCorrelation(data, design, block = covars$Subject) fit <- lmFit(data, design, block = covars$Subject, correlation = corfit$consensus) cm <- makeContrasts( # pair-wise comparisons among conditions for the same tissue (T1) L1vsL2forT1 = TreatL1.T1 - TreatL2.T1, L1vsL3forT1 = TreatL1.T1 - TreatL3.T1, L2vsL3forT1 = TreatL2.T1 - TreatL3.T1, # pair-wise comparisons among tissues for the same condition (L1) T1vsT2forL1 = TreatL1.T1 - TreatL1.T2, T1vsT3forL1 = TreatL1.T1 - TreatL1.T3, T2vsT3forL1 = TreatL1.T2 - TreatL1.T3, # compare one tissue to all others for the same condition T1vsAllforL1 = TreatL1.T1 - (TreatL1.T2 + TreatL1.T2) / 2, T1vsAllforL2 = TreatL2.T1 - (TreatL2.T2 + TreatL2.T2) / 2, T1vsAllforL3 = TreatL3.T1 - (TreatL3.T2 + TreatL3.T2) / 2, levels = design) fit2 <- contrasts.fit(fit, cm) fit2 <- eBayes(fit2) topTable(fit = fit2, coef = "L1vsL2forT1")  Additionally, would it be correct to say that the significant genes that are unique for T1vsAllforL1 when compared to T1vsAllforL2 and T1vsAllforL3, could be (presumably) condition-specific responses for tissue T1? Question 2: Now when the condition is continuous (e.g. C1): design <- model.matrix(~ 0 + Tissue + Tissue:C1 + age + bmi, data = covars) colnames(design) <- c("T1", "T2", "T3", "age", "bmi", "T1.C1", "T2.C1", "T3.C1") corfit <- duplicateCorrelation(data, design, block = covars$Subject)

fit <- lmFit(data, design, block = covars$Subject, correlation = corfit$consensus)

cm <- makeContrasts(
# association of the tissue C1 to gene expression of tissue T1
T1 = T1.C1,

# pair-wise comparison of tissues associations of C1 to gene expression
T1vsT2 = T1.C1 - T2.C1,

# tissue-specific associations of C1 to gene expression
T1vsAll = T1.C1 - (T2.C1 + T3.C1) / 2,
levels = design)

fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
topTable(fit = fit2, coef = "T1vsAll")


And the same as above here: Genes that are significant in T1vsAll, does it mean that they (presumably) are condition-specific responses for tissue T1?

Question 3: In couple of other posts (1 and 2) it was recommended by Aaron that continuous outcomes/conditions, such as C1 in this case, might benefit from splines. How could this extra parameter be applied in this case?

Thanks beforehand.

Klev

0
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 3 hours ago
The city by the bay

## Question 1

Looks sensible to me.

## Question 2

Are you sure you have your column names in the right order there?

Genes that are significant in T1vsAll, does it mean that they (presumably) are condition-specific responses for tissue T1?

Yes, for some definition of "condition-specific". In this case, you're just comparing against the average, so it is possible for, say, T1.C1 and T2.C1 to be the same and T3.C1 to be very different, and then you'll reject the null because T1.C1 is different from the average of T2.C1 and T3.C1.

For another defintition of "condition-specific", you should do pairwise comparisons between T1.C1 and T2.C1, etc. and then take the intersection of genes that are DE in all those pairwise comparisons involving T1.C1. You might possibly do this in a directional manner, e.g., only consider genes with positive significant differences in associations, to get genes where the association is significantly more positive in T1 than in the other treatments.

For yet another definition of "condition-specific", you might only want genes that are significant in T1.C1 and are not significant (at all) in the other treatments. This would require you to test each T*.C1 term and then take the subset of genes that are significant in T1.C1 and not in the the other treatments. That's a bit of a hack because DE methods are not good at finding non-DE genes, but that's the way it goes.

## Question 3

Replace Tissue:C1 with Tissue:X where X <- splines::ns(C1, df=5). Or however many df you can afford, depending on your sample size. This improves the flexibility of the fit to non-linear relationships with respect to C1 but obviously it means that you cannot describe the association in simple "positive/negative" terms anymore.