Question

Help with model.matrix and creating the right contrast matrix for limma

0

Entering edit mode

rober204 • 0

@b48e8083

Last seen 7 weeks ago

Canada

Hi Limma experts, This is my first time using Limma and I am struggling to determine which contrasts to construct to answer my research questions/ if I have chosen the correct design matrix to most effectively answer my questions. The two questions I have:

Is there a treatment effect?
Is there a temporal effect?

My data and study design: Treatment (control, low, high), and Time (day minus 5, day 5, day 15, and day 30). The same pool of subjects are repeatedly sampled at each time point within each treatment, but they are sampled independently of each other (there is no way to know if it is the same individuals are being sampled at each time point). It's important to note that the experiment started on day 0, so day minus 5 (m5) is a temporal reference, prior to exposure.

I have decided to use model.matrix(~0 + group) after referencing Law (2020), and constructed groups: C_m5, C_5, C_15, C_30, L_m5, L_5, L_15, L_30, H_m5, H_5, H_15, H_30. However, there are a few other options I believe could be more useful for my case including model.matrix(~treatment*time) or model.matrix(~0+treatment+treatment:time), but I do not understand the workings of contrasts well enough to know which is best.

For my current model I chose (model.matrix(~0 + group)), I created a contrast matrix with all pairwise comparisons between groups... which looks pretty disturbing as you'll see below. I'm very unsure if this is the most effective way to make the contrasts, or if i would be better off making contrasts in another fashion, such as using interactions between treatment and time with one of the other models.

I'll Include the code of what I have thus far for limma, it's pretty ugly with all of the contrasts, which makes me feel it is wrong. I am hoping that someone is able to provide some feedback on what i have done, and if there is a different direction I should be taking. Further, with the results from decideTests, I am unable to make a vennDiagram, I will show the error below, but i am assuming it is because of how I made the contrasts, so it is not picking up my two factors (treatment and time).

Thank you in advanced for taking the time to help me today!

group <- interaction(treatment, time)

mm <- model.matrix(~0 + group) 
fit <- lmFit(log_mucus_cL, mm) 

#all pairwise comparisons...
cm <- makeContrasts(groupC._5vsgroupH._5 = groupH._5 - groupC._5,
                    groupC._5vsgroupL._5 = groupL._5 - groupC._5,
                    groupC._5vsgroupC.15 = groupC.15 - groupC._5,
                    groupC._5vsgroupH.15 = groupH.15 - groupC._5,
                    groupC._5vsgroupL.15 = groupL.15 - groupC._5,
                    groupC._5vsgroupC.30 = groupC.30 - groupC._5,
                    groupC._5vsgroupH.30 = groupH.30 - groupC._5,
                    groupC._5vsgroupL.30 = groupL.30 - groupC._5,
                    groupC._5vsgroupC.m5 = groupC._5 - groupC.m5,
                    groupC._5vsgroupH.m5 = groupC._5 - groupH.m5,
                    groupC._5vsgroupL.m5 = groupC._5 - groupL.m5,
                    groupH._5vsgroupL._5 = groupH._5 - groupL._5,
                    groupH._5vsgroupC.15 = groupH._5 - groupC.15,
                    groupH._5vsgroupH.15 = groupH.15 - groupH._5,
                    groupH._5vsgroupL.15 = groupH._5 - groupL.15,
                    groupH._5vsgroupC.30 = groupH._5 - groupC.30,
                    groupH._5vsgroupH.30 = groupH.30 - groupH._5,
                    groupH._5vsgroupL.30  = groupH._5 - groupL.30,
                    groupH._5vsgroupC.m5  = groupH._5 - groupC.m5,
                    groupH._5vsgroupH.m5  = groupH._5 - groupH.m5,
                    groupH._5vsgroupL.m5  = groupH._5 - groupL.m5,
                    groupL._5vsgroupC.15  = groupL._5 - groupC.15,
                    groupL._5vsgroupH.15  = groupH.15 - groupL._5,
                    groupL._5vsgroupL.15  = groupL.15 - groupL._5,
                    groupL._5vsgroupC.30  = groupL._5 - groupC.30,
                    groupL._5vsgroupH.30  = groupH.30 - groupL._5,
                    groupL._5vsgroupL.30  = groupL.30 - groupL._5,
                    groupL._5vsgroupC.m5  = groupL._5 - groupC.m5,
                    groupL._5vsgroupH.m5  = groupL._5 - groupH.m5,
                    groupL._5vsgroupL.m5  = groupL._5 - groupL.m5,
                    groupC.15vsgroupH.15  = groupC.15 - groupH.15,
                    groupC.15vsgroupL.15  = groupC.15 - groupL.15,
                    groupC.15vsgroupC.30  = groupC.30 - groupC.15,
                    groupC.15vsgroupH.30  = groupH.30 - groupC.15,
                    groupC.15vsgroupL.30  = groupL.30 - groupC.15,
                    groupC.15vsgroupC.m5  = groupC.15 - groupC.m5,
                    groupC.15vsgroupH.m5  = groupC.15 - groupH.m5,
                    groupC.15vsgroupL.m5  = groupC.15 - groupL.m5,
                    groupH.15vsgroupL.15  = groupH.15 - groupL.15,
                    groupH.15vsgroupC.30  = groupH.15 - groupC.30,
                    groupH.15vsgroupH.30  = groupH.30 - groupH.15,
                    groupH.15vsgroupL.30  = groupH.15 - groupL.30,
                    groupH.15vsgroupC.m5  = groupH.15 - groupC.m5,
                    groupH.15vsgroupH.m5  = groupH.15 - groupH.m5,
                    groupH.15vsgroupL.m5  = groupH.15 - groupL.m5,
                    groupL.15vsgroupC.30 =  groupL.15 - groupC.30,
                    groupL.15vsgroupH.30 =  groupH.30 - groupL.15,
                    groupL.15vsgroupL.30 =  groupL.30 - groupL.15,
                    groupL.15vsgroupC.m5 =  groupL.15 - groupC.m5,
                    groupL.15vsgroupH.m5 =  groupL.15 - groupH.m5,
                    groupL.15vsgroupL.m5 =  groupL.15 - groupL.m5,
                    groupC.30vsgroupH.30 =  groupH.30 - groupC.30,
                    groupC.30vsgroupL.30 =  groupL.30 - groupC.30,
                    groupC.30vsgroupC.m5 =  groupC.30 - groupC.m5,
                    groupC.30vsgroupH.m5 =  groupC.30 - groupH.m5,
                    groupC.30vsgroupL.m5 =  groupC.30 - groupL.m5,
                    groupH.30vsgroupL.30 =  groupH.30 - groupL.30,
                    groupH.30vsgroupC.m5 =  groupH.30 - groupC.m5,
                    groupH.30vsgroupH.m5 =  groupH.30 - groupH.m5,
                    groupH.30vsgroupL.m5 =  groupH.30 - groupL.m5,
                    groupL.30vsgroupC.m5 =  groupL.30 - groupC.m5,
                    groupL.30vsgroupH.m5 =  groupL.30 - groupH.m5,
                    groupL.30vsgroupL.m5 =  groupL.30 - groupL.m5,
                    groupC.m5vsgroupH.m5 =  groupC.m5 - groupH.m5,
                    groupC.m5vsgroupL.m5 =  groupC.m5 - groupL.m5,
                    groupH.m5vsgroupL.m5 =  groupH.m5 - groupL.m5,
                    levels = mm)

# Fit the contrasts
fit2 <- contrasts.fit(fit, contrasts = cm)

# Calculate the t-statistics for the contrasts
fit2 <- eBayes(fit2)

# Summarize results
results <- decideTests(fit2)
summary(results)

 library(VennDiagram)
vennDiagram(results)

error messages
#Error in matrix(0, noutcomes, ncontrasts) :
#invalid 'nrow' value (too large or NA)

sessionInfo( )

limma • 851 views

ADD COMMENT • link 7 months ago • updated 4 months ago rober204 • 0

score 0 · Answer 1 · 2023-09-13

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 46 minutes ago

WEHI, Melbourne, Australia

Your question is not about the use of limma software or even about statistics. Your question is about how to conduct a scientific analysis of your specific experiment.

The purpose of forming a contrast and conducting a statistical test is to answer a scientific question. The problem with the contrasts that you have currently formed is that they have no relationship to the scientific questions that you have posed. Hence you will just get a mess of uninterpretable and possibly conflicting information. The reason why you got an error from vennDiagram is that there is no such thing as a Venn Diagram for 66 contrasts -- such a thing is just geometrically impossible.

If I was analysing an experiment like this, I would test for genes that vary over time for each treatment. That would lead to three tests, one for each treatment. Then I test for genes whose pattern over time varies between the treatments. That is one more test. That approach would involve four tests in total rather than the 66 tests you have now.

Or you could turn it around and instead test for treatment differences at each time. That would be four tests. It's up to you, depending on your scientific hypotheses and on the conclusions you want to draw from the data.

ADD COMMENT • link 7 months ago Gordon Smyth 50k

0

Entering edit mode

Hi Gordon, thanks for taking time to respond to my question. Yep, I'm trying to do the latter for my analysis. I know how I want my scientific analysis to look, so my question is how to construct the contrasts to answer those questions, and if I was using the right model to do it effectively. As I said, I just started using this package, the knowledge gap here is in generating the contrasts to answer my question. In my mind, testing for treatment differences at each time would look like this, which is 12 tests instead of the 4 you stated, so clearly I am missing something about contrasts:

cm <- makeContrasts(
  #day -5
  HvsC_m5 = H.m5 - C.m5,
  HvsL_m5 = H.m5 - L.m5,
  LvsC_m5 = L.m5 - C.m5,

  #day 5
  HvsC_5 = H._5 - C._5,
  HvsL_5 = H._5 - L._5,
  LvsC_5 = L._5 - C._5,

  #day 15
  HvsC_15 = H.15 - C.15,
  HvsL_15 = H.15 - L.15,
  LvsC_15 = L.15 - C.15,

  #day 30 
  HvsC_30 = H.30 - C.30,
  HvsL_30 = H.30 - L.30,
  LvsC_30 = L.30 - C.30,
)

Further, this doesn't test for temporal differences within treatments, which would be further tests. Perhaps there is an example of writing contrasts in a similar matter that you can direct me to?

Again, thank you for your time.

ADD REPLY • link 7 months ago rober204 • 0

0

Entering edit mode

To test for DE at day -5 you could use:

topTable(fit2, coef=c("LvsC_m5","HvsC_m5"))

That yields one anova test for each gene. You could choose any pair of contrasts at the same time to get the same test results.

There are many equivalent ways to setup the design matrix and contrasts for the same purpose, but this way is an easy as any and has the advantage of being explicit.

ADD REPLY • link 7 months ago Gordon Smyth 50k

0

Entering edit mode

Hi Gordon,

I was hoping for clarification on your comment, I don't understand what is happening exactly in your script above.

Does this mean there is an anova test comparing the mean expression of each gene for each contrast specified (LvsC_m5 and HvsC_m5)? So when you include two or more coefficients in the topTable function, an anova is conducted for each gene between each coefficient?

I'll provide an example of what I'm trying to figure out: I am trying to understand if there is a temporal and treatment effect in the high treatment relative to the control on day 30 vs day 5. I put the two contrasts in the topTable below, but I just don't know if this format is the correct one to answer my question.

mm_b <- model.matrix(~0 + group_b) 

cmatrix_b <- makeContrasts(

  #Control across time 
  C15vsC_5 = C.15 - C._5,
  C30vsC_5 = C.30 - C._5,
  C30vsC15 = C.30 - C.15,

  #low across time
  L15vsL_5 = L.15 - L._5,
  L30vsL_5 = L.30 - L._5,
  L30vsL15 = L.30 - L.15,

  #high across time 
  H15vsH_5 = H.15 - H._5,
  H30vsH_5 = H.30 - H._5,
  H30vsH15 = H.30 - H.15,

   levels = mm_b
)

fit_b <- lmFit(log_mucus_cL_b, mm_b, weights = array_correct) 

fit_b <- contrasts.fit(fit_b, contrasts = cmatrix_b)

fit_b <- eBayes(fit_b, trend = TRUE)


d30vsd5 <- topTable(fit_b, coef= c("H30vsH_5","C30vsC_5"), number = nrow(fit_b), adjust.method = "fdr")

So, if my understand is correct, the results of "d30vsd5" show the top DE genes when comparing both treatment (high vs control), and time (day 30 vs day 5)?

Cheers.

ADD REPLY • link 5 months ago rober204 • 0

0

Entering edit mode

Does this mean there is an anova test comparing the mean expression of each gene for each contrast specified (LvsC_m5 and HvsC_m5)? So when you include two or more coefficients in the topTable function, an anova is conducted for each gene between each coefficient?

No, that isn't what "anova" means. anova means that a single test is done for all possible comparisons between the three treatment groups. This is described in the limma documentation.

ADD REPLY • link 5 months ago Gordon Smyth 50k

0

Entering edit mode

In my example I provided in the most recent comment (d30vsd5), is the result of this topTable showing the top DE genes from both the H30vsH_5 and C30vsC_5 contrasts? If not, what is the topTable showing?

Cheers

ADD REPLY • link 4 months ago rober204 • 0