Question

Question on constructing design matrix and defining contrasts for analysis

0

Entering edit mode

mohammedtoufiq91 ▴ 10

@mohammedtoufiq91-17679

Last seen 3 months ago

United States

Hi,

I am trying to construct a design matrix and makeContrasts similarly like we utilize in limma and other RNA-Seq analysis packages like DESeq2 and EdgeR for group comparison analysis. I would like to use the design matrix in WGCNA analysis for trait association to modules and want to exactly replicate the matrix exactly makeContrasts in limma (see below). The data has 4 donors, and consists of paired samples. Referring to the example of EdgeR documentation, "Comparisons both between and within subjects", I created a similar model (Design_method_1), in addition, created another one similar to makeContrasts (Design_method_2). I am not sure which i right one, can somebody suggest which design method seems to be appropriate? I am providing the examples below for reference.

Thank you,

Mohammed

makeContrasts in limma

Cont.matrix <- makeContrasts(
  "mock.B.vs.A" = mock_B - mock_A,
  "Hi.B.vs.A" = Hi_B - Hi_A,
  "Hi.B.vs.mock.B" =  Hi_B - mock_B,
  "Hi.A.vs.mock.A" = Hi_A - mock_A,
  "Hi.B.vs.mock.B_VS_Hi.A.vs.mock.A" = (Hi_B - mock_B) - (Hi_A - mock_A),
  levels=design)

Design_method_1

#>       Samples idD1 idD2 idD3 idD4 mock.B.vs.A Hi.B.vs.A Hi.B.vs.mock.B
#> 1  D1_mock.B1    1    0    0    0           1         0              0
#> 2  D2_mock.B1    0    0    0    1           1         0              0
#> 3  D3_mock.B1    0    1    0    0           0         0              0
#> 4  D4_mock.B1    0    0    1    0           0         0              0
#> 5    D1_Hi.B1    1    0    0    0           0         1              1
#> 6    D2_Hi.B1    0    0    0    1           0         1              1
#> 7    D3_Hi.B1    0    1    0    0           0         0              0
#> 8    D4_Hi.B1    0    0    1    0           0         0              0
#> 9  D1_mock.B5    1    0    0    0           1         0              0
#> 10 D2_mock.B5    0    0    0    1           1         0              0
#> 11 D3_mock.B5    0    1    0    0           0         0              0
#> 12 D4_mock.B5    0    0    1    0           0         0              0
#> 13   D1_Hi.B5    1    0    0    0           0         1              1
#> 14   D2_Hi.B5    0    0    0    1           0         1              1
#> 15   D3_Hi.B5    0    1    0    0           0         0              0
#> 16   D4_Hi.B5    0    0    1    0           0         0              0
#>    Hi.A.vs.mock.A Hi.B.vs.mock.B_VS_Hi.A.vs.mock.A
#> 1               0                                0
#> 2               0                                0
#> 3               0                                0
#> 4               0                                0
#> 5               0                                1
#> 6               0                                1
#> 7               1                                1
#> 8               1                                1
#> 9               0                                0
#> 10              0                                0
#> 11              0                                0
#> 12              0                                0
#> 13              0                                1
#> 14              0                                1
#> 15              1                                1
#> 16              1                                1

Design_method_2

#>       Samples idD1 idD2 idD3 idD4 mock.B.vs.A Hi.B.vs.A Hi.B.vs.mock.B
#> 1  D1_mock.B1    1    0    0    0           1         0             -1
#> 2  D2_mock.B1    0    0    0    1           1         0             -1
#> 3  D3_mock.B1    0    1    0    0          -1         0              0
#> 4  D4_mock.B1    0    0    1    0          -1         0              0
#> 5    D1_Hi.B1    1    0    0    0           0         1              1
#> 6    D2_Hi.B1    0    0    0    1           0         1              1
#> 7    D3_Hi.B1    0    1    0    0           0        -1              0
#> 8    D4_Hi.B1    0    0    1    0           0        -1              0
#> 9  D1_mock.B5    1    0    0    0           1         0             -1
#> 10 D2_mock.B5    0    0    0    1           1         0             -1
#> 11 D3_mock.B5    0    1    0    0          -1         0              0
#> 12 D4_mock.B5    0    0    1    0          -1         0              0
#> 13   D1_Hi.B5    1    0    0    0           0         1              1
#> 14   D2_Hi.B5    0    0    0    1           0         1              1
#> 15   D3_Hi.B5    0    1    0    0           0        -1              0
#> 16   D4_Hi.B5    0    0    1    0           0        -1              0
#>    Hi.A.vs.mock.A Hi.B.vs.mock.B_VS_Hi.A.vs.mock.A
#> 1               0                               -1
#> 2               0                               -1
#> 3              -1                                1
#> 4              -1                                1
#> 5               0                                1
#> 6               0                                1
#> 7               1                               -1
#> 8               1                               -1
#> 9               0                               -1
#> 10              0                               -1
#> 11             -1                                1
#> 12             -1                                1
#> 13              0                                1
#> 14              0                                1
#> 15              1                               -1
#> 16              1                               -1

^{Created on 2023-05-30 with [reprex v2.0.2](https://reprex.tidyverse.org)}

wgcna DESeq2 edgeR limma design • 1.3k views

ADD COMMENT • link updated 22 months ago by Gordon Smyth 52k • written 22 months ago by mohammedtoufiq91 ▴ 10

Gordon Smyth · Answer 1 · 2023-05-31

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Sorry, I am not able to follow your question. I don't understand the design of your experiment -- you would need to provide the targets frame. Neither of the design matrices look at all correct -- they seem to have more treatment columns than there are treatments, which naturally can't be correct.

ADD COMMENT • link 22 months ago Gordon Smyth 52k

0

Entering edit mode

Gordon Smyth thank you, yes, please find the targets frame below. Basically, I want to create a design matrix and binarize by 0 and 1 to be used in WGCNA analysis to compare module between/within groups

dput(targetframe)
structure(list(Donor = c("D1", "D2", "D3", "D4", "D1", "D2", 
                         "D3", "D4", "D1", "D2", "D3", "D4", "D1", "D2", "D3", "D4"), 
               Treatment = c("mock", "mock", "mock", "mock", "Hi", "Hi", 
                             "Hi", "Hi", "mock", "mock", "mock", "mock", "Hi", "Hi", "Hi", 
                             "Hi"),
                Category = c("B", "B", "A", "A", "B", "B", "A", "A", 
                             "B", "B", "A", "A", "B", "B", "A", "A"), 
                Treatment_Category = c("mock_B", "mock_B", "mock_A", "mock_A", "Hi_B", "Hi_B", "Hi_A", "Hi_A", 
                             "mock_B", "mock_B", "mock_A", "mock_A", "Hi_B", "Hi_B", "Hi_A", "Hi_A")),
                class = "data.frame", row.names = c("D1_mock.B1", 
                            "D2_mock.B1", "D3_mock.B1", "D4_mock.B1", "D1_Hi.B1", "D2_Hi.B1", 
                            "D3_Hi.B1", "D4_Hi.B1", "D1_mock.B5", "D2_mock.B5", "D3_mock.B5", 
                            "D4_mock.B5", "D1_Hi.B5", "D2_Hi.B5", "D3_Hi.B5", "D4_Hi.B5"))
#>            Donor Treatment Category Treatment_Category
#> D1_mock.B1    D1      mock        B             mock_B
#> D2_mock.B1    D2      mock        B             mock_B
#> D3_mock.B1    D3      mock        A             mock_A
#> D4_mock.B1    D4      mock        A             mock_A
#> D1_Hi.B1      D1        Hi        B               Hi_B
#> D2_Hi.B1      D2        Hi        B               Hi_B
#> D3_Hi.B1      D3        Hi        A               Hi_A
#> D4_Hi.B1      D4        Hi        A               Hi_A
#> D1_mock.B5    D1      mock        B             mock_B
#> D2_mock.B5    D2      mock        B             mock_B
#> D3_mock.B5    D3      mock        A             mock_A
#> D4_mock.B5    D4      mock        A             mock_A
#> D1_Hi.B5      D1        Hi        B               Hi_B
#> D2_Hi.B5      D2        Hi        B               Hi_B
#> D3_Hi.B5      D3        Hi        A               Hi_A
#> D4_Hi.B5      D4        Hi        A               Hi_A

Earlier, I partially referred edgeR documentation to create design_1 (see below example):

## Create a design using model.matrix

design <- model.matrix(~0+Donor.id)

## To find genes responding to the category B in mock treatment:
design <- cbind(design, mock.B.vs.A= Treatment=="mock" & Category=="B")

## To find genes responding to the category B in hi-dose treatment:
design <- cbind(design, hi.B.vs.A= Treatment=="Hi" & Category=="B")
etc.,

ADD REPLY • link updated 22 months ago by Gordon Smyth 52k • written 22 months ago by mohammedtoufiq91 ▴ 10

1

Entering edit mode

Basically, I want to create a design matrix and binarize by 0 and 1 to be used in WGCNA analysis to compare module between/within groups

Sorry, that makes little sense to me. You can't pretend that the experiment consists of independent groups, because it doesn't. It is a paired experiment with two (A and B) donor groups.

The edgeR User's Guide Section 3.5 "Comparisons both between and within subjects" tells you exactly how to analyse this experiment. It tells you exactly which contrasts can be formed and which cannot.

Although you don't say, I guess that you have done a limma analysis setting donor as a random effect using duplicateCorrelation. You cannot reproduce this analysis in edgeR. In edgeR, it is impossible to make direct comparisons between A and B treatments. It is only possible to make direct comparisons between treatments given to the same donors, i.e., hi.A vs mock.A and hi.B vs mock.B.

I partially referred edgeR documentation to create design_1 (see below example): design <- model.matrix(~0+Donor.id), design <- cbind(design, mock.B.vs.A= Treatment=="mock" & Category=="B")

You have not interpreted the edgeR documentation correctly. The coefficient you have formed is actually mock.B.vs.hi.B rather than mock.B.vs.A.

design <- cbind(design, hi.B.vs.A= Treatment=="Hi" & Category=="B")

This coefficient is not hi.B.vs.A but hi.B.vs.mock.B. You cannot define both of these contrasts in the same design matrix because they are 100% correlated with one another.

You need to follow the edgeR User's Guide documentation exactly. Or alternatively, use limma.

ADD REPLY • link 22 months ago Gordon Smyth 52k

0

Entering edit mode

Gordon Smyth Yes, I realized while re-running through the WGCNA workflow and forget to account for the paired experiment. For general gene level analysis I was able to use mix of edgeR for pre-processing and normalization and limma for differential analysis via defining makeContrasts and was quite promising.

I was curious to try WGCNA analysis to identify module sets, and significance of modules to phenotypic traits/groups, however, I could not find a particular case for paired experiment set-up to associate module set to phenotypic traits. I could find a more independent group analysis examples and was carried away by them and started constructing design matrices arbitrarily. Apologies for the confusion caused.

Perhaps as a workaround, I could instead use limma for differential analysis of identified module set and use the similar set-up as makeContrasts. Maybe this I should try [Reference].

ADD REPLY • link 22 months ago mohammedtoufiq91 ▴ 10

1

Entering edit mode

PS. Just as aside, using dput to show a data.frame is unnecessary overkill. Just typing targetsframe at the prompt would do the job.