Question

limma random effects and multi level experiments

0

Entering edit mode

David R ▴ 90

@david-rengel-6321

Last seen 4 months ago

European Union

Hi,

I have a question regarding limma-driven microarray analysis.

I have a total of 24 arrays. My experimental design consists in two fixed factors, i.e. "matrix" (F or G) and "dim" (2 or 3), and what I consider a Random Factor, i.e. "donor". There are six donors, so that each donor has provided a matrix * dim treatment. That is: 4 possible combinations matrix * dim * 6 donors = 24 arrays.

I run the following code:

    eset <-new ("ExpressionSet", exprs = as.matrix(data.norm))
    treat <- factor (substr(colnames(data.norm),1,2))
    design <- model.matrix (~ 0 + treat)
    colnames(design) <- c("F2","F3","G2","G3")
    corfit <- duplicateCorrelation (eset, design, block = factor(substr(colnames(data.norm),4,6)))
    corfit$consensus
    fit <- lmFit(eset, design, block = factor (substr(colnames(data.norm),4,6)), correlation = corfit$consensus)

    cont.matrix <- makeContrasts(
      F3.F2 = F3 - F2,
      G3.G2 = G3 - G2,
      G2.F2 = G2 - F2,
      G3.F3 = G3 - F3,
      inter = (G3 - G2) - (F3 - F2),
      levels = design
    )

fit2 <- contrasts.fit (fit, cont.matrix)
eb <- eBayes(fit2)

My main question is: is it right to run such code? I say this because on page 50 of the limma user's guide, it says regarding multi level designs that "Since we need to make comparaisons BOTH within AND between subjects, it is necessary to treat Patient as a Random Effect" (capital letters are mine). This is because in the example given in the document, patients were either "Normal" or "Diseased", but not both. In my design, however, a single donor covers all fixed effect combinations. I still consider my donors as random, but I wonder if the limma code I used is adapted.

A subsidiary question would deal with the fact that when running lmer models on every gene some of them present singular fits, that is over fitting. This is not visible when running the code shown above. I am more used to run lme or lmer models out of the transcriptomics studies, where thousands of models, i.e. one per gene, are fitted.

Thanks for your help.

Best, David

limma multi-level random effect • 3.1k views

ADD COMMENT • link 5.0 years ago David R ▴ 90

0

Entering edit mode

Hi David,

To me, it seems like you were using sample ID (i.e. column names) instead of the donor ID for the blocking variable in the duplicateCorrelation. Perhaps I am wrong. Other than that, I do not see any problem with your code.

ADD REPLY • link 5.0 years ago mikhael.manurung ▴ 270

0

Entering edit mode

Set up a metadata table. Otherwise we're just guessing what your code actually does.

ADD REPLY • link 5.0 years ago Aaron Lun ★ 28k

score 0 · Answer 1 · 2019-04-22

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

The whole purpose of treating a blocking variable as "random" in a linear model is to recover inter-block information about the treatments. For a balanced design like yours, in which every donor receives every treatment, there is no inter-block information to recover because all the blocks receive the same treatments.

Hence, for your design, there is no point is running duplicateCorrelation, or lmer for that matter. Just include donor as an additive factor in the design matrix. Then the treatments will be compared within donors. Simple and robust.

The situation is very much analogous to paired t-tests, just with four treatments per donor instead of two.

ADD COMMENT • link 5.0 years ago Gordon Smyth 50k

0

Entering edit mode

Hi Gordon, Thanks again for your suggestion. I have tried to implement it but I came across with this issue: the "reference" level of my treatment disappears under “donor + treatment” design. It obviously stays if I implement a “treatment + donor” design. I mean "reference" level because I do not want to make any reference level explicit. This is why I had chosen a design matrix with no intercept. Please find here below my code:

eset <- new ("ExpressionSet", exprs = as.matrix(data.norm))

I may choose one of these three designs:

design_intercept <-  model.matrix (~ design.exp$donor + design.exp$treat)
design_dt <- model.matrix (~ 0 + design.exp$donor + design.exp$treat)
design_td <- model.matrix (~ 0 + design.exp$treat + design.exp$donor)

If I chose "design_dt", I'd continue as follows:

colnames(design_dt) <- sub("design.exp.", "", colnames(design_dt))

fit <- lmFit(eset, design)

cont.matrix <- makeContrasts(
  F3.F2 = treatF3 - treatF2,
  G3.G2 = treatG3 - treatG2,
  G2.F2 = treatG2 - treatF2,
  G3.F3 = treatG3 - treatF3,
  inter = (treatG3 - treatG2) - (treatF3 - treatF2),
  levels = design_dt)

But treatF2 does not exist as such anymore, and I do not wish F2 (nor any other level) to be the reference. Any hint on this would be much appreciated. Best,

David

ADD REPLY • link 5.0 years ago David R ▴ 90

0

Entering edit mode

...Continued from previous comment (above)

In others termes: it seems that by doing donor + treat, there are too many parameters to be estimated and one of them is left out. I hav obviously found myself in this scenario before but I do not recall it in a situation where I need to build contrasts without a reference. Is there any way around it? Thanks again. David

ADD REPLY • link 5.0 years ago David R ▴ 90

1

Entering edit mode

Using a treatment contrasts parameterization you will always have a baseline in this model. Generally you would do

> df <- data.frame(treat = rep(c("F2","F3","G2","G3"), 6), donor = gl(4,6))
> df
   treat donor
1     F2     1
2     F3     1
3     G2     1
4     G3     1
5     F2     1
6     F3     1
7     G2     2
8     G3     2
9     F2     2
10    F3     2
11    G2     2
12    G3     2
13    F2     3
14    F3     3
15    G2     3
16    G3     3
17    F2     3
18    F3     3
19    G2     4
20    G3     4
21    F2     4
22    F3     4
23    G2     4
24    G3     4
> model.matrix(~0+treat+donor, df)
   treatF2 treatF3 treatG2 treatG3 donor2 donor3 donor4
1        1       0       0       0      0      0      0
2        0       1       0       0      0      0      0
3        0       0       1       0      0      0      0
4        0       0       0       1      0      0      0
5        1       0       0       0      0      0      0
6        0       1       0       0      0      0      0
7        0       0       1       0      1      0      0
8        0       0       0       1      1      0      0
9        1       0       0       0      1      0      0
10       0       1       0       0      1      0      0
11       0       0       1       0      1      0      0
12       0       0       0       1      1      0      0
13       1       0       0       0      0      1      0
14       0       1       0       0      0      1      0
15       0       0       1       0      0      1      0
16       0       0       0       1      0      1      0
17       1       0       0       0      0      1      0
18       0       1       0       0      0      1      0
19       0       0       1       0      0      0      1
20       0       0       0       1      0      0      1
21       1       0       0       0      0      0      1
22       0       1       0       0      0      0      1
23       0       0       1       0      0      0      1
24       0       0       0       1      0      0      1

In which case the baseline is donor 1, and you can make comparisons between treatments to your heart's content.

ADD REPLY • link 5.0 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks a lot James. That was very helpful. Including the gl function, of which I was unaware! Best. David

ADD REPLY • link 5.0 years ago David R ▴ 90

score 0 · Answer 2 · 2019-04-29

0

Entering edit mode

David R ▴ 90

@david-rengel-6321

Last seen 4 months ago

European Union

Thanks Mikhael, Aaron and Gordon for your kind replies. I apoligize for my late own reply. I will proceed as Gordon suggests. Best, David.

ADD COMMENT • link 5.0 years ago David R ▴ 90