**20**wrote:

Dear Community,

I’m very new to the limma statistical methodology, and especially the construction and understanding of design matrix. In detail, I tried to interpret with some 'fake' data in R the example of multifactorial design in the example of 9.7 section page 48. Thus, I created a matrix below, and the 2 conditions like the example in the tutorial:

> set.seed(81) > mat <- matrix(data=sample(1:100), nrow=10, ncol=10) > head(mat) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 16 40 20 84 13 90 72 82 52 14 [2,] 91 9 29 47 80 21 34 15 87 74 [3,] 68 18 67 46 17 94 63 93 92 70 [4,] 1 48 83 77 33 2 81 97 8 71 [5,] 78 36 86 6 58 73 49 85 61 38 [6,] 98 89 56 79 88 23 44 69 31 64 > pdat <- data.frame(Tissue=rep(c("A", "B"), times=5), Condition=rep(c("Diseased","Normal"),each=5)) > pdat Tissue Condition 1 A Diseased 2 B Diseased 3 A Diseased 4 B Diseased 5 A Diseased 6 B Normal 7 A Normal 8 B Normal 9 A Normal 10 B Normal new.set <- new("ExpressionSet", exprs=mat) > head(exprs(new.set)) 1 2 3 4 5 6 7 8 9 10 1 16 40 20 84 13 90 72 82 52 14 2 91 9 29 47 80 21 34 15 87 74 3 68 18 67 46 17 94 63 93 92 70 4 1 48 83 77 33 2 81 97 8 71 5 78 36 86 6 58 73 49 85 61 38 6 98 89 56 79 88 23 44 69 31 64 > pData(new.set) <-pdat > head(pData(new.set)) Tissue Condition 1 A Diseased 2 B Diseased 3 A Diseased 4 B Diseased 5 A Diseased 6 B Normal > pairs <- factor(rep(1:5, each=2)) > Treat <- factor(paste(new.set$Condition, new.set$Tissue, sep=".")) > Treat [1] Diseased.A Diseased.B Diseased.A Diseased.B Diseased.A Normal.B Normal.A Normal.B [9] Normal.A Normal.B Levels: Diseased.A Diseased.B Normal.A Normal.B > design <- model.matrix(~0 +Treat) > colnames(design) [1] "TreatDiseased.A" "TreatDiseased.B" "TreatNormal.A" "TreatNormal.B" > design TreatDiseased.A TreatDiseased.B TreatNormal.A TreatNormal.B 1 1 0 0 0 2 0 1 0 0 3 1 0 0 0 4 0 1 0 0 5 1 0 0 0 6 0 0 0 1 7 0 0 1 0 8 0 0 0 1 9 0 0 1 0 10 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$Treat [1] "contr.treatment" > colnames(design) <- levels(Treat) > dupcor <- duplicateCorrelation(new.set, design, block=pairs) > fit <- lmFit(new.set, design, block=pairs, correlation=dupcor$consensus) > cm <- makeContrasts(Disease.A=Diseased.A-Normal.A , levels=design) > fit2 <- contrasts.fit(fit, cm) > fit3 <- eBayes(fit2, trend=TRUE) > top2 <- topTable(fit3, coef=1, number=nrow(fit3), adjust.method="fdr", sort.by="none") > head(top2) logFC AveExpr t P.Value adj.P.Val B 1 -49.677974 48.3 -1.97576747 0.07300290 0.3650145 -4.562943 2 8.592188 48.7 0.30626035 0.76495236 0.8815297 -4.608752 3 -28.543152 62.8 -1.44092558 0.17662419 0.5887473 -4.581712 4 2.134082 50.1 0.06339996 0.95055105 0.9505510 -4.610196 5 17.663417 57.0 0.79132671 0.44497885 0.7111324 -4.600623 6 47.382520 64.1 2.68642944 0.02065864 0.2065864 -4.538251

So, my main questions are the following:

- If my samples are paired(like the tutorial in the limma users guide), I would proceed as the example using duplicate correlation. Thus, what is the main difference, from including the pairs into the design matrix ??
- If I would like to move by including the pairs into the design matrix(maybe not correct but to correctly understand the approach):

# alternative methodology

> f <- relevel(Treat, ref="Normal.A") > design1 <- model.matrix(~0 + pairs + f) > design1 pairs1 pairs2 pairs3 pairs4 pairs5 fDiseased.A fDiseased.B fNormal.B 1 1 0 0 0 0 1 0 0 2 1 0 0 0 0 0 1 0 3 0 1 0 0 0 1 0 0 4 0 1 0 0 0 0 1 0 5 0 0 1 0 0 1 0 0 6 0 0 1 0 0 0 0 1 7 0 0 0 1 0 0 0 0 8 0 0 0 1 0 0 0 1 9 0 0 0 0 1 0 0 0 10 0 0 0 0 1 0 0 1 attr(,"assign") [1] 1 1 1 1 1 2 2 2 attr(,"contrasts") attr(,"contrasts")$pairs [1] "contr.treatment" attr(,"contrasts")$f [1] "contr.treatment" > fit <- lmFit(new.set, design1) > fit2 <- eBayes(fit, trend=TRUE)

....

So, the first 5 coefficients named pairs are the blocking factors for each “object”—but what the coefficients fDiseasedA, fDiseasedB represent exactly?

I mean, for instance, fDiseasedA represents the average log2 fold change between groups Diseased A and Normal A ?

Moreover, why the level "Normal.B" is missing from the design1?

Or it depends on the first level of the f factor ? when I leave it as default, has:

> Treat [1] Diseased.A Diseased.B Diseased.A Diseased.B Diseased.A Normal.B Normal.A Normal.B [9] Normal.A Normal.B Levels: Diseased.A Diseased.B Normal.A Normal.B

Or I have to somehow relevel my Treat factor, in order to have the samed resulted log2 fold change for Diseased.A –Normal.A, as in the first defined difference in cm?

My final question is, if I want to have the same returned log2 fold change of Diseased.A-Normal.A with duplicate correlation & contrasts.fit from the first design matrix, how I should proceed with the second approach—that is when I include the pairs into the model.matrix(second design matrix)?

I can understand that the statistics would change, but the log2 fold change of the same compared groups would be the same, right?

Any explanations or feedback on these questions would be very helpful, as I'm a beginner in analyzing microarray data with limma, and I would like firstly to understand important aspects of formulating various comparisons.

Thank you,

Constantinos