Hi all,
I'm having a hard time figuring out where my contrast matrix has gone wrong. I'm interested in testing interactions for a multi-factorial RNA-seq project. The project is ultimately a 2x4x4 factorial design, but for the sake of simplicity and making sure I'm using the package appropriately, I've subsetted my data and am analyzing as a 2x4 factorial design. I'm interested in the interaction of parent treatment (group, levels: Control, Exposed) and development (stage, levels: 19, 28, 35, HA) on embryonic gene expression, and am trying to define a contrast that defines the interaction across all 4 stages of development.
> head(z$samples)
files group lib.size norm.factors
0017ARSC00281 0017ARSC00281_htseq.count_v2 C 1501758 1.2739166
0018ARSC00351 0018ARSC00351_htseq.count_v2 C 1247792 1.0500738
0021ARSE00281 0021ARSE00281_htseq.count_v2 E 1041493 1.1739898
0022ARSE00351 0022ARSE00351_htseq.count_v2 E 846621 0.8519006
0025ARSC00191 0025ARSC00191_htseq.count_v2 C 582379 0.7811282
0026ARSE00191 0026ARSE00191_htseq.count_v2 E 1122251 1.0042548
parenttreat stage
0017ARSC00281 C 28
0018ARSC00351 C 35
0021ARSE00281 E 28
0022ARSE00351 E 35
0025ARSC00191 C 19
0026ARSE00191 E 19
Based on the limma user guide and various posts on this forum, I've set up and compared two different design matrices: 1) Classic factorial design (y~factor1*factor2), and 2) analyzing as a single factor (y~combined). The number of DEG for the interaction is different between the two methods, and leads me to think that my contrasts are improperly defined.
#Classic factorial design:
design1 <- model.matrix(~group*stage, data=z$samples)
colnames(design1)[1] <- "Intercept"
colnames(design1) <- gsub("group", "", colnames(design1))
colnames(design1) <- gsub(":", ".", colnames(design1))
colnames(design1) <- gsub("-", ".", colnames(design1))
> design1
Intercept E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA
0017ARSC00281 1 0 1 0 0 0 0 0
0018ARSC00351 1 0 0 1 0 0 0 0
0021ARSE00281 1 1 1 0 0 1 0 0
0022ARSE00351 1 1 0 1 0 0 1 0
0025ARSC00191 1 0 0 0 0 0 0 0
0026ARSE00191 1 1 0 0 0 0 0 0
0037ARSC00282 1 0 1 0 0 0 0 0
0038ARSE00282 1 1 1 0 0 1 0 0
0041ARSC00192 1 0 0 0 0 0 0 0
0043ARSE00192 1 1 0 0 0 0 0 0
0053ARSC00352 1 0 0 1 0 0 0 0
0055ARSE00352 1 1 0 1 0 0 1 0
0057ARSC00193 1 0 0 0 0 0 0 0
0058ARSC00194 1 0 0 0 0 0 0 0
0059ARSE00193 1 1 0 0 0 0 0 0
0060ARSE00194 1 1 0 0 0 0 0 0
0061ARSC00283 1 0 1 0 0 0 0 0
0062ARSC00284 1 0 1 0 0 0 0 0
0063ARSE00283 1 1 1 0 0 1 0 0
0089ARSC00195 1 0 0 0 0 0 0 0
0090ARSE00195 1 1 0 0 0 0 0 0
0091ARSC00285 1 0 1 0 0 0 0 0
0092ARSE00285 1 1 1 0 0 1 0 0
0093ARSC00353 1 0 0 1 0 0 0 0
0094ARSC00354 1 0 0 1 0 0 0 0
0095ARSE00353 1 1 0 1 0 0 1 0
0096ARSE00354 1 1 0 1 0 0 1 0
0097ARSC00HA1 1 0 0 0 1 0 0 0
0098ARSC00HA2 1 0 0 0 1 0 0 0
0099ARSE00HA1 1 1 0 0 1 0 0 1
0100ARSE00HA2 1 1 0 0 1 0 0 1
0137ARSE00286 1 1 1 0 0 1 0 0
0138ARSC00355 1 0 0 1 0 0 0 0
0139ARSE00355 1 1 0 1 0 0 1 0
0140ARSC00HA3 1 0 0 0 1 0 0 0
0141ARSC00HA4 1 0 0 0 1 0 0 0
0142ARSC00HA5 1 0 0 0 1 0 0 0
0143ARSC00HA3 1 0 0 0 1 0 0 0
0144ARSE00HA4 1 1 0 0 1 0 0 1
0176ARSC00196 1 0 0 0 0 0 0 0
attr(,"assign")
[1] 0 1 2 2 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
attr(,"contrasts")$stage
[1] "contr.treatment"
zq1 <- voomWithQualityWeights(z, design1, plot=TRUE)
fit1 <- lmFit(zq1,design1)
fit1 <- eBayes(fit1)
plotSA(fit1, main="Final model: mean-variance trend")
results1 <- decideTests(fit1, adjust.method="BH")
> summary(results1)
Intercept E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA
-1 0 2 1095 3359 2800 0 21 12
0 8762 23382 20023 13256 15012 23426 23403 23412
1 14664 42 2308 6811 5614 0 2 2
## pt_devixn
ixn1 <- contrasts.fit(fit1, coef=6:8)
ixn1 <- eBayes(ixn1)
ixn1results <- decideTests(ixn1)
summary(ixn1results)
E.stage28 E.stage35 E.stageHA
-1 0 21 12
0 23426 23403 23412
1 0 2 2
## Analyzing as a single factor
combi <- factor(paste(z$samples$group, z$samples$stage, sep="."))
cbind(z$samples, combi=combi)
design2 <- model.matrix(~combi, data=z$samples)
colnames(design2) <- levels(combi)
colnames(design2) <- gsub("combi", "", colnames(design2))
colnames(design2)[1] <- "Intercept"
> design2
Intercept C.28 C.35 C.HA E.19 E.28 E.35 E.HA
0017ARSC00281 1 1 0 0 0 0 0 0
0018ARSC00351 1 0 1 0 0 0 0 0
0021ARSE00281 1 0 0 0 0 1 0 0
0022ARSE00351 1 0 0 0 0 0 1 0
0025ARSC00191 1 0 0 0 0 0 0 0
0026ARSE00191 1 0 0 0 1 0 0 0
0037ARSC00282 1 1 0 0 0 0 0 0
0038ARSE00282 1 0 0 0 0 1 0 0
0041ARSC00192 1 0 0 0 0 0 0 0
0043ARSE00192 1 0 0 0 1 0 0 0
0053ARSC00352 1 0 1 0 0 0 0 0
0055ARSE00352 1 0 0 0 0 0 1 0
0057ARSC00193 1 0 0 0 0 0 0 0
0058ARSC00194 1 0 0 0 0 0 0 0
0059ARSE00193 1 0 0 0 1 0 0 0
0060ARSE00194 1 0 0 0 1 0 0 0
0061ARSC00283 1 1 0 0 0 0 0 0
0062ARSC00284 1 1 0 0 0 0 0 0
0063ARSE00283 1 0 0 0 0 1 0 0
0089ARSC00195 1 0 0 0 0 0 0 0
0090ARSE00195 1 0 0 0 1 0 0 0
0091ARSC00285 1 1 0 0 0 0 0 0
0092ARSE00285 1 0 0 0 0 1 0 0
0093ARSC00353 1 0 1 0 0 0 0 0
0094ARSC00354 1 0 1 0 0 0 0 0
0095ARSE00353 1 0 0 0 0 0 1 0
0096ARSE00354 1 0 0 0 0 0 1 0
0097ARSC00HA1 1 0 0 1 0 0 0 0
0098ARSC00HA2 1 0 0 1 0 0 0 0
0099ARSE00HA1 1 0 0 0 0 0 0 1
0100ARSE00HA2 1 0 0 0 0 0 0 1
0137ARSE00286 1 0 0 0 0 1 0 0
0138ARSC00355 1 0 1 0 0 0 0 0
0139ARSE00355 1 0 0 0 0 0 1 0
0140ARSC00HA3 1 0 0 1 0 0 0 0
0141ARSC00HA4 1 0 0 1 0 0 0 0
0142ARSC00HA5 1 0 0 1 0 0 0 0
0143ARSC00HA3 1 0 0 1 0 0 0 0
0144ARSE00HA4 1 0 0 0 0 0 0 1
0176ARSC00196 1 0 0 0 0 0 0 0
attr(,"assign")
[1] 0 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$combi
[1] "contr.treatment"
zq2 <- voomWithQualityWeights(z, design2, plot=FALSE)
fit2 <- lmFit(zq2,design2)
fit2 <- eBayes(fit2)
results2 <- decideTests(fit2, adjust.method="BH")
summary(results2)
Intercept C.28 C.35 C.HA E.19 E.28 E.35 E.HA
-1 0 1095 3359 2800 2 1058 2738 2160
0 8762 20023 13256 15012 23382 20109 15744 16948
1 14664 2308 6811 5614 42 2259 4944 4318
ixn.con <- makeContrasts(ixn=(Intercept-C.28-C.35-C.HA)-(E.19-E.28-E.35-E.HA), levels=design2)
ixn2 <- contrasts.fit(fit2, ixn.con)
ixn2 <- eBayes(ixn2)
ixn2results <- decideTests(ixn2)
summary(ixn2results)
ixn
-1 90
0 15483
1 7853
Model 1 and its contrast outputs ~25 DEG for the interaction, which makes more biological sense to me than the ~8000 DEG from Model 2. I realize that's not the best barometer for assessing the output, so some guidance on how to best understand these coefficients are also appreciated. Most importantly, I'd like to figure out the appropriate contrasts for Model 2 because it'll be much easier to interpret coefficients when adding the third factor of interest.
I apologize in advance if I've missed something obvious in the manual or from the forum posts. Any feedback on where I'm going wrong is greatly appreciated.
Best,
Jane

Thanks for the clarification, Aaron. It looks like the separate up/down statistics from decideTests look slightly different between the two approaches, as well as the results of topTable. Why do these differences appear between the two approaches? Thanks again for the input!
ixn1 <- contrasts.fit(fit1, coef=6:8) ixn1 <- eBayes(ixn1) ixn1results <- decideTests(ixn1) con <- makeContrasts( (E.28 - C.28) - E.19, # equivalent to coef=6 in first design (E.35 - C.35) - E.19, # equivalent to coef=7 (E.HA - C.HA) - E.19, # equivalent to coef=8 levels=design2) ixn2 <- contrasts.fit(fit2, con) ixn2 <- eBayes(ixn2) ixn2results <- decideTests(ixn2) > summary(ixn1results) E.stage28 E.stage35 E.stageHA -1 0 21 12 0 23426 23403 23412 1 0 2 2 > summary(ixn2results) (E.28 - C.28) - E.19 (E.35 - C.35) - E.19 (E.HA - C.HA) - E.19 -1 0 7 11 0 23426 23417 23413 1 0 2 2 > topTable(ixn1, coef=NULL, p.value=0.05, number=Inf) E.stage28 E.stage35 E.stageHA AveExpr F P.Value 105927004 -1.3874870 -4.719365 -5.437884 2.360316 14.592496 3.218448e-07 105917938 -3.8249407 -3.825325 -3.949919 4.545534 13.454110 8.528391e-07 105917629 -0.3984093 4.076354 3.762134 2.662013 13.200461 1.064246e-06 105918650 -1.0450096 -1.816854 -5.731129 2.867703 11.790502 3.756017e-06 105933641 -2.0301397 -4.489063 -5.790309 2.764337 10.778847 9.594163e-06 105933824 -2.1884430 -3.473024 -6.345559 2.939039 10.698377 1.034997e-05 105934711 -2.3767461 -4.053166 -5.559814 3.600467 10.220027 1.630708e-05 105931270 -2.8159178 -3.998344 -5.300175 3.714562 10.110305 1.811641e-05 105918686 0.4689484 1.770305 -3.460255 3.649248 10.087677 1.851463e-05 105935213 -2.2799742 -4.038241 -5.443232 3.753408 9.972584 2.068461e-05 adj.P.Val 105927004 0.007539537 105917938 0.008310339 105917629 0.008310339 105918650 0.021997115 105933641 0.040409714 105933824 0.040409714 105934711 0.048191528 105931270 0.048191528 105918686 0.048191528 105935213 0.048455768 > topTable(ixn2, coef=NULL, p.value=0.05, number=Inf) X.E.28...C.28....E.19 X.E.35...C.35....E.19 X.E.HA...C.HA....E.19 105927004 -1.3874870 -4.719365 -5.4378843 105917629 -0.3984093 4.076354 3.7621337 105918686 0.4689484 1.770305 -3.4602551 105918650 -1.0450096 -1.816854 -5.7311285 105940307 -0.9256202 -3.509328 0.2412498 105933824 -2.1884430 -3.473024 -6.3455592 105933641 -2.0301397 -4.489063 -5.7903094 105938640 -0.7989726 -1.978063 -6.0888987 105917938 -3.8249407 -3.825325 -3.9499190 AveExpr F P.Value adj.P.Val 105927004 2.360316 14.66891 3.018007e-07 0.006075775 105917629 2.662013 14.03037 5.187207e-07 0.006075775 105918686 3.649248 12.28329 2.402906e-06 0.018763494 105918650 2.867703 11.73154 3.963942e-06 0.023214827 105940307 3.731491 11.46705 5.053676e-06 0.023677483 105933824 2.939039 10.73695 9.980305e-06 0.035137343 105933641 2.764337 10.68318 1.049950e-05 0.035137343 105938640 4.370644 10.43982 1.322212e-05 0.038717675 105917938 4.545534 10.20284 1.657764e-05 0.043149766This seems to be caused by a rather esoteric problem with the approximation of weights in
contrasts.fit, see the documentation of that function for more details. Usually the approximation is good enough that you don't see any differences in the DE results with the different parametrizations. This isn't the case here, but even so, the differences in the number of DE genes (and probably their p-values) is probably small enough to be ignored.