Hi all,
I'm having a hard time figuring out where my contrast matrix has gone wrong. I'm interested in testing interactions for a multi-factorial RNA-seq project. The project is ultimately a 2x4x4 factorial design, but for the sake of simplicity and making sure I'm using the package appropriately, I've subsetted my data and am analyzing as a 2x4 factorial design. I'm interested in the interaction of parent treatment (group, levels: Control, Exposed) and development (stage, levels: 19, 28, 35, HA) on embryonic gene expression, and am trying to define a contrast that defines the interaction across all 4 stages of development.
> head(z$samples) files group lib.size norm.factors 0017ARSC00281 0017ARSC00281_htseq.count_v2 C 1501758 1.2739166 0018ARSC00351 0018ARSC00351_htseq.count_v2 C 1247792 1.0500738 0021ARSE00281 0021ARSE00281_htseq.count_v2 E 1041493 1.1739898 0022ARSE00351 0022ARSE00351_htseq.count_v2 E 846621 0.8519006 0025ARSC00191 0025ARSC00191_htseq.count_v2 C 582379 0.7811282 0026ARSE00191 0026ARSE00191_htseq.count_v2 E 1122251 1.0042548 parenttreat stage 0017ARSC00281 C 28 0018ARSC00351 C 35 0021ARSE00281 E 28 0022ARSE00351 E 35 0025ARSC00191 C 19 0026ARSE00191 E 19
Based on the limma user guide and various posts on this forum, I've set up and compared two different design matrices: 1) Classic factorial design (y~factor1*factor2), and 2) analyzing as a single factor (y~combined). The number of DEG for the interaction is different between the two methods, and leads me to think that my contrasts are improperly defined.
#Classic factorial design:
design1 <- model.matrix(~group*stage, data=z$samples) colnames(design1)[1] <- "Intercept" colnames(design1) <- gsub("group", "", colnames(design1)) colnames(design1) <- gsub(":", ".", colnames(design1)) colnames(design1) <- gsub("-", ".", colnames(design1))
> design1 Intercept E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA 0017ARSC00281 1 0 1 0 0 0 0 0 0018ARSC00351 1 0 0 1 0 0 0 0 0021ARSE00281 1 1 1 0 0 1 0 0 0022ARSE00351 1 1 0 1 0 0 1 0 0025ARSC00191 1 0 0 0 0 0 0 0 0026ARSE00191 1 1 0 0 0 0 0 0 0037ARSC00282 1 0 1 0 0 0 0 0 0038ARSE00282 1 1 1 0 0 1 0 0 0041ARSC00192 1 0 0 0 0 0 0 0 0043ARSE00192 1 1 0 0 0 0 0 0 0053ARSC00352 1 0 0 1 0 0 0 0 0055ARSE00352 1 1 0 1 0 0 1 0 0057ARSC00193 1 0 0 0 0 0 0 0 0058ARSC00194 1 0 0 0 0 0 0 0 0059ARSE00193 1 1 0 0 0 0 0 0 0060ARSE00194 1 1 0 0 0 0 0 0 0061ARSC00283 1 0 1 0 0 0 0 0 0062ARSC00284 1 0 1 0 0 0 0 0 0063ARSE00283 1 1 1 0 0 1 0 0 0089ARSC00195 1 0 0 0 0 0 0 0 0090ARSE00195 1 1 0 0 0 0 0 0 0091ARSC00285 1 0 1 0 0 0 0 0 0092ARSE00285 1 1 1 0 0 1 0 0 0093ARSC00353 1 0 0 1 0 0 0 0 0094ARSC00354 1 0 0 1 0 0 0 0 0095ARSE00353 1 1 0 1 0 0 1 0 0096ARSE00354 1 1 0 1 0 0 1 0 0097ARSC00HA1 1 0 0 0 1 0 0 0 0098ARSC00HA2 1 0 0 0 1 0 0 0 0099ARSE00HA1 1 1 0 0 1 0 0 1 0100ARSE00HA2 1 1 0 0 1 0 0 1 0137ARSE00286 1 1 1 0 0 1 0 0 0138ARSC00355 1 0 0 1 0 0 0 0 0139ARSE00355 1 1 0 1 0 0 1 0 0140ARSC00HA3 1 0 0 0 1 0 0 0 0141ARSC00HA4 1 0 0 0 1 0 0 0 0142ARSC00HA5 1 0 0 0 1 0 0 0 0143ARSC00HA3 1 0 0 0 1 0 0 0 0144ARSE00HA4 1 1 0 0 1 0 0 1 0176ARSC00196 1 0 0 0 0 0 0 0 attr(,"assign") [1] 0 1 2 2 2 3 3 3 attr(,"contrasts") attr(,"contrasts")$group [1] "contr.treatment" attr(,"contrasts")$stage [1] "contr.treatment" zq1 <- voomWithQualityWeights(z, design1, plot=TRUE) fit1 <- lmFit(zq1,design1) fit1 <- eBayes(fit1) plotSA(fit1, main="Final model: mean-variance trend") results1 <- decideTests(fit1, adjust.method="BH") > summary(results1) Intercept E stage28 stage35 stageHA E.stage28 E.stage35 E.stageHA -1 0 2 1095 3359 2800 0 21 12 0 8762 23382 20023 13256 15012 23426 23403 23412 1 14664 42 2308 6811 5614 0 2 2 ## pt_devixn ixn1 <- contrasts.fit(fit1, coef=6:8) ixn1 <- eBayes(ixn1) ixn1results <- decideTests(ixn1) summary(ixn1results) E.stage28 E.stage35 E.stageHA -1 0 21 12 0 23426 23403 23412 1 0 2 2
## Analyzing as a single factor
combi <- factor(paste(z$samples$group, z$samples$stage, sep=".")) cbind(z$samples, combi=combi) design2 <- model.matrix(~combi, data=z$samples) colnames(design2) <- levels(combi) colnames(design2) <- gsub("combi", "", colnames(design2)) colnames(design2)[1] <- "Intercept" > design2 Intercept C.28 C.35 C.HA E.19 E.28 E.35 E.HA 0017ARSC00281 1 1 0 0 0 0 0 0 0018ARSC00351 1 0 1 0 0 0 0 0 0021ARSE00281 1 0 0 0 0 1 0 0 0022ARSE00351 1 0 0 0 0 0 1 0 0025ARSC00191 1 0 0 0 0 0 0 0 0026ARSE00191 1 0 0 0 1 0 0 0 0037ARSC00282 1 1 0 0 0 0 0 0 0038ARSE00282 1 0 0 0 0 1 0 0 0041ARSC00192 1 0 0 0 0 0 0 0 0043ARSE00192 1 0 0 0 1 0 0 0 0053ARSC00352 1 0 1 0 0 0 0 0 0055ARSE00352 1 0 0 0 0 0 1 0 0057ARSC00193 1 0 0 0 0 0 0 0 0058ARSC00194 1 0 0 0 0 0 0 0 0059ARSE00193 1 0 0 0 1 0 0 0 0060ARSE00194 1 0 0 0 1 0 0 0 0061ARSC00283 1 1 0 0 0 0 0 0 0062ARSC00284 1 1 0 0 0 0 0 0 0063ARSE00283 1 0 0 0 0 1 0 0 0089ARSC00195 1 0 0 0 0 0 0 0 0090ARSE00195 1 0 0 0 1 0 0 0 0091ARSC00285 1 1 0 0 0 0 0 0 0092ARSE00285 1 0 0 0 0 1 0 0 0093ARSC00353 1 0 1 0 0 0 0 0 0094ARSC00354 1 0 1 0 0 0 0 0 0095ARSE00353 1 0 0 0 0 0 1 0 0096ARSE00354 1 0 0 0 0 0 1 0 0097ARSC00HA1 1 0 0 1 0 0 0 0 0098ARSC00HA2 1 0 0 1 0 0 0 0 0099ARSE00HA1 1 0 0 0 0 0 0 1 0100ARSE00HA2 1 0 0 0 0 0 0 1 0137ARSE00286 1 0 0 0 0 1 0 0 0138ARSC00355 1 0 1 0 0 0 0 0 0139ARSE00355 1 0 0 0 0 0 1 0 0140ARSC00HA3 1 0 0 1 0 0 0 0 0141ARSC00HA4 1 0 0 1 0 0 0 0 0142ARSC00HA5 1 0 0 1 0 0 0 0 0143ARSC00HA3 1 0 0 1 0 0 0 0 0144ARSE00HA4 1 0 0 0 0 0 0 1 0176ARSC00196 1 0 0 0 0 0 0 0 attr(,"assign") [1] 0 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$combi [1] "contr.treatment" zq2 <- voomWithQualityWeights(z, design2, plot=FALSE) fit2 <- lmFit(zq2,design2) fit2 <- eBayes(fit2) results2 <- decideTests(fit2, adjust.method="BH") summary(results2) Intercept C.28 C.35 C.HA E.19 E.28 E.35 E.HA -1 0 1095 3359 2800 2 1058 2738 2160 0 8762 20023 13256 15012 23382 20109 15744 16948 1 14664 2308 6811 5614 42 2259 4944 4318
ixn.con <- makeContrasts(ixn=(Intercept-C.28-C.35-C.HA)-(E.19-E.28-E.35-E.HA), levels=design2)
ixn2 <- contrasts.fit(fit2, ixn.con)
ixn2 <- eBayes(ixn2)
ixn2results <- decideTests(ixn2)
summary(ixn2results)
ixn
-1 90
0 15483
1 7853
Model 1 and its contrast outputs ~25 DEG for the interaction, which makes more biological sense to me than the ~8000 DEG from Model 2. I realize that's not the best barometer for assessing the output, so some guidance on how to best understand these coefficients are also appreciated. Most importantly, I'd like to figure out the appropriate contrasts for Model 2 because it'll be much easier to interpret coefficients when adding the third factor of interest.
I apologize in advance if I've missed something obvious in the manual or from the forum posts. Any feedback on where I'm going wrong is greatly appreciated.
Best,
Jane
Thanks for the clarification, Aaron. It looks like the separate up/down statistics from decideTests look slightly different between the two approaches, as well as the results of topTable. Why do these differences appear between the two approaches? Thanks again for the input!
This seems to be caused by a rather esoteric problem with the approximation of weights in
contrasts.fit
, see the documentation of that function for more details. Usually the approximation is good enough that you don't see any differences in the DE results with the different parametrizations. This isn't the case here, but even so, the differences in the number of DE genes (and probably their p-values) is probably small enough to be ignored.