**20**wrote:

Hi fellow Limma fans,

I can't imagine that this is actually a bug, but is is a REALLY odd situation in which one contrast give me different results depending on how it is formulated.

If you (like me) have been assuming that it doesn't matter if you make your limma design matrix with an explicit contrast or an implicit one, well, check again—you may get different results!

(Note that these are NOT due to numeric under/overflow.)

See if you can solve the mystery... I couldn't!

Below is a** minimal R test case that relies on no external data **and generates a scatterplot.

require(limma); require(edgeR) # edgeR is for DGEList set.seed(1234) counts <- data.frame(matrix(as.integer(10*rexp(9*50)), ncol=9)) colnames(counts) <- c("CTRL1", "CTRL2", "CTRL3", "DrugX_1", "DrugX_2", "DrugX_3", "DrugY_1", "DrugY_2", "DrugY_3") rownames(counts) <- paste0("Gene_", 1:nrow(counts)) counts[5:15 , grepl("DrugX", colnames(counts))] = 4 + counts[5:15 , grepl("DrugX", colnames(counts))] * 5 # fake diff expressed genes counts[30:40 , grepl("DrugY", colnames(counts))] = 2 + counts[30:40 , grepl("DrugX", colnames(counts))] * 12 # fake diff expressed genes counts[13:25 , grepl("Drug.*", colnames(counts))] = 3 + counts[13:25 , grepl("Drug.*", colnames(counts))] * 8 # some more fake stuff heatmap(as.matrix(log2(1+counts)), scale="none") # show the data metadata=data.frame("SAMPLE_ID"=colnames(counts), "GROUP"=c(rep("CTRL",3), rep("DrugX",3), rep("DrugY",3))) pvals = list(); logFC = list() for (STYLE in c("NO_INTERCEPT", "WITH_INTERCEPT")) { if (STYLE == "WITH_INTERCEPT") { design.mat = model.matrix(~ GROUP, metadata) colnames(design.mat) <- c("INTERCEPT","DrugX", "DrugY") contrast.mat <- limma::makeContrasts(DrugX , DrugY , DrugY-DrugX, levels=design.mat) # don't subtract the intercept! comparisons = c("DrugX", "DrugY", "DrugY - DrugX") } else if (STYLE=="NO_INTERCEPT") { design.mat = model.matrix(~ 0 + GROUP, metadata) colnames(design.mat) <- gsub("^GROUP", "", colnames(design.mat)) contrast.mat <- limma::makeContrasts(DrugX-CTRL, DrugY-CTRL, DrugY-DrugX, levels=design.mat) comparisons = c("DrugX - CTRL", "DrugY - CTRL", "DrugY - DrugX") } else { stop("unrecognized 'STYLE'") } #contrast.mat dge = DGEList(counts=counts, group=as.factor(metadata$GROUP)) dubious_comparison = comparisons[3] # the drugY - drugX one print(paste0("Running the following comparison: ", dubious_comparison)) voomObj = voom(dge, design=design.mat, lib.size=dge$samples$lib.size, plot=TRUE) fit = limma::lmFit(object=voomObj, design=voomObj$design) fit = limma::contrasts.fit(fit, contrast.mat) # or remove this line to go back to the previous method fit = limma::eBayes(fit) toptab = limma::topTable(fit=fit, coef=dubious_comparison, number=Inf, sort.by='p', confint=TRUE) pvals[[STYLE]] = toptab$P.Value[order(rownames(toptab))] logFC[[STYLE]] = toptab$logFC[order(rownames(toptab))] print(paste0("If we use the style where our design matrix is set up with: ", STYLE, ", then our results are (note the P.Value):")) print(head(toptab)) } par(mfrow=c(1,2)) # 2 plots par(pty='s') # square plot(logFC[["NO_INTERCEPT"]], logFC[["WITH_INTERCEPT"]], col="blue", pch=22, main="logFC comparison (always perfectly on the line)") abline(a=0, b=1, col="#0000FF55") par(pty='s') # square plot(pvals[["NO_INTERCEPT"]], pvals[["WITH_INTERCEPT"]], col="red", pch=19, main="P-value comparison (NOT on the line!)") abline(a=0, b=1, col="#0000FF55")

Below: the image compares the two methods. Fold change is the same, but (for some reason) the P-values are different, so I assume that the computed variance being different is the culprit.

**37k**• written 10 months ago by alexgraehl •

**20**