Dear all,

I have a question about differential gene expression analysis using limma. I have two disease groups, **D1** (n=6) and **D2** (n=7). I first compared them to each other and obtained about 30 genes with adj.P-value < 0.05. The code I used is below.

    > design <- model.matrix(~0 + Treat)
    > fit <- lmFit(eset, design)
    > cm <- makeContrasts(coef1 = D1 - D2, levels = design)
    > fit2 <- contrasts.fit(fit, cm)
    > fit2 <- eBayes(fit2)

After running that analysis, I obtained some healthy samples, **H** (n=23), to compare each disease type against. However, when I now calculate **D1 - D2** with the healthy samples included, I get many more significant genes (~200) with adj.P-value < 0.05, compared with the 30 genes I got before for the same comparison. Nothing about my **D1** and **D2** groups has changed, including the sample numbers; I have only added the **H** samples. Please see the code below:

    > design <- model.matrix(~0 + Treat)
    > fit <- lmFit(eset, design)
    > cm <- makeContrasts(coef1 = D1 - D2, coef2 = D1 - H, coef3 = D2 - H, levels = design)
    > fit2 <- contrasts.fit(fit, cm)
    > fit2 <- eBayes(fit2)

Does anybody know why the number of significant genes (adj.P-value < 0.05) for 'coef1' differs between these two analyses?

Thank you,

Akul

Just for completeness, I assume you renamed the columns of `design` according to the levels of `Treat` in your code.

I did, yes. I just didn't paste that part of the code as I wanted to keep it simple. Thank you for checking.

-Akul
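
For readers following the thread, one likely contributor to the discrepancy can be sketched in base R. A linear model fitted to all samples estimates a single residual variance per gene from every group in the fit, so adding the **H** samples changes the variance estimate and the residual degrees of freedom behind the **D1 - D2** test, even though the D1 and D2 group means stay the same. The sketch below is only an illustration under stated assumptions: it uses simulated values for one hypothetical gene and plain `lm()` rather than limma, the names `dat`, `two`, `fit_two` and `fit_three` are invented here, and it does not model the additional moderation step performed by `eBayes`.

```r
# Simulated expression values for ONE hypothetical gene (not the poster's data).
set.seed(1)
group <- factor(rep(c("D1", "D2", "H"), times = c(6, 7, 23)))
dat <- data.frame(expr = rnorm(length(group), mean = 8, sd = 1), group = group)

# Fit 1: disease samples only (13 samples, 2 group means).
two <- droplevels(dat[dat$group != "H", ])
fit_two <- lm(expr ~ 0 + group, data = two)

# Fit 2: all samples (36 samples, 3 group means).
fit_three <- lm(expr ~ 0 + group, data = dat)

# The D1 and D2 group means are the same in both fits (up to floating point).
coef(fit_two)
coef(fit_three)[c("groupD1", "groupD2")]

# The residual variance, however, is pooled over different sets of samples,
# so the residual degrees of freedom and the estimated SD differ.
df.residual(fit_two)    # 13 samples - 2 coefficients
df.residual(fit_three)  # 36 samples - 3 coefficients
sigma(fit_two)
sigma(fit_three)
```

With real data, the analogous quantities can be inspected on the two `lmFit` results through their `df.residual` and `sigma` components, which should show whether the pooled variance estimates for the genes of interest shifted once the healthy samples were added.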