Question

Limma design and contrast matrix

0

Entering edit mode

donny.dw • 0

@donnydw-9957

Last seen 8.1 years ago

I have a question when I analysis the microarray data collected from Affymetrix mouse transcriptome array. I have four groups

Control,Special Diet,Special Diet+Drug1,Special Diet+Drug2

When I use limma to ge DE genes, I tried to design the matrix in four ways.

1. 4 groups, Control,Special Diet,Special Diet+Drug1,Special Diet+Drug2

design1 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, SpecialDietDrug2-Control, levels=design)

2. 3 groups, Control,Special Diet,Special Diet+Drug1

design2 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, levels=design)

3. 3 groups, Control,Special Diet,Special Diet+Drug2

design3 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug2-Control, levels=design)

4. 2 groups Control,Special Diet

design4 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, levels=design)

When I compared the results of those results, I found the DE genes between same compared groups are totally different. For example, Special Diet vs Control. In design1, there are 102 genes. In design2, there are 163 genes. In design3, there are 128 genes. In design4, there are 160 genes.

Why? How can I design the design and contrast matrix for my experiments?

limma limma design matrix limma contrast matrix • 8.2k views

ADD COMMENT • link 8.1 years ago donny.dw • 0

score 2 · Answer 1 · 2016-03-22

I presume you named the columns of each design matrix appropriately, otherwise the makeContrasts calls wouldn't make sense (and in fact, they don't really make sense as you've shown them, because levels=design while the design matrix variable has a number at the end, e.g., design1, design2). I also assume that you've used topTable sensibly to only perform the "SpecialDiet-Control" comparison for each design, rather than doing, e.g., an ANOVA-like comparison. Finally, I'll assume that you've subsetted the expression matrix correctly to match the groups you've retained in each design.

Now, with all that out of the way, the reason that you get some differences in the number of DE genes is because the variance estimates are different when you use a subset of the data. When you use all samples, information in all groups is used to estimate the variance of each gene, regardless of whether those groups are involved in the DE comparison. When you subset, you're losing the information in the groups that you've dropped; this results in a change in the variance estimate, and changes to the number of DE genes. Personally, I always try use all of the information in the data set, because the number of replicates is so low in most genomics studies that it doesn't make sense to throw stuff out. In any case, your numbers don't change too much - 100 to 160 is a minor change in the bigger scheme of things.

Your statement that the DE genes are "totally different" is more concerning. If the top set of significant genes is genuinely DE between groups, changes to the variance estimate between designs should not have a major effect on their presence in the DE set. Some more detail is required here, regarding what you did and what the results were, to figure out what's going on.

score 0 · Answer 2 · 2016-03-22

Thanks so much for your reply. I do change the "colnames" and use topTable to show the results. Below is the code when I ran 3 groups analysis.

And I misused the words "totally different". Actually, there are overlap genes. The different is the DE genes number when I use different design matrix.

-----------------------------------------------------------------------
library(limma)

design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3))) 

colnames(design) <- c("Control", "SpecialDiet","SpecialDietDrug1") 

fit <- lmFit(geneCore, design)

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-SpecialDiet, SpecialDietDrug1-Control, levels=design) 

fit2 <- contrasts.fit(fit, contrast.matrix) 

fit2 <- eBayes(fit2)

topTable(fit2, coef=1, adjust="fdr", sort.by="B", number=10) 

results <- decideTests(fit2, p.value=0.05); vennDiagram(results)

-----------------------------------------------------------------------

For my experiment, Drug1 and Drug2 are supposed to work on different pathwaies. So, they should cause different genes expression changes. Actually, they can be two experiments. In this case, do you think I should split the experiment to two groups? Like below,

1. 3 groups, Control,Special Diet,Special Diet+Drug1

design2 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

... ... ... ...

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, levels=design)

2. 3 groups, Control,Special Diet,Special Diet+Drug2

design3 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

... ... ... ...

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug2-Control, levels=design)