Question: Limma design and contrast matrix
0
3.6 years ago by
donny.dw0
donny.dw0 wrote:

I have a question when I analysis the microarray data collected from Affymetrix mouse transcriptome array. I have four groups

Control,Special Diet,Special Diet+Drug1,Special Diet+Drug2

When I use limma to ge DE genes, I tried to design the matrix in four ways.

1.  4 groups, Control,Special Diet,Special Diet+Drug1,Special Diet+Drug2

design1 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, SpecialDietDrug2-Control, levels=design)

2.  3 groups, Control,Special Diet,Special Diet+Drug1

design2 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, levels=design)

3.  3 groups, Control,Special Diet,Special Diet+Drug2

design3 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug2-Control, levels=design)

4.  2 groups  Control,Special Diet

design4 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

contrast.matrix <- makeContrasts(SpecialDiet-Control, levels=design)

When I compared the results of those results, I found the DE genes between same compared groups are totally different. For example, Special Diet vs Control. In design1, there are 102 genes. In design2, there are 163 genes. In design3, there are 128 genes. In design4, there are 160 genes.

Why? How can I design the design and contrast matrix for my experiments

modified 3.6 years ago • written 3.6 years ago by donny.dw0
Answer: Limma design and contrast matrix
2
3.6 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

I presume you named the columns of each design matrix appropriately, otherwise the makeContrasts calls wouldn't make sense (and in fact, they don't really make sense as you've shown them, because levels=design while the design matrix variable has a number at the end, e.g., design1, design2). I also assume that you've used topTable sensibly to only perform the "SpecialDiet-Control" comparison for each design, rather than doing, e.g., an ANOVA-like comparison. Finally, I'll assume that you've subsetted the expression matrix correctly to match the groups you've retained in each design.

Now, with all that out of the way, the reason that you get some differences in the number of DE genes is because the variance estimates are different when you use a subset of the data. When you use all samples, information in all groups is used to estimate the variance of each gene, regardless of whether those groups are involved in the DE comparison. When you subset, you're losing the information in the groups that you've dropped; this results in a change in the variance estimate, and changes to the number of DE genes. Personally, I always try use all of the information in the data set, because the number of replicates is so low in most genomics studies that it doesn't make sense to throw stuff out. In any case, your numbers don't change too much - 100 to 160 is a minor change in the bigger scheme of things.

Your statement that the DE genes are "totally different" is more concerning. If the top set of significant genes is genuinely DE between groups, changes to the variance estimate between designs should not have a major effect on their presence in the DE set. Some more detail is required here, regarding what you did and what the results were, to figure out what's going on.

Answer: Limma design and contrast matrix
0
3.6 years ago by
donny.dw0
donny.dw0 wrote:

Thanks so much for your reply. I do change the "colnames" and use topTable to show the results. Below is the code when I ran 3 groups analysis.

And I misused the words "totally different". Actually, there are overlap genes. The different is the DE genes number when I use different design matrix.

-----------------------------------------------------------------------
library(limma)

design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3)))

colnames(design) <- c("Control", "SpecialDiet","SpecialDietDrug1")

fit <- lmFit(geneCore, design)

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-SpecialDiet, SpecialDietDrug1-Control, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

results <- decideTests(fit2, p.value=0.05); vennDiagram(results) 
-----------------------------------------------------------------------


For my experiment, Drug1 and Drug2 are supposed to work on different pathwaies. So, they should cause different genes expression changes. Actually, they can be two experiments. In this case, do you think I should split the experiment to two groups? Like below,

1.  3 groups, Control,Special Diet,Special Diet+Drug1

design2 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

... ... ... ...

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug1-Control, levels=design)

2.  3 groups, Control,Special Diet,Special Diet+Drug2

design3 <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3)))

​... ... ... ...

contrast.matrix <- makeContrasts(SpecialDiet-Control, SpecialDietDrug2-Control, levels=design)

Please use the 'Add Comment' button to respond to a post. Anyway, if it's just a change in the number of DE genes, I wouldn't worry about it, the numbers involved are quite small. Also, you shouldn't split the data set into two groups - even though you could have done the two drug treatments in separate experiments, the fact is that you actually did them in a single experiment, so it makes sense to analyze all of the data as that from a single experiment. Remember, you get more accuracy when you have more samples to estimate the variance, so if you generated the samples at the same time, with the same cell type and on the same platform, you should be analyzing it all together to exploit this improved accuracy.