Question

Limma: Contrasts comparing one factor to multiple others

0

Entering edit mode

Daniel Brewer ★ 1.9k

@daniel-brewer-1791

Last seen 9.6 years ago

Hello, I am trying to get to the bottom of how limma works and was thinking about what the best way is to compare one level of a variable to multiple other levels. For example say we had three sample types A, B and C and I want to find genes that are differentially expressed between A and the other samples. I can see that there would be two ways to set up the design matrix: 1) to have columns "A" and "Other" 2) to have columns "A", "B" and "C" and deal with the comparison in a contrast I would have expected both of these approaches to produce the same results but they do not. Is this correct? And if they are different which is the best approach to use and why are they different? A test example is posted below: Library(limma) testexpr <- matrix(runif(100),ncol=10) design2 <- matrix(c(rep(1,5),rep(0,5),rep(0,5),rep(1,5)),ncol=2) > design2 A Other [1,] 1 0 [2,] 1 0 [3,] 1 0 [4,] 1 0 [5,] 1 0 [6,] 0 1 [7,] 0 1 [8,] 0 1 [9,] 0 1 [10,] 0 1 design1 <- model.matrix(~as.factor(c(rep(1,5),2,2,3,3,3))) > design1 A B C 1 1 0 0 2 1 0 0 3 1 0 0 4 1 0 0 5 1 0 0 6 0 1 0 7 0 1 0 8 0 0 1 9 0 0 1 10 0 0 1 fit1 <- lmFit(testexpr,design=design1) contrasts.matrix1 <- makeContrasts("AvsOther"=A-(B+C)/2,levels=design1) fit1 <- eBayes(contrasts.fit(fit1,contrasts=contrasts.matrix1)) fit2 <- lmFit(testexpr,design=design2) contrasts.matrix2 <- makeContrasts("AvsOther"=A-Other,levels=design2) fit2 <- eBayes(contrasts.fit(fit2,contrasts=contrasts.matrix2)) > topTable(fit1,coef=1) logFC AveExpr t P.Value adj.P.Val B 4 0.4990143 0.4692092 2.5585742 0.01051024 0.1051024 -2.791453 8 -0.3745471 0.6429097 -1.9203987 0.05480755 0.2231148 -4.069512 3 -0.3573284 0.4941828 -1.8321141 0.06693443 0.2231148 -4.217641 1 -0.3137072 0.4258817 -1.6084570 0.10773513 0.2693378 -4.561710 7 0.2588919 0.5142112 1.3274050 0.18437474 0.3687495 -4.930648 5 -0.2243146 0.5899142 -1.1501181 0.25009522 0.4167625 -5.127042 2 0.2056316 0.4184454 1.0543258 0.29173376 0.4167625 -5.221461 9 0.1813196 0.5025485 0.9296718 0.35254105 0.4406763 -5.332042 6 -0.1109509 0.4021577 -0.5688735 0.56944202 0.6327134 -5.573792 10 0.0722048 0.5218702 0.3702125 0.71122419 0.7112242 -5.657208 > topTable(fit2,coef=1) logFC AveExpr t P.Value adj.P.Val B 4 0.47838295 0.4692092 2.5633453 0.01036689 0.1036689 -2.781949 8 -0.37026456 0.6429097 -1.9840087 0.04725487 0.1692887 -3.961164 3 -0.36452956 0.4941828 -1.9532785 0.05078660 0.1692887 -4.015323 1 -0.28931647 0.4258817 -1.5502601 0.12107910 0.3026977 -4.647349 7 0.25213156 0.5142112 1.3510102 0.17669216 0.3533843 -4.906105 5 -0.23271487 0.5899142 -1.2469687 0.21240897 0.3540150 -5.027093 2 0.19075798 0.4184454 1.0221488 0.30671047 0.4381578 -5.255440 9 0.15219110 0.5025485 0.8154938 0.41478969 0.5184871 -5.425425 10 0.08103956 0.5218702 0.4342387 0.66411512 0.6936484 -5.638698 6 -0.07351301 0.4021577 -0.3939088 0.69364840 0.6936484 -5.653648 The resulting fits using the two different approaches are different. Many thanks Dan -- ************************************************************** Daniel Brewer, Ph.D. Institute of Cancer Research Molecular Carcinogenesis Email: daniel.brewer at icr.ac.uk ************************************************************** The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP. This e-mail message is confidential and for use by the a...{{dropped:2}}

Cancer limma Cancer limma • 5.4k views

ADD COMMENT • link updated 15.2 years ago by James W. MacDonald 65k • written 15.2 years ago by Daniel Brewer ★ 1.9k

score 1 · Answer 1 · 2009-02-11

Hi Dan, Daniel Brewer wrote: > Hello, > > I am trying to get to the bottom of how limma works and was thinking > about what the best way is to compare one level of a variable to > multiple other levels. For example say we had three sample types A, B > and C and I want to find genes that are differentially expressed between > A and the other samples. I can see that there would be two ways to set > up the design matrix: > 1) to have columns "A" and "Other" > 2) to have columns "A", "B" and "C" and deal with the comparison in a > contrast > > I would have expected both of these approaches to produce the same > results but they do not. Is this correct? And if they are different > which is the best approach to use and why are they different? They are different, primarily due to what is in the denominator. The calculation of the standard error of the mean will be different, and the eBayes estimate might be different as well. But the SEM will likely have the largest effect. What you have to remember is that the t-tests you construct all use the SEM as a yardstick to determine if the observed difference between the groups is large or not. In the first case you are pooling the B and C groups, so they both have to be pretty similar to each other in order to get a significant result. In the second case you are using a measure of the intra-group correlation for each individual group for the SEM, so the B and C groups don't have to be that similar to each other, just consistent within the groups. So for instance if you have something like Group Mean SD A 5 1 B 12 1.3 C 5 1.1 This might be significant in the second situation, as the intra-group variance is low, and the difference between the mean of B and C and the A group is pretty large. However, in the first situation the variance of the combined B and C group will be really big, so the SEM will be much larger than in the second situation, likely resulting in an insignificant result. Best, Jim > > A test example is posted below: > > Library(limma) > testexpr <- matrix(runif(100),ncol=10) > > design2 <- matrix(c(rep(1,5),rep(0,5),rep(0,5),rep(1,5)),ncol=2) >> design2 > A Other > [1,] 1 0 > [2,] 1 0 > [3,] 1 0 > [4,] 1 0 > [5,] 1 0 > [6,] 0 1 > [7,] 0 1 > [8,] 0 1 > [9,] 0 1 > [10,] 0 1 > > design1 <- model.matrix(~as.factor(c(rep(1,5),2,2,3,3,3))) >> design1 > A B C > 1 1 0 0 > 2 1 0 0 > 3 1 0 0 > 4 1 0 0 > 5 1 0 0 > 6 0 1 0 > 7 0 1 0 > 8 0 0 1 > 9 0 0 1 > 10 0 0 1 > > fit1 <- lmFit(testexpr,design=design1) > contrasts.matrix1 <- makeContrasts("AvsOther"=A-(B+C)/2,levels=design1) > fit1 <- eBayes(contrasts.fit(fit1,contrasts=contrasts.matrix1)) > > fit2 <- lmFit(testexpr,design=design2) > contrasts.matrix2 <- makeContrasts("AvsOther"=A-Other,levels=design2) > fit2 <- eBayes(contrasts.fit(fit2,contrasts=contrasts.matrix2)) > >> topTable(fit1,coef=1) > logFC AveExpr t P.Value adj.P.Val B > 4 0.4990143 0.4692092 2.5585742 0.01051024 0.1051024 -2.791453 > 8 -0.3745471 0.6429097 -1.9203987 0.05480755 0.2231148 -4.069512 > 3 -0.3573284 0.4941828 -1.8321141 0.06693443 0.2231148 -4.217641 > 1 -0.3137072 0.4258817 -1.6084570 0.10773513 0.2693378 -4.561710 > 7 0.2588919 0.5142112 1.3274050 0.18437474 0.3687495 -4.930648 > 5 -0.2243146 0.5899142 -1.1501181 0.25009522 0.4167625 -5.127042 > 2 0.2056316 0.4184454 1.0543258 0.29173376 0.4167625 -5.221461 > 9 0.1813196 0.5025485 0.9296718 0.35254105 0.4406763 -5.332042 > 6 -0.1109509 0.4021577 -0.5688735 0.56944202 0.6327134 -5.573792 > 10 0.0722048 0.5218702 0.3702125 0.71122419 0.7112242 -5.657208 > >> topTable(fit2,coef=1) > logFC AveExpr t P.Value adj.P.Val B > 4 0.47838295 0.4692092 2.5633453 0.01036689 0.1036689 -2.781949 > 8 -0.37026456 0.6429097 -1.9840087 0.04725487 0.1692887 -3.961164 > 3 -0.36452956 0.4941828 -1.9532785 0.05078660 0.1692887 -4.015323 > 1 -0.28931647 0.4258817 -1.5502601 0.12107910 0.3026977 -4.647349 > 7 0.25213156 0.5142112 1.3510102 0.17669216 0.3533843 -4.906105 > 5 -0.23271487 0.5899142 -1.2469687 0.21240897 0.3540150 -5.027093 > 2 0.19075798 0.4184454 1.0221488 0.30671047 0.4381578 -5.255440 > 9 0.15219110 0.5025485 0.8154938 0.41478969 0.5184871 -5.425425 > 10 0.08103956 0.5218702 0.4342387 0.66411512 0.6936484 -5.638698 > 6 -0.07351301 0.4021577 -0.3939088 0.69364840 0.6936484 -5.653648 > > The resulting fits using the two different approaches are different. > > Many thanks > > Dan > -- James W. MacDonald, M.S. Biostatistician Hildebrandt Lab 8220D MSRB III 1150 W. Medical Center Drive Ann Arbor MI 48109-0646 734-936-8662