odd limma results?
1
0
Entering edit mode
David A. ▴ 70
@david-a-4430
Last seen 6.9 years ago
Hi list, I have a one-color setup with 238 samples and 6 conditions, for a geneset of just over 400 probes (targeted array). One of the conditions is control and the rest are experimental conditions. After normalization using vsn I would like to make a comparison first between the control and the rest and then between some of the conditions grouped. Being human samples I can expect great variability within conditions and thus not many genes coming as significant. But when I run a limma analysis, for some of the contrasts I get that all the genes come up significant with large B values and really small p-values. When doing boxplots of normalized expression values for some of these genes, the image doesn´t show they are that different, rather the opposite. Is my parametrization wrong? Are the results strangely significant because the reduced geneset to be analyzed? Being a targeted array probably many results should come up as significant, but I don't think all of them should show up as significant as they do for many of the contrasts. here is my code: > table(myeset$group) 0 1 2 3 4 5 68 60 18 53 6 33 >design <- model.matrix(~0+factor(myeset$group)) >colnames(design) <- c("G0","G1","G2","G3","G4","G5") >fit <- lmFit(myeset,design) >contrast.matrix <- makeContrasts((G1+G2+G3+G4+G5)-G0,(G1+G2+G3+G4)-G0, (G1+G2)-G0, G1-G2,(G1+G2+G3+G4)-G5, levels = design) >fit2 <- contrasts.fit(fit, contrast.matrix) >fit3 <- ebayes(fit2) > topTable(fit3,adjust="BH",coef=1) ID logFC AveExpr t P.Value adj.P.Val B 411 VTI2 39.30167 13.08281 149.2822 5.690975e-249 2.356064e-246 501.3006 315 PI14 38.07841 12.54843 138.1938 1.399494e-240 2.896952e-238 489.1714 49 CHRM1 35.60063 11.93787 135.3310 2.623964e-238 3.621070e-236 485.7677 67 DDX32 34.95446 11.61260 132.9274 2.310411e-236 2.391276e-234 482.8165 144 GI_18044224 35.16818 11.83657 132.1551 9.904483e-236 8.200912e-234 481.8495 385 TDE1 34.80442 11.65140 131.8907 1.633346e-235 1.127009e-233 481.5163 136 GI_16878151 35.98769 12.01306 130.9807 9.205817e-235 5.444583e-233 480.3612 269 MGLL 37.82152 12.80129 129.2519 2.539646e-233 1.168237e-231 478.1303 120 GI_15559367 37.07777 12.19571 130.4492 1.285421e-233 6.652055e-232 478.0643 61 CSNK1E 35.45598 11.93302 128.1286 2.243467e-232 9.287953e-231 476.6548 Thanks for your help D. [[alternative HTML version deleted]]
vsn limma vsn limma • 793 views
0
Entering edit mode
@james-w-macdonald-5106
Last seen 5 hours ago
United States
Hi David, On 1/11/2011 11:02 AM, David A. wrote: > > Hi list, > > I have a one-color setup with 238 samples and 6 conditions, for a geneset of just over 400 probes (targeted array). One of the conditions is control and the rest are experimental conditions. After normalization using vsn I would like to make a comparison first between the control and the rest and then between some of the conditions grouped. Being human samples I can expect great variability within conditions and thus not many genes coming as significant. But when I run a limma analysis, for some of the contrasts I get that all the genes come up significant with large B values and really small p-values. When doing boxplots of normalized expression values for some of these genes, the image doesn?t show they are that different, rather the opposite. Is my parametrization wrong? Are the results strangely significant because the reduced geneset to be analyzed? Being a targeted array probably many results should come up as significant, but I don't think all of them should show up as significant as they do for many of the contrasts. > here is my code: > >> table(myeset$group) > > 0 1 2 3 4 5 > 68 60 18 53 6 33 > >> design<- model.matrix(~0+factor(myeset$group)) >> colnames(design)<- c("G0","G1","G2","G3","G4","G5") > >> fit<- lmFit(myeset,design) >> contrast.matrix<- makeContrasts((G1+G2+G3+G4+G5)-G0,(G1+G2+G3+G4)-G0, (G1+G2)-G0, G1-G2,(G1+G2+G3+G4)-G5, levels = design) Most of these aren't contrasts. The definition of a contrast is that the coefficients should sum to 0. So if we look at e.g. the first 'contrast' above, it is equivalent to (1*G1 + 1*G2 + 1*G3 + 1*G4 + 1*G5) - 1*G0 and the coefficients are thus 1 + 1 + 1 + 1 + 1 - 1, which isn't zero. So you need to divide each of the sums you create by the number of coefficients so the math works out. Another way to look at this is to realize that your first question is 'Is the average expression of samples G1-G5 different from G0?', which indicates again that you want the average of the estimated expression of the G1-G5 samples, not the sum. Does that make sense? Best, Jim >> fit2<- contrasts.fit(fit, contrast.matrix) >> fit3<- ebayes(fit2) >> topTable(fit3,adjust="BH",coef=1) > ID logFC AveExpr t P.Value adj.P.Val B > 411 VTI2 39.30167 13.08281 149.2822 5.690975e-249 2.356064e-246 501.3006 > 315 PI14 38.07841 12.54843 138.1938 1.399494e-240 2.896952e-238 489.1714 > 49 CHRM1 35.60063 11.93787 135.3310 2.623964e-238 3.621070e-236 485.7677 > 67 DDX32 34.95446 11.61260 132.9274 2.310411e-236 2.391276e-234 482.8165 > 144 GI_18044224 35.16818 11.83657 132.1551 9.904483e-236 8.200912e-234 481.8495 > 385 TDE1 34.80442 11.65140 131.8907 1.633346e-235 1.127009e-233 481.5163 > 136 GI_16878151 35.98769 12.01306 130.9807 9.205817e-235 5.444583e-233 480.3612 > 269 MGLL 37.82152 12.80129 129.2519 2.539646e-233 1.168237e-231 478.1303 > 120 GI_15559367 37.07777 12.19571 130.4492 1.285421e-233 6.652055e-232 478.0643 > 61 CSNK1E 35.45598 11.93302 128.1286 2.243467e-232 9.287953e-231 476.6548 > > > Thanks for your help > > D. > > > [[alternative HTML version deleted]] > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues