testing triangular setup for finding instances of "all are different"
1
0
Entering edit mode
@wolfgang-raffelsberger-2876
Last seen 4 days ago
France
Dear List, I'm trying to find out the best procedure to treat the following "triangular" experimental plan. Three groups of samples (prepared as triplicates and run on 3+3+3=9 microarrays) are to be compared in the way where those elements/genes that are different in each of the three groups should be identified (ie, (av1 != av2) & (av1 != av3) & (av2 != av3) ) If I run an ANOVA the F value won't tell me if one or two groups are different. From the literature I see that people then like to resolves this again as multiple pair-wise comparisons to find out which group-means are actually different .... Of course the simples way might be to run three (independent) pair- wise comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However, when looking at the resulting p-values I'm not taking in account the extra multiplicity of testing. (Of course, before testing I'd check the data by QC and run some filtering of the data). A work-around might be to append all p-values to a 3 * length(genes) vector that could be used to estimate FDR or local fdr and finally search elements/genes where all FDRs are very low (I suppose I'd have to use the max(FDR) for a given element/gene) On this list I've seen a post for a somehow similar (?) case ("Re: [BioC] result of linear model", 12 june 2009) suggesting to either look at a "p.value/ratio threshold in limma", but I don't think this would address my questions satisfyingly. Looking at the ratio of only 2 pair-wise comparisons it might happen that (av1 != av2) and (av1 != av3), but if (av2 = av3) the conclusion that all three are different may be wrong ... So, I'm asking myself if there are more direct and elegant approaches to this problem. I remember that in related fields a constellation called "trio" is somehow popular, and I'm sure others may have already thought about solutions for such a setup. Finally, the questions remains if it is possible to develop one (single) FDR value for estimating that a given gene might be different in each of the three groups. Any hints/suggestions ? Thank's in advance, Wolfgang Raffelsberger # Here an example to illustrate ... set.seed(2009) dat <- matrix(runif(180),nc=9) rownames(dat) <- paste("g",1:nrow(dat),sep="_") dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups # basic testing in limma (emprical Bayes) require(limma) (design1 <- model.matrix( ~ -1 + groups )) (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"), paste(colnames(design1)[2:3],collapse="-"), paste(colnames(design1)[c(1:3)],collapse="-"), levels=design1)) fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1)) topTable(fit1, coef=1,number=4) # the best of 1st pairwise comparison # append p-values for correcting them all together and then rearrange in origial setup combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), nc=3) # or your favorite FDR estimation method rownames(combFDR) <- rownames(dat) combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results # and then the 'old' question of finding/defining a suitable threshold for the FDR values remains to be addressed ... # plot the best one ... require(lattice) stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups) # for completeness : > sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 locale: LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY= French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lattice_0.17-25 limma_2.18.2 loaded via a namespace (and not attached): [1] grid_2.9.1 tools_2.9.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wolfgang Raffelsberger, PhD Laboratoire de BioInformatique et G?nomique Int?gratives CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France Tel (+33) 388 65 3300 Fax (+33) 388 65 3276 wolfgang.raffelsberger (at) igbmc.fr limma limma • 884 views ADD COMMENT 0 Entering edit mode @james-w-macdonald-5106 Last seen 2 days ago United States Hi Wolfgang, Maybe I am missing your point entirely, but doesn't decideTests(fit1) do exactly what you want? You would have to modify your contrasts matrix (as the third column isn't a contrast anyway) to something like this: > cont <- cbind(contr.matr1[,-3], c(1,0,-1)) > colnames(cont) <- c("a-b","b-c","a-c") > cont a-b b-c a-c groupsa 1 0 1 groupsb -1 1 0 groupsc 0 -1 -1 > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont)) > rslt <- decideTests(fit1) > vennDiagram(rslt) Best, Jim Wolfgang Raffelsberger wrote: > Dear List, > > I'm trying to find out the best procedure to treat the following > "triangular" experimental plan. > Three groups of samples (prepared as triplicates and run on 3+3+3=9 > microarrays) are to be compared in the way where those elements/genes > that are different in each of the three groups should be identified (ie, > (av1 != av2) & (av1 != av3) & (av2 != av3) ) > > If I run an ANOVA the F value won't tell me if one or two groups are > different. From the literature I see that people then like to resolves > this again as multiple pair-wise comparisons to find out which > group-means are actually different .... > Of course the simples way might be to run three (independent) pair- wise > comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However, > when looking at the resulting p-values I'm not taking in account the > extra multiplicity of testing. (Of course, before testing I'd check the > data by QC and run some filtering of the data). A work-around might be > to append all p-values to a 3 * length(genes) vector that could be used > to estimate FDR or local fdr and finally search elements/genes where all > FDRs are very low (I suppose I'd have to use the max(FDR) for a given > element/gene) > On this list I've seen a post for a somehow similar (?) case ("Re: > [BioC] result of linear model", 12 june 2009) suggesting to either look > at a "p.value/ratio threshold in limma", but I don't think this would > address my questions satisfyingly. Looking at the ratio of only 2 > pair-wise comparisons it might happen that (av1 != av2) and (av1 != > av3), but if (av2 = av3) the conclusion that all three are different may > be wrong ... > > So, I'm asking myself if there are more direct and elegant approaches to > this problem. I remember that in related fields a constellation called > "trio" is somehow popular, and I'm sure others may have already thought > about solutions for such a setup. > Finally, the questions remains if it is possible to develop one (single) > FDR value for estimating that a given gene might be different in each of > the three groups. > > Any hints/suggestions ? > > Thank's in advance, > Wolfgang Raffelsberger > > > # Here an example to illustrate ... > set.seed(2009) > dat <- matrix(runif(180),nc=9) > rownames(dat) <- paste("g",1:nrow(dat),sep="_") > dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive > dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive > > groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups > > # basic testing in limma (emprical Bayes) > require(limma) > (design1 <- model.matrix( ~ -1 + groups )) > (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"), > paste(colnames(design1)[2:3],collapse="-"), > paste(colnames(design1)[c(1:3)],collapse="-"), > levels=design1)) > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1)) > topTable(fit1, coef=1,number=4) # the best of 1st pairwise > comparison > > # append p-values for correcting them all together and then rearrange in > origial setup > combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), > nc=3) # or your favorite FDR estimation method > rownames(combFDR) <- rownames(dat) > combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results > # and then the 'old' question of finding/defining a suitable threshold > for the FDR values remains to be addressed ... > > # plot the best one ... > require(lattice) > stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups) > > > # for completeness : > > sessionInfo() > R version 2.9.1 (2009-06-26) > i386-pc-mingw32 > > locale: > LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETAR Y=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 > > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] lattice_0.17-25 limma_2.18.2 > loaded via a namespace (and not attached): > [1] grid_2.9.1 tools_2.9.1 > > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . > Wolfgang Raffelsberger, PhD > Laboratoire de BioInformatique et G?nomique Int?gratives > CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, > France > Tel (+33) 388 65 3300 Fax (+33) 388 65 3276 > wolfgang.raffelsberger (at) igbmc.fr > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826