testing triangular setup for finding instances of "all are different"
1
0
Entering edit mode
@wolfgang-raffelsberger-2876
Last seen 4 weeks ago
France
Dear List, I'm trying to find out the best procedure to treat the following "triangular" experimental plan. Three groups of samples (prepared as triplicates and run on 3+3+3=9 microarrays) are to be compared in the way where those elements/genes that are different in each of the three groups should be identified (ie, (av1 != av2) & (av1 != av3) & (av2 != av3) ) If I run an ANOVA the F value won't tell me if one or two groups are different. From the literature I see that people then like to resolves this again as multiple pair-wise comparisons to find out which group-means are actually different .... Of course the simples way might be to run three (independent) pair- wise comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However, when looking at the resulting p-values I'm not taking in account the extra multiplicity of testing. (Of course, before testing I'd check the data by QC and run some filtering of the data). A work-around might be to append all p-values to a 3 * length(genes) vector that could be used to estimate FDR or local fdr and finally search elements/genes where all FDRs are very low (I suppose I'd have to use the max(FDR) for a given element/gene) On this list I've seen a post for a somehow similar (?) case ("Re: [BioC] result of linear model", 12 june 2009) suggesting to either look at a "p.value/ratio threshold in limma", but I don't think this would address my questions satisfyingly. Looking at the ratio of only 2 pair-wise comparisons it might happen that (av1 != av2) and (av1 != av3), but if (av2 = av3) the conclusion that all three are different may be wrong ... So, I'm asking myself if there are more direct and elegant approaches to this problem. I remember that in related fields a constellation called "trio" is somehow popular, and I'm sure others may have already thought about solutions for such a setup. Finally, the questions remains if it is possible to develop one (single) FDR value for estimating that a given gene might be different in each of the three groups. Any hints/suggestions ? Thank's in advance, Wolfgang Raffelsberger # Here an example to illustrate ... set.seed(2009) dat <- matrix(runif(180),nc=9) rownames(dat) <- paste("g",1:nrow(dat),sep="_") dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups # basic testing in limma (emprical Bayes) require(limma) (design1 <- model.matrix( ~ -1 + groups )) (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"), paste(colnames(design1)[2:3],collapse="-"), paste(colnames(design1)[c(1:3)],collapse="-"), levels=design1)) fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1)) topTable(fit1, coef=1,number=4) # the best of 1st pairwise comparison # append p-values for correcting them all together and then rearrange in origial setup combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), nc=3) # or your favorite FDR estimation method rownames(combFDR) <- rownames(dat) combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results # and then the 'old' question of finding/defining a suitable threshold for the FDR values remains to be addressed ... # plot the best one ... require(lattice) stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups) # for completeness : > sessionInfo() R version 2.9.1 (2009-06-26) i386-pc-mingw32 locale: LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY= French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] lattice_0.17-25 limma_2.18.2 loaded via a namespace (and not attached): [1] grid_2.9.1 tools_2.9.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wolfgang Raffelsberger, PhD Laboratoire de BioInformatique et G?nomique Int?gratives CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France Tel (+33) 388 65 3300 Fax (+33) 388 65 3276 wolfgang.raffelsberger (at) igbmc.fr
limma limma • 1.2k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 8 minutes ago
United States
Hi Wolfgang, Maybe I am missing your point entirely, but doesn't decideTests(fit1) do exactly what you want? You would have to modify your contrasts matrix (as the third column isn't a contrast anyway) to something like this: > cont <- cbind(contr.matr1[,-3], c(1,0,-1)) > colnames(cont) <- c("a-b","b-c","a-c") > cont a-b b-c a-c groupsa 1 0 1 groupsb -1 1 0 groupsc 0 -1 -1 > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),cont)) > rslt <- decideTests(fit1) > vennDiagram(rslt) Best, Jim Wolfgang Raffelsberger wrote: > Dear List, > > I'm trying to find out the best procedure to treat the following > "triangular" experimental plan. > Three groups of samples (prepared as triplicates and run on 3+3+3=9 > microarrays) are to be compared in the way where those elements/genes > that are different in each of the three groups should be identified (ie, > (av1 != av2) & (av1 != av3) & (av2 != av3) ) > > If I run an ANOVA the F value won't tell me if one or two groups are > different. From the literature I see that people then like to resolves > this again as multiple pair-wise comparisons to find out which > group-means are actually different .... > Of course the simples way might be to run three (independent) pair- wise > comparisons (HoA: av1 = av2; HoB: av1 = av3; HoC: av2 = av3). However, > when looking at the resulting p-values I'm not taking in account the > extra multiplicity of testing. (Of course, before testing I'd check the > data by QC and run some filtering of the data). A work-around might be > to append all p-values to a 3 * length(genes) vector that could be used > to estimate FDR or local fdr and finally search elements/genes where all > FDRs are very low (I suppose I'd have to use the max(FDR) for a given > element/gene) > On this list I've seen a post for a somehow similar (?) case ("Re: > [BioC] result of linear model", 12 june 2009) suggesting to either look > at a "p.value/ratio threshold in limma", but I don't think this would > address my questions satisfyingly. Looking at the ratio of only 2 > pair-wise comparisons it might happen that (av1 != av2) and (av1 != > av3), but if (av2 = av3) the conclusion that all three are different may > be wrong ... > > So, I'm asking myself if there are more direct and elegant approaches to > this problem. I remember that in related fields a constellation called > "trio" is somehow popular, and I'm sure others may have already thought > about solutions for such a setup. > Finally, the questions remains if it is possible to develop one (single) > FDR value for estimating that a given gene might be different in each of > the three groups. > > Any hints/suggestions ? > > Thank's in advance, > Wolfgang Raffelsberger > > > # Here an example to illustrate ... > set.seed(2009) > dat <- matrix(runif(180),nc=9) > rownames(dat) <- paste("g",1:nrow(dat),sep="_") > dat[1,] <- dat[1,] + rep(1:3,each=3) # true positive > dat[2,] <- dat[2,] + 2*rep(1:3,each=3) # another true positive > > groups <- gl(3,3,labels=letters[1:3]) # organize the data in 3 groups > > # basic testing in limma (emprical Bayes) > require(limma) > (design1 <- model.matrix( ~ -1 + groups )) > (contr.matr1 <- makeContrasts(paste(colnames(design1)[1:2],collapse="-"), > paste(colnames(design1)[2:3],collapse="-"), > paste(colnames(design1)[c(1:3)],collapse="-"), > levels=design1)) > fit1 <- eBayes(contrasts.fit(lmFit(dat,design1),contr.matr1)) > topTable(fit1, coef=1,number=4) # the best of 1st pairwise > comparison > > # append p-values for correcting them all together and then rearrange in > origial setup > combFDR <- matrix(p.adjust(as.numeric(fit1$p.value),method="BY"), > nc=3) # or your favorite FDR estimation method > rownames(combFDR) <- rownames(dat) > combFDR[ order(apply(combFDR,1,min))[1:10],] # the best FDR results > # and then the 'old' question of finding/defining a suitable threshold > for the FDR values remains to be addressed ... > > # plot the best one ... > require(lattice) > stripplot(dat[ order(apply(combFDR,1,min))[1],],groups=groups) > > > # for completeness : > > sessionInfo() > R version 2.9.1 (2009-06-26) > i386-pc-mingw32 > > locale: > LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETAR Y=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 > > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] lattice_0.17-25 limma_2.18.2 > loaded via a namespace (and not attached): > [1] grid_2.9.1 tools_2.9.1 > > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . > Wolfgang Raffelsberger, PhD > Laboratoire de BioInformatique et G?nomique Int?gratives > CNRS UMR7104, IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, > France > Tel (+33) 388 65 3300 Fax (+33) 388 65 3276 > wolfgang.raffelsberger (at) igbmc.fr > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826
ADD COMMENT

Login before adding your answer.

Traffic: 819 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6