Re: Manova nuances
6
0
Entering edit mode
@baker-stephen-469
Last seen 7.1 years ago
Principle component analyses should reduce your data array to as many independent components as you have samples, and for each sample get a score for each dimension. These will have the same total information as the original data. These can then be analysed separately with univariate anova but since these are "orthogonal" analyses, multiple comparisons adjustments would not be needed. -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 Senior Biostatistician (775) 254-4885 fax Academic Computing Services Lecturer in Biostatistics , Graduate School of Biomedical Sciences University of Massachusetts Medical School 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA ---------------------------------------------------------------------- ------ ---- Date: Fri, 21 Nov 2003 00:18:54 -0500 From: "Michael Benjamin" <msb1129@bellsouth.net> Subject: [BioC] Manova nuances To: <bioconductor@stat.math.ethz.ch> Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> Content-Type: text/plain; charset="US-ASCII" Anybody here using manova? It's powerful and pretty fast, but I'm finding that you can't have more variables than samples (limits its applicability to microarray research). Is there any way around this? Assume dim(eset) 1200 35 transeset<-t(eset) fit<-manova(transeset ~ categories) summary(fit) There is probably a complicated mathematical truth that underlies this limitation--if anybody can shed some light, that would be great. Also, if anyone knows of a quick, free multivariate tool that summarizes all the tests into a single test statistic, that would be much appreciated. Regards, Michael Benjamin, MD Emory University Winship Cancer Institute
Microarray Cancer Microarray Cancer • 930 views
0
Entering edit mode
Liaw, Andy ▴ 360
@liaw-andy-125
Last seen 7.1 years ago
> From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > Principle component analyses should reduce your data array to ^^^^^^^^^ Principal > as many independent components as you have samples, and for > each sample get a score for each dimension. These will have > the same total information as the original data. These can > then be analysed separately with univariate anova but since > these are "orthogonal" analyses, multiple comparisons > adjustments would not be needed. The analysis you described is quite different than MANOVA, so the conclusion/interpretation would be quite different, too. MANOVA treats the data as coming from multivariate normal distribution, and tests whether all groups have the same mean vector. What you described is n (number of samples) ANOVA analyses that gives n p-values. Cheers, Andy Andy Liaw, PhD Biometrics Research PO Box 2000, RY33-300 Merck Research Labs Rahway, NJ 07065 mailto:andy_liaw@merck.com 732-594-0820 > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > -------------------------------------------------------------- > -------------- > ---- > Date: Fri, 21 Nov 2003 00:18:54 -0500 > From: "Michael Benjamin" <msb1129@bellsouth.net> > Subject: [BioC] Manova nuances > To: <bioconductor@stat.math.ethz.ch> > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > Content-Type: text/plain; charset="US-ASCII" > > > Anybody here using manova? It's powerful and pretty fast, > but I'm finding that you can't have more variables than > samples (limits its applicability to microarray research). > Is there any way around this? Assume > > dim(eset) > > 1200 35 > > transeset<-t(eset) > fit<-manova(transeset ~ categories) > summary(fit) > > There is probably a complicated mathematical truth that > underlies this limitation--if anybody can shed some light, > that would be great. > > Also, if anyone knows of a quick, free multivariate tool that > summarizes all the tests into a single test statistic, that > would be much appreciated. > > Regards, > Michael Benjamin, MD > Emory University > Winship Cancer Institute > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >
0
Entering edit mode
Yes, Dr. Liaw is closer to my desired goal of one summary statistic for a large number of variables. Is that possible with PCA? I appreciate Dr. Baker's thoughts, though--I think PCA and manova have similarities. -----Original Message----- From: Liaw, Andy [mailto:andy_liaw@merck.com] Sent: Friday, November 21, 2003 8:13 AM To: 'Stephen P. Baker'; bioconductor@stat.math.ethz.ch Cc: 'msb1129@bellsouth.net' Subject: RE: [BioC] Re: Manova nuances > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > Principle component analyses should reduce your data array to ^^^^^^^^^ Principal > as many independent components as you have samples, and for > each sample get a score for each dimension. These will have > the same total information as the original data. These can > then be analysed separately with univariate anova but since > these are "orthogonal" analyses, multiple comparisons > adjustments would not be needed. The analysis you described is quite different than MANOVA, so the conclusion/interpretation would be quite different, too. MANOVA treats the data as coming from multivariate normal distribution, and tests whether all groups have the same mean vector. What you described is n (number of samples) ANOVA analyses that gives n p-values. Cheers, Andy Andy Liaw, PhD Biometrics Research PO Box 2000, RY33-300 Merck Research Labs Rahway, NJ 07065 mailto:andy_liaw@merck.com 732-594-0820 > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > -------------------------------------------------------------- > -------------- > ---- > Date: Fri, 21 Nov 2003 00:18:54 -0500 > From: "Michael Benjamin" <msb1129@bellsouth.net> > Subject: [BioC] Manova nuances > To: <bioconductor@stat.math.ethz.ch> > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > Content-Type: text/plain; charset="US-ASCII" > > > Anybody here using manova? It's powerful and pretty fast, > but I'm finding that you can't have more variables than > samples (limits its applicability to microarray research). > Is there any way around this? Assume > > dim(eset) > > 1200 35 > > transeset<-t(eset) > fit<-manova(transeset ~ categories) > summary(fit) > > There is probably a complicated mathematical truth that > underlies this limitation--if anybody can shed some light, > that would be great. > > Also, if anyone knows of a quick, free multivariate tool that > summarizes all the tests into a single test statistic, that > would be much appreciated. > > Regards, > Michael Benjamin, MD > Emory University > Winship Cancer Institute > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >
0
Entering edit mode
@baker-stephen-469
Last seen 7.1 years ago
Andy et al. (Thanks for correcting my typo on the spelling of "principal"). Yes I know that ANOVA of n principal components will result in n p-values, however the SMALLEST p-value will be equivalent to a multivariate test of his hypotheses on his data. MANOVA and univariate ANOVA on the principal components are essentially equivalent in theory and quite similar in that both approaches involve the characteristic roots and functions of the same design and covariance matrices. The equivalence is based on the fact that multivariate hypotheses will be rejected only if the equivalent univariate hypotheses do not hold for all variates (Morrison,1976). Principle components simply transforms the original variates into new variates which conserve all the original information. Michael Benjamin's problem is that he CANNOT run MANOVA as he has fewer cases than variates however my suggested approach WOULD work. With regards to Michael's request/need for a SINGLE SUMMARY STATISTIC, he would use the minimum of the p-values for the appropriate effect from the univariate ANOVA's on the principal components as his single p-value. These are orthogonal tests and the minimum would be equivalent to testing the same hypotheses with MANOVA on his dataset. The only caveat is that with K genes and n<k and="" he="" will="" be="" able="" to="" test="" his="" hypotheses="" on="" the="" first="" n="" principal="" components="" which="" account="" for="" the="" largest="" portions="" of="" the="" variation.="" however,="" in="" my="" 20="" years="" experience,="" in="" most="" datasets="" the="" number="" of="" "significant"="" components="" (with="" eigenvalues="">1) is usually much smaller than the number of variates. It would be unusual for any real biological effect to not be represented among one or more of the first n components given n is not too small. In his case that's 35 and I think that's probably enough. Best wishes Stephen -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 Sr. Biostatistician- Information Services Lecturer in Biostatistics (775) 254-4885 fax Graduate School of Biomedical Sciences University of Massachusetts Medical School, Worcester 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA -----Original Message----- From: Liaw, Andy [mailto:andy_liaw@merck.com] Sent: Friday, November 21, 2003 8:13 AM To: Baker, Stephen; bioconductor@stat.math.ethz.ch Cc: 'msb1129@bellsouth.net' Subject: RE: [BioC] Re: Manova nuances > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > Principle component analyses should reduce your data array to ^^^^^^^^^ Principal > as many independent components as you have samples, and for > each sample get a score for each dimension. These will have > the same total information as the original data. These can > then be analysed separately with univariate anova but since > these are "orthogonal" analyses, multiple comparisons > adjustments would not be needed. The analysis you described is quite different than MANOVA, so the conclusion/interpretation would be quite different, too. MANOVA treats the data as coming from multivariate normal distribution, and tests whether all groups have the same mean vector. What you described is n (number of samples) ANOVA analyses that gives n p-values. Cheers, Andy Andy Liaw, PhD Biometrics Research PO Box 2000, RY33-300 Merck Research Labs Rahway, NJ 07065 mailto:andy_liaw@merck.com 732-594-0820 > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > -------------------------------------------------------------- > -------------- > ---- > Date: Fri, 21 Nov 2003 00:18:54 -0500 > From: "Michael Benjamin" <msb1129@bellsouth.net> > Subject: [BioC] Manova nuances > To: <bioconductor@stat.math.ethz.ch> > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > Content-Type: text/plain; charset="US-ASCII" > > > Anybody here using manova? It's powerful and pretty fast, > but I'm finding that you can't have more variables than > samples (limits its applicability to microarray research). > Is there any way around this? Assume > > dim(eset) > > 1200 35 > > transeset<-t(eset) > fit<-manova(transeset ~ categories) > summary(fit) > > There is probably a complicated mathematical truth that > underlies this limitation--if anybody can shed some light, > that would be great. > > Also, if anyone knows of a quick, free multivariate tool that > summarizes all the tests into a single test statistic, that > would be much appreciated. > > Regards, > Michael Benjamin, MD > Emory University > Winship Cancer Institute > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >
0
Entering edit mode
Can I do instead: comps1<-svd(teset[group1])$d comps2<-svd(teset[group2])$d t.test(comps1,comps2) Maybe I could just compare the top two or three components to one another? Mike -----Original Message----- From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] Sent: Friday, November 21, 2003 3:05 PM To: Liaw, Andy; bioconductor@stat.math.ethz.ch Cc: msb1129@bellsouth.net Subject: RE: [BioC] Re: Manova nuances Andy et al. (Thanks for correcting my typo on the spelling of "principal"). Yes I know that ANOVA of n principal components will result in n p-values, however the SMALLEST p-value will be equivalent to a multivariate test of his hypotheses on his data. MANOVA and univariate ANOVA on the principal components are essentially equivalent in theory and quite similar in that both approaches involve the characteristic roots and functions of the same design and covariance matrices. The equivalence is based on the fact that multivariate hypotheses will be rejected only if the equivalent univariate hypotheses do not hold for all variates (Morrison,1976). Principle components simply transforms the original variates into new variates which conserve all the original information. Michael Benjamin's problem is that he CANNOT run MANOVA as he has fewer cases than variates however my suggested approach WOULD work. With regards to Michael's request/need for a SINGLE SUMMARY STATISTIC, he would use the minimum of the p-values for the appropriate effect from the univariate ANOVA's on the principal components as his single p-value. These are orthogonal tests and the minimum would be equivalent to testing the same hypotheses with MANOVA on his dataset. The only caveat is that with K genes and n<k and="" he="" will="" be="" able="" to="" test="" his="" hypotheses="" on="" the="" first="" n="" principal="" components="" which="" account="" for="" the="" largest="" portions="" of="" the="" variation.="" however,="" in="" my="" 20="" years="" experience,="" in="" most="" datasets="" the="" number="" of="" "significant"="" components="" (with="" eigenvalues="">1) is usually much smaller than the number of variates. It would be unusual for any real biological effect to not be represented among one or more of the first n components given n is not too small. In his case that's 35 and I think that's probably enough. Best wishes Stephen -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 Sr. Biostatistician- Information Services Lecturer in Biostatistics (775) 254-4885 fax Graduate School of Biomedical Sciences University of Massachusetts Medical School, Worcester 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA -----Original Message----- From: Liaw, Andy [mailto:andy_liaw@merck.com] Sent: Friday, November 21, 2003 8:13 AM To: Baker, Stephen; bioconductor@stat.math.ethz.ch Cc: 'msb1129@bellsouth.net' Subject: RE: [BioC] Re: Manova nuances > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > Principle component analyses should reduce your data array to ^^^^^^^^^ Principal > as many independent components as you have samples, and for > each sample get a score for each dimension. These will have > the same total information as the original data. These can > then be analysed separately with univariate anova but since > these are "orthogonal" analyses, multiple comparisons > adjustments would not be needed. The analysis you described is quite different than MANOVA, so the conclusion/interpretation would be quite different, too. MANOVA treats the data as coming from multivariate normal distribution, and tests whether all groups have the same mean vector. What you described is n (number of samples) ANOVA analyses that gives n p-values. Cheers, Andy Andy Liaw, PhD Biometrics Research PO Box 2000, RY33-300 Merck Research Labs Rahway, NJ 07065 mailto:andy_liaw@merck.com 732-594-0820 > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > -------------------------------------------------------------- > -------------- > ---- > Date: Fri, 21 Nov 2003 00:18:54 -0500 > From: "Michael Benjamin" <msb1129@bellsouth.net> > Subject: [BioC] Manova nuances > To: <bioconductor@stat.math.ethz.ch> > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > Content-Type: text/plain; charset="US-ASCII" > > > Anybody here using manova? It's powerful and pretty fast, > but I'm finding that you can't have more variables than > samples (limits its applicability to microarray research). > Is there any way around this? Assume > > dim(eset) > > 1200 35 > > transeset<-t(eset) > fit<-manova(transeset ~ categories) > summary(fit) > > There is probably a complicated mathematical truth that > underlies this limitation--if anybody can shed some light, > that would be great. > > Also, if anyone knows of a quick, free multivariate tool that > summarizes all the tests into a single test statistic, that > would be much appreciated. > > Regards, > Michael Benjamin, MD > Emory University > Winship Cancer Institute > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >
0
Entering edit mode
@baker-stephen-469
Last seen 7.1 years ago
The principal components are orthogonal and independent and measure different things so it makes no sense to compare them, like comparing horizontal to vertical or heart rate to IQ. Treat the components like variables and perform the same analysis on them with ANOVA that you would have with MANOVA. Like 2 groups, do a t-test, 3 do ANOVA, whatever analysis is appropriate for your experimental design. If ANY of these are significant, the MANOVA would have been significant. Stephen -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 Senior Biostatistician (775) 254-4885 fax Academic Computing Services Lecturer in Biostatistics , Graduate School of Biomedical Sciences University of Massachusetts Medical School 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA ----- Original Message ----- From: "Michael Benjamin" <msb1129@bellsouth.net> To: "'Baker, Stephen'" <stephen.baker@umassmed.edu>; "'Liaw, Andy'" <andy_liaw@merck.com>; <bioconductor@stat.math.ethz.ch> Sent: Friday, November 21, 2003 11:11 PM Subject: RE: [BioC] Re: Manova nuances > Can I do instead: > comps1<-svd(teset[group1])$d > comps2<-svd(teset[group2])$d > t.test(comps1,comps2) > > Maybe I could just compare the top two or three components to one > another? > > Mike > > -----Original Message----- > From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] > Sent: Friday, November 21, 2003 3:05 PM > To: Liaw, Andy; bioconductor@stat.math.ethz.ch > Cc: msb1129@bellsouth.net > Subject: RE: [BioC] Re: Manova nuances > > Andy et al. > > (Thanks for correcting my typo on the spelling of "principal"). > > Yes I know that ANOVA of n principal components will result in n > p-values, however the SMALLEST p-value will be equivalent to a > multivariate test of his hypotheses on his data. > > MANOVA and univariate ANOVA on the principal components are essentially > equivalent in theory and quite similar in that both approaches involve > the characteristic roots and functions of the same design and covariance > matrices. > > The equivalence is based on the fact that multivariate hypotheses will > be rejected only if the equivalent univariate hypotheses do not hold for > all variates (Morrison,1976). Principle components simply transforms > the original variates into new variates which conserve all the original > information. Michael Benjamin's problem is that he CANNOT run MANOVA as > he has fewer cases than variates however my suggested approach WOULD > work. > > With regards to Michael's request/need for a SINGLE SUMMARY STATISTIC, > he would use the minimum of the p-values for the appropriate effect from > the univariate ANOVA's on the principal components as his single > p-value. These are orthogonal tests and the minimum would be equivalent > to testing the same hypotheses with MANOVA on his dataset. > > > The only caveat is that with K genes and n<k and="" he="" will="" be="" able="" to="" test=""> his hypotheses on the first n principal components which account for the > largest portions of the variation. However, in my 20 years experience, > in most datasets the number of "significant" components (with > eigenvalues >1) is usually much smaller than the number of variates. It > would be unusual for any real biological effect to not be represented > among one or more of the first n components given n is not too small. In > his case that's 35 and I think that's probably enough. > > Best wishes > Stephen > > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > Sr. Biostatistician- Information Services > Lecturer in Biostatistics (775) 254-4885 fax > Graduate School of Biomedical Sciences > University of Massachusetts Medical School, Worcester > 55 Lake Avenue North stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > > > -----Original Message----- > From: Liaw, Andy [mailto:andy_liaw@merck.com] > Sent: Friday, November 21, 2003 8:13 AM > To: Baker, Stephen; bioconductor@stat.math.ethz.ch > Cc: 'msb1129@bellsouth.net' > Subject: RE: [BioC] Re: Manova nuances > > > > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > > > Principle component analyses should reduce your data array to > ^^^^^^^^^ > Principal > > > as many independent components as you have samples, and for > > each sample get a score for each dimension. These will have > > the same total information as the original data. These can > > then be analysed separately with univariate anova but since > > these are "orthogonal" analyses, multiple comparisons > > adjustments would not be needed. > > The analysis you described is quite different than MANOVA, so > the conclusion/interpretation would be quite different, too. MANOVA > treats the data as coming from multivariate normal distribution, and > tests whether all groups have the same mean vector. What you described > is n (number of samples) ANOVA analyses that gives n p-values. > > Cheers, > Andy > Andy Liaw, PhD > Biometrics Research PO Box 2000, RY33-300 > Merck Research Labs Rahway, NJ 07065 > mailto:andy_liaw@merck.com 732-594-0820 > > > > > -.- -.. .---- .--. ..-. > > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > > Senior Biostatistician > > (775) 254-4885 fax > > Academic Computing Services > > Lecturer in Biostatistics , Graduate School of Biomedical > > Sciences University of Massachusetts Medical School > > 55 Lake Avenue North > > stephen.baker@umassmed.edu > > Worcester, MA 01655 USA > > -------------------------------------------------------------- > > -------------- > > ---- > > Date: Fri, 21 Nov 2003 00:18:54 -0500 > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > Subject: [BioC] Manova nuances > > To: <bioconductor@stat.math.ethz.ch> > > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > > Content-Type: text/plain; charset="US-ASCII" > > > > > > Anybody here using manova? It's powerful and pretty fast, > > but I'm finding that you can't have more variables than > > samples (limits its applicability to microarray research). > > Is there any way around this? Assume > > > > dim(eset) > > > > 1200 35 > > > > transeset<-t(eset) > > fit<-manova(transeset ~ categories) > > summary(fit) > > > > There is probably a complicated mathematical truth that > > underlies this limitation--if anybody can shed some light, > > that would be great. > > > > Also, if anyone knows of a quick, free multivariate tool that > > summarizes all the tests into a single test statistic, that > > would be much appreciated. > > > > Regards, > > Michael Benjamin, MD > > Emory University > > Winship Cancer Institute > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor > > > > >
0
Entering edit mode
This sounds very reasonable. I'm having a bit of trouble with the implementation. How do you solve for a variable? I know u * diag(d) * t(v) gets your original data, but how do you pick variables? Just taking the first four columns of u or v alone doesn't work--I tried. There must be a way to combine d, u, and v to represent the first few variables in low-dimensional space. In other words, after you do svd, then what? What do you compare? U? V? Thanks, Mike -----Original Message----- From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Stephen P. Baker Sent: Saturday, November 22, 2003 8:57 AM To: Michael Benjamin; bioconductor@stat.math.ethz.ch Subject: Re: [BioC] Re: Manova nuances The principal components are orthogonal and independent and measure different things so it makes no sense to compare them, like comparing horizontal to vertical or heart rate to IQ. Treat the components like variables and perform the same analysis on them with ANOVA that you would have with MANOVA. Like 2 groups, do a t-test, 3 do ANOVA, whatever analysis is appropriate for your experimental design. If ANY of these are significant, the MANOVA would have been significant. Stephen -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 Senior Biostatistician (775) 254-4885 fax Academic Computing Services Lecturer in Biostatistics , Graduate School of Biomedical Sciences University of Massachusetts Medical School 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA ----- Original Message ----- From: "Michael Benjamin" <msb1129@bellsouth.net> To: "'Baker, Stephen'" <stephen.baker@umassmed.edu>; "'Liaw, Andy'" <andy_liaw@merck.com>; <bioconductor@stat.math.ethz.ch> Sent: Friday, November 21, 2003 11:11 PM Subject: RE: [BioC] Re: Manova nuances > Can I do instead: > comps1<-svd(teset[group1])$d > comps2<-svd(teset[group2])$d > t.test(comps1,comps2) > > Maybe I could just compare the top two or three components to one > another? > > Mike > > -----Original Message----- > From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] > Sent: Friday, November 21, 2003 3:05 PM > To: Liaw, Andy; bioconductor@stat.math.ethz.ch > Cc: msb1129@bellsouth.net > Subject: RE: [BioC] Re: Manova nuances > > Andy et al. > > (Thanks for correcting my typo on the spelling of "principal"). > > Yes I know that ANOVA of n principal components will result in n > p-values, however the SMALLEST p-value will be equivalent to a > multivariate test of his hypotheses on his data. > > MANOVA and univariate ANOVA on the principal components are essentially > equivalent in theory and quite similar in that both approaches involve > the characteristic roots and functions of the same design and covariance > matrices. > > The equivalence is based on the fact that multivariate hypotheses will > be rejected only if the equivalent univariate hypotheses do not hold for > all variates (Morrison,1976). Principle components simply transforms > the original variates into new variates which conserve all the original > information. Michael Benjamin's problem is that he CANNOT run MANOVA as > he has fewer cases than variates however my suggested approach WOULD > work. > > With regards to Michael's request/need for a SINGLE SUMMARY STATISTIC, > he would use the minimum of the p-values for the appropriate effect from > the univariate ANOVA's on the principal components as his single > p-value. These are orthogonal tests and the minimum would be equivalent > to testing the same hypotheses with MANOVA on his dataset. > > > The only caveat is that with K genes and n<k and="" he="" will="" be="" able="" to="" test=""> his hypotheses on the first n principal components which account for the > largest portions of the variation. However, in my 20 years experience, > in most datasets the number of "significant" components (with > eigenvalues >1) is usually much smaller than the number of variates. It > would be unusual for any real biological effect to not be represented > among one or more of the first n components given n is not too small. In > his case that's 35 and I think that's probably enough. > > Best wishes > Stephen > > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > Sr. Biostatistician- Information Services > Lecturer in Biostatistics (775) 254-4885 fax > Graduate School of Biomedical Sciences > University of Massachusetts Medical School, Worcester > 55 Lake Avenue North stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > > > -----Original Message----- > From: Liaw, Andy [mailto:andy_liaw@merck.com] > Sent: Friday, November 21, 2003 8:13 AM > To: Baker, Stephen; bioconductor@stat.math.ethz.ch > Cc: 'msb1129@bellsouth.net' > Subject: RE: [BioC] Re: Manova nuances > > > > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > > > Principle component analyses should reduce your data array to > ^^^^^^^^^ > Principal > > > as many independent components as you have samples, and for > > each sample get a score for each dimension. These will have > > the same total information as the original data. These can > > then be analysed separately with univariate anova but since > > these are "orthogonal" analyses, multiple comparisons > > adjustments would not be needed. > > The analysis you described is quite different than MANOVA, so > the conclusion/interpretation would be quite different, too. MANOVA > treats the data as coming from multivariate normal distribution, and > tests whether all groups have the same mean vector. What you described > is n (number of samples) ANOVA analyses that gives n p-values. > > Cheers, > Andy > Andy Liaw, PhD > Biometrics Research PO Box 2000, RY33-300 > Merck Research Labs Rahway, NJ 07065 > mailto:andy_liaw@merck.com 732-594-0820 > > > > > -.- -.. .---- .--. ..-. > > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > > Senior Biostatistician > > (775) 254-4885 fax > > Academic Computing Services > > Lecturer in Biostatistics , Graduate School of Biomedical > > Sciences University of Massachusetts Medical School > > 55 Lake Avenue North > > stephen.baker@umassmed.edu > > Worcester, MA 01655 USA > > -------------------------------------------------------------- > > -------------- > > ---- > > Date: Fri, 21 Nov 2003 00:18:54 -0500 > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > Subject: [BioC] Manova nuances > > To: <bioconductor@stat.math.ethz.ch> > > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > > Content-Type: text/plain; charset="US-ASCII" > > > > > > Anybody here using manova? It's powerful and pretty fast, > > but I'm finding that you can't have more variables than > > samples (limits its applicability to microarray research). > > Is there any way around this? Assume > > > > dim(eset) > > > > 1200 35 > > > > transeset<-t(eset) > > fit<-manova(transeset ~ categories) > > summary(fit) > > > > There is probably a complicated mathematical truth that > > underlies this limitation--if anybody can shed some light, > > that would be great. > > > > Also, if anyone knows of a quick, free multivariate tool that > > summarizes all the tests into a single test statistic, that > > would be much appreciated. > > > > Regards, > > Michael Benjamin, MD > > Emory University > > Winship Cancer Institute > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor > > > > > _______________________________________________ Bioconductor mailing list Bioconductor@stat.math.ethz.ch https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
0
Entering edit mode
@baker-stephen-469
Last seen 7.1 years ago
Michael, One of the matrices output by svd should have dimensions = k by n or n by k where k is number of original variates and n the number of chips, this would be a set of eigenvectors. This matrix times the vector of expression levels from one chip should produce a vector of length n with values for the new components. There should also be either a vector or diagonal matrix ouput with values that are in descending order; the first corresponds to the variation in the first component, the second the next highest, etc., these should be the eigenvalues. Eigenvalues greater than 1 can be interpreted as indicating that component accounts for a significant portion of the variation in the original variables. -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 Senior Biostatistician (775) 254-4885 fax Academic Computing Services Lecturer in Biostatistics , Graduate School of Biomedical Sciences University of Massachusetts Medical School 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA ----- Original Message ----- From: "Michael Benjamin" <msb1129@bellsouth.net> To: "'Stephen P. Baker'" <stephen.baker@umassmed.edu>; <bioconductor@stat.math.ethz.ch> Sent: Sunday, November 23, 2003 8:53 PM Subject: RE: [BioC] Re: Manova nuances > This sounds very reasonable. I'm having a bit of trouble with the > implementation. > > How do you solve for a variable? I know u * diag(d) * t(v) gets your > original data, but how do you pick variables? Just taking the first > four columns of u or v alone doesn't work--I tried. There must be a way > to combine d, u, and v to represent the first few variables in > low-dimensional space. > > In other words, after you do svd, then what? What do you compare? U? V? > > Thanks, > Mike > > > -----Original Message----- > From: bioconductor-bounces@stat.math.ethz.ch > [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Stephen P. > Baker > Sent: Saturday, November 22, 2003 8:57 AM > To: Michael Benjamin; bioconductor@stat.math.ethz.ch > Subject: Re: [BioC] Re: Manova nuances > > The principal components are orthogonal and independent and measure > different things so it makes no sense to compare them, like comparing > horizontal to vertical or heart rate to IQ. > > Treat the components like variables and perform the same analysis on > them > with ANOVA that you would have with MANOVA. Like 2 groups, do a t-test, > 3 > do ANOVA, whatever analysis is appropriate for your experimental design. > If > ANY of these are significant, the MANOVA would have been significant. > > Stephen > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical Sciences > University of Massachusetts Medical School > 55 Lake Avenue North stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > ----- Original Message ----- > From: "Michael Benjamin" <msb1129@bellsouth.net> > To: "'Baker, Stephen'" <stephen.baker@umassmed.edu>; "'Liaw, Andy'" > <andy_liaw@merck.com>; <bioconductor@stat.math.ethz.ch> > Sent: Friday, November 21, 2003 11:11 PM > Subject: RE: [BioC] Re: Manova nuances > > > > Can I do instead: > > comps1<-svd(teset[group1])$d > > comps2<-svd(teset[group2])$d > > t.test(comps1,comps2) > > > > Maybe I could just compare the top two or three components to one > > another? > > > > Mike > > > > -----Original Message----- > > From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] > > Sent: Friday, November 21, 2003 3:05 PM > > To: Liaw, Andy; bioconductor@stat.math.ethz.ch > > Cc: msb1129@bellsouth.net > > Subject: RE: [BioC] Re: Manova nuances > > > > Andy et al. > > > > (Thanks for correcting my typo on the spelling of "principal"). > > > > Yes I know that ANOVA of n principal components will result in n > > p-values, however the SMALLEST p-value will be equivalent to a > > multivariate test of his hypotheses on his data. > > > > MANOVA and univariate ANOVA on the principal components are > essentially > > equivalent in theory and quite similar in that both approaches involve > > the characteristic roots and functions of the same design and > covariance > > matrices. > > > > The equivalence is based on the fact that multivariate hypotheses will > > be rejected only if the equivalent univariate hypotheses do not hold > for > > all variates (Morrison,1976). Principle components simply transforms > > the original variates into new variates which conserve all the > original > > information. Michael Benjamin's problem is that he CANNOT run MANOVA > as > > he has fewer cases than variates however my suggested approach WOULD > > work. > > > > With regards to Michael's request/need for a SINGLE SUMMARY STATISTIC, > > he would use the minimum of the p-values for the appropriate effect > from > > the univariate ANOVA's on the principal components as his single > > p-value. These are orthogonal tests and the minimum would be > equivalent > > to testing the same hypotheses with MANOVA on his dataset. > > > > > > The only caveat is that with K genes and n<k and="" he="" will="" be="" able="" to=""> test > > his hypotheses on the first n principal components which account for > the > > largest portions of the variation. However, in my 20 years > experience, > > in most datasets the number of "significant" components (with > > eigenvalues >1) is usually much smaller than the number of variates. > It > > would be unusual for any real biological effect to not be represented > > among one or more of the first n components given n is not too small. > In > > his case that's 35 and I think that's probably enough. > > > > Best wishes > > Stephen > > > > -.- -.. .---- .--. ..-. > > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > > Sr. Biostatistician- Information Services > > Lecturer in Biostatistics (775) 254-4885 fax > > Graduate School of Biomedical Sciences > > University of Massachusetts Medical School, Worcester > > 55 Lake Avenue North > stephen.baker@umassmed.edu > > Worcester, MA 01655 USA > > > > > > > > -----Original Message----- > > From: Liaw, Andy [mailto:andy_liaw@merck.com] > > Sent: Friday, November 21, 2003 8:13 AM > > To: Baker, Stephen; bioconductor@stat.math.ethz.ch > > Cc: 'msb1129@bellsouth.net' > > Subject: RE: [BioC] Re: Manova nuances > > > > > > > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > > > > > Principle component analyses should reduce your data array to > > ^^^^^^^^^ > > Principal > > > > > as many independent components as you have samples, and for > > > each sample get a score for each dimension. These will have > > > the same total information as the original data. These can > > > then be analysed separately with univariate anova but since > > > these are "orthogonal" analyses, multiple comparisons > > > adjustments would not be needed. > > > > The analysis you described is quite different than MANOVA, so > > the conclusion/interpretation would be quite different, too. MANOVA > > treats the data as coming from multivariate normal distribution, and > > tests whether all groups have the same mean vector. What you > described > > is n (number of samples) ANOVA analyses that gives n p-values. > > > > Cheers, > > Andy > > Andy Liaw, PhD > > Biometrics Research PO Box 2000, RY33-300 > > Merck Research Labs Rahway, NJ 07065 > > mailto:andy_liaw@merck.com 732-594-0820 > > > > > > > > > -.- -.. .---- .--. ..-. > > > Stephen P. Baker, MScPH , PhD(ABD) (508) > 856-2625 > > > Senior Biostatistician > > > (775) 254-4885 fax > > > Academic Computing Services > > > Lecturer in Biostatistics , Graduate School of Biomedical > > > Sciences University of Massachusetts Medical School > > > 55 Lake Avenue North > > > stephen.baker@umassmed.edu > > > Worcester, MA 01655 USA > > > -------------------------------------------------------------- > > > -------------- > > > ---- > > > Date: Fri, 21 Nov 2003 00:18:54 -0500 > > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > > Subject: [BioC] Manova nuances > > > To: <bioconductor@stat.math.ethz.ch> > > > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > > > Content-Type: text/plain; charset="US-ASCII" > > > > > > > > > Anybody here using manova? It's powerful and pretty fast, > > > but I'm finding that you can't have more variables than > > > samples (limits its applicability to microarray research). > > > Is there any way around this? Assume > > > > > > dim(eset) > > > > > > 1200 35 > > > > > > transeset<-t(eset) > > > fit<-manova(transeset ~ categories) > > > summary(fit) > > > > > > There is probably a complicated mathematical truth that > > > underlies this limitation--if anybody can shed some light, > > > that would be great. > > > > > > Also, if anyone knows of a quick, free multivariate tool that > > > summarizes all the tests into a single test statistic, that > > > would be much appreciated. > > > > > > Regards, > > > Michael Benjamin, MD > > > Emory University > > > Winship Cancer Institute > > > > > > _______________________________________________ > > > Bioconductor mailing list > > > Bioconductor@stat.math.ethz.ch > > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor > > > > > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor > > >
0
Entering edit mode
Liaw, Andy ▴ 360
@liaw-andy-125
Last seen 7.1 years ago
I did a bit of sanity check. If anyone can tell me what I'm missing, I'll be very grateful. I did a small simulation to check the empirical level (type I error) of the procedure as described by Dr. Baker, compared with the MANOVA tests available in summary.manova(). The setup is as follows (R code shown below): o Simple oneway layout with three groups. n=50 in each group. o Response is 10, 20 or 30 dimensional, independent standard normal. o Repeat the simulation 1000 times and count the percent of times null is rejected. The result looks like this (took 9 minutes on my P3 933MHz laptop): Dr. Baker's suggested procedure: > apply(pval.pc, 2, function(x) mean(x <= 0.05)) 10 20 30 0.439 0.638 0.810 Pillai's trace from summary.manova: > apply(pval.m, 2, function(x) mean(x <= 0.05)) 10 20 30 0.039 0.052 0.055 The problem with using minimum of p-values for univariate ANOVA on PCs, as I see it, is the absense of multiplicity adjustment, which is taken care of in the multivariate test. I suppose one way to "fix" it is to use p.adjust(). Here's the R code for the simulation: ============================================== n <- 150 p <- c(10, 20, 30) x <- factor(rep(letters[1:3], each=n/3)) nreps <- 1000 pval.pc <- pval.m <- matrix(NA, nreps, length(p), dimnames=list(NULL, p)) system.time( for (i in 1:length(p)) { for (j in 1:nreps) { y <- matrix(rnorm(n*p[i]), n, p[i]) y.pc <- princomp(y)$scores pval.pc[j, i] <- min(sapply(summary(aov(y.pc ~ x)), function(x) x["x", "Pr(>F)"])) pval.m[j, i] <- summary(manova(y ~ x))$stat["x", "Pr(>F)"] } }) ============================================= Best, Andy > From: Stephen P. Baker > > Michael, > One of the matrices output by svd should have dimensions = k > by n or n by k where k is number of original variates and n > the number of chips, this would be a set of eigenvectors. > This matrix times the vector of expression levels from one > chip should produce a vector of length n with values for the > new components. There should also be either a vector or > diagonal matrix ouput with values that are in descending > order; the first corresponds to the variation in the first > component, the second the next highest, etc., these should be > the eigenvalues. Eigenvalues greater than 1 can be > interpreted as indicating that component accounts for a > significant portion of the variation in the original variables. > > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > ----- Original Message ----- > From: "Michael Benjamin" <msb1129@bellsouth.net> > To: "'Stephen P. Baker'" <stephen.baker@umassmed.edu>; > <bioconductor@stat.math.ethz.ch> > Sent: Sunday, November 23, 2003 8:53 PM > Subject: RE: [BioC] Re: Manova nuances > > > > This sounds very reasonable. I'm having a bit of trouble with the > > implementation. > > > > How do you solve for a variable? I know u * diag(d) * t(v) > gets your > > original data, but how do you pick variables? Just taking > the first > > four columns of u or v alone doesn't work--I tried. There > must be a > > way to combine d, u, and v to represent the first few variables in > > low-dimensional space. > > > > In other words, after you do svd, then what? What do you > compare? U? > > V? > > > > Thanks, > > Mike > > > > > > -----Original Message----- > > From: bioconductor-bounces@stat.math.ethz.ch > > [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf > Of Stephen > > P. Baker > > Sent: Saturday, November 22, 2003 8:57 AM > > To: Michael Benjamin; bioconductor@stat.math.ethz.ch > > Subject: Re: [BioC] Re: Manova nuances > > > > The principal components are orthogonal and independent and measure > > different things so it makes no sense to compare them, like > comparing > > horizontal to vertical or heart rate to IQ. > > > > Treat the components like variables and perform the same > analysis on > > them with ANOVA that you would have with MANOVA. Like 2 > groups, do a > > t-test, 3 > > do ANOVA, whatever analysis is appropriate for your > experimental design. > > If > > ANY of these are significant, the MANOVA would have been > significant. > > > > Stephen > > -.- -.. .---- .--. ..-. > > Stephen P. Baker, MScPH , PhD(ABD) > (508) 856-2625 > > Senior Biostatistician > > (775) 254-4885 fax > > Academic Computing Services > > Lecturer in Biostatistics , Graduate School of Biomedical Sciences > > University of Massachusetts Medical School > > 55 Lake Avenue North > stephen.baker@umassmed.edu > > Worcester, MA 01655 USA > > > > ----- Original Message ----- > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > To: "'Baker, Stephen'" <stephen.baker@umassmed.edu>; "'Liaw, Andy'" > > <andy_liaw@merck.com>; <bioconductor@stat.math.ethz.ch> > > Sent: Friday, November 21, 2003 11:11 PM > > Subject: RE: [BioC] Re: Manova nuances > > > > > > > Can I do instead: > > > comps1<-svd(teset[group1])$d > > > comps2<-svd(teset[group2])$d > > > t.test(comps1,comps2) > > > > > > Maybe I could just compare the top two or three components to one > > > another? > > > > > > Mike > > > > > > -----Original Message----- > > > From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] > > > Sent: Friday, November 21, 2003 3:05 PM > > > To: Liaw, Andy; bioconductor@stat.math.ethz.ch > > > Cc: msb1129@bellsouth.net > > > Subject: RE: [BioC] Re: Manova nuances > > > > > > Andy et al. > > > > > > (Thanks for correcting my typo on the spelling of "principal"). > > > > > > Yes I know that ANOVA of n principal components will result in n > > > p-values, however the SMALLEST p-value will be equivalent to a > > > multivariate test of his hypotheses on his data. > > > > > > MANOVA and univariate ANOVA on the principal components are > > essentially > > > equivalent in theory and quite similar in that both approaches > > > involve the characteristic roots and functions of the same design > > > and > > covariance > > > matrices. > > > > > > The equivalence is based on the fact that multivariate hypotheses > > > will be rejected only if the equivalent univariate > hypotheses do not > > > hold > > for > > > all variates (Morrison,1976). Principle components simply > > > transforms the original variates into new variates which conserve > > > all the > > original > > > information. Michael Benjamin's problem is that he CANNOT > run MANOVA > > as > > > he has fewer cases than variates however my suggested > approach WOULD > > > work. > > > > > > With regards to Michael's request/need for a SINGLE SUMMARY > > > STATISTIC, he would use the minimum of the p-values for the > > > appropriate effect > > from > > > the univariate ANOVA's on the principal components as his single > > > p-value. These are orthogonal tests and the minimum would be > > equivalent > > > to testing the same hypotheses with MANOVA on his dataset. > > > > > > > > > The only caveat is that with K genes and n<k and="" he="" will=""> be able to > > test > > > his hypotheses on the first n principal components which > account for > > the > > > largest portions of the variation. However, in my 20 years > > experience, > > > in most datasets the number of "significant" components (with > > > eigenvalues >1) is usually much smaller than the number > of variates. > > It > > > would be unusual for any real biological effect to not be > > > represented among one or more of the first n components > given n is > > > not too small. > > In > > > his case that's 35 and I think that's probably enough. > > > > > > Best wishes > > > Stephen > > > > > > -.- -.. .---- .--. ..-. > > > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > > > Sr. Biostatistician- Information Services > > > Lecturer in Biostatistics (775) 254-4885 fax > > > Graduate School of Biomedical Sciences > > > University of Massachusetts Medical School, Worcester > > > 55 Lake Avenue North > > stephen.baker@umassmed.edu > > > Worcester, MA 01655 USA > > > > > > > > > > > > -----Original Message----- > > > From: Liaw, Andy [mailto:andy_liaw@merck.com] > > > Sent: Friday, November 21, 2003 8:13 AM > > > To: Baker, Stephen; bioconductor@stat.math.ethz.ch > > > Cc: 'msb1129@bellsouth.net' > > > Subject: RE: [BioC] Re: Manova nuances > > > > > > > > > > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > > > > > > > Principle component analyses should reduce your data array to > > > ^^^^^^^^^ > > > Principal > > > > > > > as many independent components as you have samples, and > for each > > > > sample get a score for each dimension. These will have > the same > > > > total information as the original data. These can then be > > > > analysed separately with univariate anova but since these are > > > > "orthogonal" analyses, multiple comparisons adjustments > would not > > > > be needed. > > > > > > The analysis you described is quite different than MANOVA, so the > > > conclusion/interpretation would be quite different, too. MANOVA > > > treats the data as coming from multivariate normal > distribution, and > > > tests whether all groups have the same mean vector. What you > > described > > > is n (number of samples) ANOVA analyses that gives n p-values. > > > > > > Cheers, > > > Andy > > > Andy Liaw, PhD > > > Biometrics Research PO Box 2000, RY33-300 > > > Merck Research Labs Rahway, NJ 07065 > > > mailto:andy_liaw@merck.com 732-594-0820 > > > > > > > > > > > > > -.- -.. .---- .--. ..-. > > > > Stephen P. Baker, MScPH , PhD(ABD) (508) > > 856-2625 > > > > Senior Biostatistician > > > > (775) 254-4885 fax > > > > Academic Computing Services > > > > Lecturer in Biostatistics , Graduate School of > Biomedical Sciences > > > > University of Massachusetts Medical School 55 Lake Avenue North > > > > stephen.baker@umassmed.edu > > > > Worcester, MA 01655 USA > > > > -------------------------------------------------------------- > > > > -------------- > > > > ---- > > > > Date: Fri, 21 Nov 2003 00:18:54 -0500 > > > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > > > Subject: [BioC] Manova nuances > > > > To: <bioconductor@stat.math.ethz.ch> > > > > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > > > > Content-Type: text/plain; charset="US-ASCII" > > > > > > > > > > > > Anybody here using manova? It's powerful and pretty > fast, but I'm > > > > finding that you can't have more variables than samples (limits > > > > its applicability to microarray research). Is there any > way around > > > > this? Assume > > > > > > > > dim(eset) > > > > > > > > 1200 35 > > > > > > > > transeset<-t(eset) > > > > fit<-manova(transeset ~ categories) > > > > summary(fit) > > > > > > > > There is probably a complicated mathematical truth that > underlies > > > > this limitation--if anybody can shed some light, that would be > > > > great. > > > > > > > > Also, if anyone knows of a quick, free multivariate tool that > > > > summarizes all the tests into a single test statistic, > that would > > > > be much appreciated. > > > > > > > > Regards, > > > > Michael Benjamin, MD > > > > Emory University > > > > Winship Cancer Institute > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list > > > > Bioconductor@stat.math.ethz.ch > > > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor > > > > > > > > > > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >
0
Entering edit mode
@baker-stephen-469
Last seen 7.1 years ago
Andy, Michael et al, I found Andy's simulation convincing, (I did some of my own as well which corroborated his results) so it would seem that multiple comparison adjustment would be necessary for the number of components analyzed (i.e. the number of components with eigenvalues greater than 1). My mistake was equating orthogonality in dependant variables with orthogonality in independent variables (like the factors in a factorial design)where multiple comparisons adjustments aren't necessary. In trying to reconcile this with theory I found that Manova accomplishes the hypothesis test without compensating for multiplicity only because it uses only one "component", the principle root of a complex function of the design, contrast and covariance matrices. I'm not aware of a way to involve the design and contrast matrices prior to the SVD analysis without duplicating the MANOVA. The approach I proposed would enable a MANOVA-like analysis in R but with a multiplicity adjustment for the number of significant eigenvalues. I was able to perform MANOVA on datasets with fewer cases than variables in another statistics package, hence that would be the preferred approach. -.- -.. .---- .--. ..-. Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 Sr. Biostatistician- Information Services Lecturer in Biostatistics (775) 254-4885 fax Graduate School of Biomedical Sciences University of Massachusetts Medical School, Worcester 55 Lake Avenue North stephen.baker@umassmed.edu Worcester, MA 01655 USA -----Original Message----- From: Liaw, Andy [mailto:andy_liaw@merck.com] Sent: Monday, November 24, 2003 9:30 AM To: Baker, Stephen; Michael Benjamin; bioconductor@stat.math.ethz.ch Subject: RE: [BioC] Re: Manova nuances I did a bit of sanity check. If anyone can tell me what I'm missing, I'll be very grateful. I did a small simulation to check the empirical level (type I error) of the procedure as described by Dr. Baker, compared with the MANOVA tests available in summary.manova(). The setup is as follows (R code shown below): o Simple oneway layout with three groups. n=50 in each group. o Response is 10, 20 or 30 dimensional, independent standard normal. o Repeat the simulation 1000 times and count the percent of times null is rejected. The result looks like this (took 9 minutes on my P3 933MHz laptop): Dr. Baker's suggested procedure: > apply(pval.pc, 2, function(x) mean(x <= 0.05)) 10 20 30 0.439 0.638 0.810 Pillai's trace from summary.manova: > apply(pval.m, 2, function(x) mean(x <= 0.05)) 10 20 30 0.039 0.052 0.055 The problem with using minimum of p-values for univariate ANOVA on PCs, as I see it, is the absense of multiplicity adjustment, which is taken care of in the multivariate test. I suppose one way to "fix" it is to use p.adjust(). Here's the R code for the simulation: ============================================== n <- 150 p <- c(10, 20, 30) x <- factor(rep(letters[1:3], each=n/3)) nreps <- 1000 pval.pc <- pval.m <- matrix(NA, nreps, length(p), dimnames=list(NULL, p)) system.time( for (i in 1:length(p)) { for (j in 1:nreps) { y <- matrix(rnorm(n*p[i]), n, p[i]) y.pc <- princomp(y)$scores pval.pc[j, i] <- min(sapply(summary(aov(y.pc ~ x)), function(x) x["x", "Pr(>F)"])) pval.m[j, i] <- summary(manova(y ~ x))$stat["x", "Pr(>F)"] } }) ============================================= Best, Andy > From: Stephen P. Baker > > Michael, > One of the matrices output by svd should have dimensions = k > by n or n by k where k is number of original variates and n > the number of chips, this would be a set of eigenvectors. > This matrix times the vector of expression levels from one > chip should produce a vector of length n with values for the > new components. There should also be either a vector or > diagonal matrix ouput with values that are in descending > order; the first corresponds to the variation in the first > component, the second the next highest, etc., these should be > the eigenvalues. Eigenvalues greater than 1 can be > interpreted as indicating that component accounts for a > significant portion of the variation in the original variables. > > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH , PhD(ABD) (508) 856-2625 > Senior Biostatistician > (775) 254-4885 fax > Academic Computing Services > Lecturer in Biostatistics , Graduate School of Biomedical > Sciences University of Massachusetts Medical School > 55 Lake Avenue North > stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > ----- Original Message ----- > From: "Michael Benjamin" <msb1129@bellsouth.net> > To: "'Stephen P. Baker'" <stephen.baker@umassmed.edu>; > <bioconductor@stat.math.ethz.ch> > Sent: Sunday, November 23, 2003 8:53 PM > Subject: RE: [BioC] Re: Manova nuances > > > > This sounds very reasonable. I'm having a bit of trouble with the > > implementation. > > > > How do you solve for a variable? I know u * diag(d) * t(v) > gets your > > original data, but how do you pick variables? Just taking > the first > > four columns of u or v alone doesn't work--I tried. There > must be a > > way to combine d, u, and v to represent the first few variables in > > low-dimensional space. > > > > In other words, after you do svd, then what? What do you > compare? U? > > V? > > > > Thanks, > > Mike > > > > > > -----Original Message----- > > From: bioconductor-bounces@stat.math.ethz.ch > > [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf > Of Stephen > > P. Baker > > Sent: Saturday, November 22, 2003 8:57 AM > > To: Michael Benjamin; bioconductor@stat.math.ethz.ch > > Subject: Re: [BioC] Re: Manova nuances > > > > The principal components are orthogonal and independent and measure > > different things so it makes no sense to compare them, like > comparing > > horizontal to vertical or heart rate to IQ. > > > > Treat the components like variables and perform the same > analysis on > > them with ANOVA that you would have with MANOVA. Like 2 > groups, do a > > t-test, 3 > > do ANOVA, whatever analysis is appropriate for your > experimental design. > > If > > ANY of these are significant, the MANOVA would have been > significant. > > > > Stephen > > -.- -.. .---- .--. ..-. > > Stephen P. Baker, MScPH , PhD(ABD) > (508) 856-2625 > > Senior Biostatistician > > (775) 254-4885 fax > > Academic Computing Services > > Lecturer in Biostatistics , Graduate School of Biomedical Sciences > > University of Massachusetts Medical School > > 55 Lake Avenue North > stephen.baker@umassmed.edu > > Worcester, MA 01655 USA > > > > ----- Original Message ----- > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > To: "'Baker, Stephen'" <stephen.baker@umassmed.edu>; "'Liaw, Andy'" > > <andy_liaw@merck.com>; <bioconductor@stat.math.ethz.ch> > > Sent: Friday, November 21, 2003 11:11 PM > > Subject: RE: [BioC] Re: Manova nuances > > > > > > > Can I do instead: > > > comps1<-svd(teset[group1])$d > > > comps2<-svd(teset[group2])$d > > > t.test(comps1,comps2) > > > > > > Maybe I could just compare the top two or three components to one > > > another? > > > > > > Mike > > > > > > -----Original Message----- > > > From: Baker, Stephen [mailto:Stephen.Baker@umassmed.edu] > > > Sent: Friday, November 21, 2003 3:05 PM > > > To: Liaw, Andy; bioconductor@stat.math.ethz.ch > > > Cc: msb1129@bellsouth.net > > > Subject: RE: [BioC] Re: Manova nuances > > > > > > Andy et al. > > > > > > (Thanks for correcting my typo on the spelling of "principal"). > > > > > > Yes I know that ANOVA of n principal components will result in n > > > p-values, however the SMALLEST p-value will be equivalent to a > > > multivariate test of his hypotheses on his data. > > > > > > MANOVA and univariate ANOVA on the principal components are > > essentially > > > equivalent in theory and quite similar in that both approaches > > > involve the characteristic roots and functions of the same design > > > and > > covariance > > > matrices. > > > > > > The equivalence is based on the fact that multivariate hypotheses > > > will be rejected only if the equivalent univariate > hypotheses do not > > > hold > > for > > > all variates (Morrison,1976). Principle components simply > > > transforms the original variates into new variates which conserve > > > all the > > original > > > information. Michael Benjamin's problem is that he CANNOT > run MANOVA > > as > > > he has fewer cases than variates however my suggested > approach WOULD > > > work. > > > > > > With regards to Michael's request/need for a SINGLE SUMMARY > > > STATISTIC, he would use the minimum of the p-values for the > > > appropriate effect > > from > > > the univariate ANOVA's on the principal components as his single > > > p-value. These are orthogonal tests and the minimum would be > > equivalent > > > to testing the same hypotheses with MANOVA on his dataset. > > > > > > > > > The only caveat is that with K genes and n<k and="" he="" will=""> be able to > > test > > > his hypotheses on the first n principal components which > account for > > the > > > largest portions of the variation. However, in my 20 years > > experience, > > > in most datasets the number of "significant" components (with > > > eigenvalues >1) is usually much smaller than the number > of variates. > > It > > > would be unusual for any real biological effect to not be > > > represented among one or more of the first n components > given n is > > > not too small. > > In > > > his case that's 35 and I think that's probably enough. > > > > > > Best wishes > > > Stephen > > > > > > -.- -.. .---- .--. ..-. > > > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > > > Sr. Biostatistician- Information Services > > > Lecturer in Biostatistics (775) 254-4885 fax > > > Graduate School of Biomedical Sciences > > > University of Massachusetts Medical School, Worcester > > > 55 Lake Avenue North > > stephen.baker@umassmed.edu > > > Worcester, MA 01655 USA > > > > > > > > > > > > -----Original Message----- > > > From: Liaw, Andy [mailto:andy_liaw@merck.com] > > > Sent: Friday, November 21, 2003 8:13 AM > > > To: Baker, Stephen; bioconductor@stat.math.ethz.ch > > > Cc: 'msb1129@bellsouth.net' > > > Subject: RE: [BioC] Re: Manova nuances > > > > > > > > > > From: Stephen P. Baker [mailto:stephen.baker@umassmed.edu] > > > > > > > > Principle component analyses should reduce your data array to > > > ^^^^^^^^^ > > > Principal > > > > > > > as many independent components as you have samples, and > for each > > > > sample get a score for each dimension. These will have > the same > > > > total information as the original data. These can then be > > > > analysed separately with univariate anova but since these are > > > > "orthogonal" analyses, multiple comparisons adjustments > would not > > > > be needed. > > > > > > The analysis you described is quite different than MANOVA, so the > > > conclusion/interpretation would be quite different, too. MANOVA > > > treats the data as coming from multivariate normal > distribution, and > > > tests whether all groups have the same mean vector. What you > > described > > > is n (number of samples) ANOVA analyses that gives n p-values. > > > > > > Cheers, > > > Andy > > > Andy Liaw, PhD > > > Biometrics Research PO Box 2000, RY33-300 > > > Merck Research Labs Rahway, NJ 07065 > > > mailto:andy_liaw@merck.com 732-594-0820 > > > > > > > > > > > > > -.- -.. .---- .--. ..-. > > > > Stephen P. Baker, MScPH , PhD(ABD) (508) > > 856-2625 > > > > Senior Biostatistician > > > > (775) 254-4885 fax > > > > Academic Computing Services > > > > Lecturer in Biostatistics , Graduate School of > Biomedical Sciences > > > > University of Massachusetts Medical School 55 Lake Avenue North > > > > stephen.baker@umassmed.edu Worcester, MA 01655 USA > > > > -------------------------------------------------------------- > > > > -------------- > > > > ---- > > > > Date: Fri, 21 Nov 2003 00:18:54 -0500 > > > > From: "Michael Benjamin" <msb1129@bellsouth.net> > > > > Subject: [BioC] Manova nuances > > > > To: <bioconductor@stat.math.ethz.ch> > > > > Message-ID: <003401c3afee$f7eff000$7a05fea9@amd> > > > > Content-Type: text/plain; charset="US-ASCII" > > > > > > > > > > > > Anybody here using manova? It's powerful and pretty > fast, but I'm > > > > finding that you can't have more variables than samples (limits > > > > its applicability to microarray research). Is there any > way around > > > > this? Assume > > > > > > > > dim(eset) > > > > > > > > 1200 35 > > > > > > > > transeset<-t(eset) > > > > fit<-manova(transeset ~ categories) > > > > summary(fit) > > > > > > > > There is probably a complicated mathematical truth that > underlies > > > > this limitation--if anybody can shed some light, that would be > > > > great. > > > > > > > > Also, if anyone knows of a quick, free multivariate tool that > > > > summarizes all the tests into a single test statistic, > that would > > > > be much appreciated. > > > > > > > > Regards, > > > > Michael Benjamin, MD > > > > Emory University > > > > Winship Cancer Institute > > > > > > > > _______________________________________________ > > > > Bioconductor mailing list Bioconductor@stat.math.ethz.ch > > > > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor > > > > > > > > > > > > > > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor > > > > > > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo> /bioconductor >