Question

Limma vs GEE


Michael Breen ▴ 370

@michael-breen-5999

Last seen 10.6 years ago

Hi all,

Consider 20 samples at baseline later exposed to treatment. 10 develop a disease and 10 do not develop a disease. Here we want to make a longitudinal assessment of gene expression in the diseased vs disease-free. All done on Affy microarrays.

Are there any obvious reasons why one would consider limma over GEE for testing for conditional or disease related outcomes?

Cheers,

Michael

affy limma • 1.6k views

ADD COMMENT • link updated 11.5 years ago by Gordon Smyth 52k • written 11.5 years ago by Michael Breen ▴ 370

score 0 · Answer 1 · 2013-10-01


Michael Breen ▴ 370

@michael-breen-5999

Last seen 10.6 years ago

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20131001/66720437/attachment-0001.pl

ADD COMMENT • link 11.5 years ago Michael Breen ▴ 370

score 0 · Answer 2 · 2013-10-02


Gordon Smyth 52k

@gordon-smyth

Last seen 22 hours ago

WEHI, Melbourne, Australia

Dear Michael,

It would help if you explained what you mean by "GEE" and why you think it might be relevant for your problem.

Best wishes
Gordon

ADD COMMENT • link 11.5 years ago Gordon Smyth 52k


Hi Gordon,

We are just about finished with a write-up of a manuscript where we describe longitudinal differences within subjects between two different groups from baseline to an outcome.

We used a factorial design in limma and are happy with its results and robustness.

Recently, a colleague mentioned GEE as a means to test for DE between groups. I have yet to find any microarray differential testing done with it. GEE stands for generalized estimating equations, used to estimate the parameters of a glm; it measures population-averaged effects. Truthfully, I don't know what it is about and was hoping to gain a bit more insight than Google could offer. Often this mailing list brings me resolution in a much more explicit and unambiguous manner.

Yours,

Michael

ADD REPLY • link 11.5 years ago Michael Breen ▴ 370


GEE makes sense when you have lots of samples in the population, measured many times, and wish to know about population-level effects of a small number of factors; it's an alternative to other methods of hierarchical/nested mixed models. In the limma moderated-ANOVA world, you'd use duplicateCorrelation to account for the nested correlation structure, while still getting the (often huge) benefits of between-gene moderation of variance you get with limma.

http://www.jstatsoft.org/v15/i02/paper

If this were a reviewer making the criticism/suggestion, I'd respond with a sensitivity analysis: to what degree are your inferred results sensitive to choices in modeling (limma with/without duplicateCorrelation; limma vs. unmoderated straight-up glm; glm vs. geepack error structures). Based on that, I'd argue whether it even matters, and if it does matter, show data for an example gene or two for which the modeling choice has a strong effect on estimates of magnitude and significance.

There's never any one single "right" answer -- "all models are wrong; but some are useful" (George Box).

-Aaron
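To make "between-gene moderation of variance" concrete, here is a minimal sketch of the shrinkage step behind limma's moderated statistics. limma itself is an R package; the Python below only illustrates the weighted-average formula and is not limma's code. The prior variance `s2_prior`, the prior degrees of freedom `df_prior`, and the three gene variances are hypothetical numbers chosen for illustration; in limma the prior values are estimated from all genes on the array.

```python
def moderated_variance(s2_gene, df_gene, s2_prior, df_prior):
    """Shrink one gene's sample variance toward a prior variance.

    The result is a weighted average of the two variances, with the
    residual and prior degrees of freedom acting as the weights.
    """
    return (df_prior * s2_prior + df_gene * s2_gene) / (df_prior + df_gene)


# Hypothetical inputs (assumptions, not values from any real data set).
gene_variances = [0.02, 0.25, 1.10]  # per-gene residual variances
df_gene = 16                         # residual degrees of freedom per gene
s2_prior = 0.20                      # prior variance shared by all genes
df_prior = 4.0                       # prior degrees of freedom

shrunk = [
    moderated_variance(s2, df_gene, s2_prior, df_prior)
    for s2 in gene_variances
]

for raw, post in zip(gene_variances, shrunk):
    print(f"raw variance {raw:.3f} -> moderated variance {post:.3f}")
```

Very small variances are pulled up and very large ones are pulled down, which is what stabilises the test statistics when each gene has only a few residual degrees of freedom. A per-gene GEE or glm fit has no equivalent of this step.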

ADD REPLY • link 11.5 years ago Aaron Mackey ▴ 200


Hi Michael,

As Aaron Mackey has said in a separate email, limma has the obvious advantage of borrowing information between genes.

I have trouble thinking of any possible motivation for using a GEE in your context, and the fact that you can't find any applications to microarrays is a sign of this.

GEEs are not actually used to fit generalized linear models (glms). If one wanted to fit a glm, one would simply do so using the usual likelihood method. GEEs are actually used to estimate glms with correlation structures. The reason why a "generalized" (approximate) estimating equation is needed is that such models don't correspond to any well-defined probability distribution. The GEE equations don't maximize any optimality criterion such as a likelihood or sum of squares.

In your case you don't even have glms. You have normal data from Affymetrix arrays for which likelihood methods are readily available. So there is no need to use glms or GEEs.

With your data, the potential motivation for fitting a correlation structure would be to take account of correlation between repeated time course measurements on the same samples (if that is what you actually have). limma allows you to fit a constant correlation between the repeated measures. That should be sufficient unless you have a large number of longitudinal observations on the same samples. If you did need to go outside the limma framework to fit a more complex correlation structure (and forgo the benefits of information borrowing), you would probably want to use one of the many normal-based mixed model tools rather than GEEs.

Best wishes
Gordon
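As a rough illustration of why a constant correlation between repeated measures matters, the sketch below works out two textbook variances under an equal-correlation model. It assumes every measurement has the same variance `sigma2` and any two measurements on the same subject share the same correlation `rho`; the function names and the numbers are hypothetical, and this is plain Python arithmetic, not a call into limma.

```python
def var_mean_change(sigma2, rho, n_subjects):
    """Variance of the mean (follow-up minus baseline) change.

    Each subject's difference has variance 2 * sigma2 * (1 - rho);
    averaging over independent subjects divides that by n_subjects.
    """
    return 2.0 * sigma2 * (1.0 - rho) / n_subjects


def var_subject_average(sigma2, rho, n_timepoints):
    """Variance of one subject's average over its time points."""
    return sigma2 * (1.0 + (n_timepoints - 1) * rho) / n_timepoints


sigma2 = 1.0      # hypothetical common variance
n_subjects = 10   # e.g. one group of 10 subjects

for rho in (0.0, 0.5, 0.8):
    within = var_mean_change(sigma2, rho, n_subjects)
    between = var_subject_average(sigma2, rho, n_timepoints=2)
    print(f"rho={rho:.1f}  var(mean change)={within:.3f}  "
          f"var(subject average)={between:.3f}")
```

With a positive correlation, within-subject changes are estimated more precisely than an independence model would suggest, while subject averages (and hence between-group comparisons) are estimated less precisely. Treating the repeated measures as independent gets both wrong.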

ADD REPLY • link 11.5 years ago Gordon Smyth 52k


Hi Aaron and Gordon,

Thanks for your entirely straightforward replies to our broad question. In fact this was not yet criticism from a reviewer, but rather constructive criticism from a colleague. Now, though, we have a better idea about these types of tests.

I find your summary of GEEs rather helpful: they do not correspond to any well-defined probability distribution, and they do not maximize any optimality criterion such as a likelihood or sum of squares.

Thanks again for your time and answers!

Michael

ADD REPLY • link 11.5 years ago Michael Breen ▴ 370