GSEAbase and limma


Javier Pérez Florido ▴ 840


Dear list,

I'm new to GSEABase and I've seen some examples given in the book "Bioconductor Case Studies". A data example is given according to the following steps:

* Nonspecific filtering of the expression data object.
* Building the GeneSetCollection using KEGG (for example).
* Computing the per-gene test statistics using a t-test.
* Using a permutation test to assess which genes have an unusually large absolute value of the distribution.

My question is: can we use any kind of statistic? For example, the moderated t-statistic from limma? I know that limma uses the eBayes function, which employs information from all genes to arrive at more stable estimates of each individual gene's variance. I don't know if, in the GSEA context, it is correct to use this moderated statistic, which takes all the genes into account (it is not like the "standard" per-gene t-test).

Thanks,
Javier
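To make the last two steps of the workflow concrete, here is a minimal sketch of a per-gene t-statistic combined into a gene-set statistic and assessed by permuting the sample labels. It is written in plain Python only to stay self-contained; it is not GSEABase code. The function names, the toy data, and the choice of set statistic (sum of t-statistics divided by the square root of the set size) are assumptions made for illustration, not something stated in the post.

```python
import math
import random
from statistics import mean, variance


def t_statistic(values, labels):
    """Equal-variance two-sample t-statistic for one gene (group 1 minus group 0)."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    df = len(a) + len(b) - 2
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))


def set_statistic(expr, labels, gene_set):
    """Sum of the per-gene t-statistics in the set, scaled by sqrt(set size)."""
    total = sum(t_statistic(expr[g], labels) for g in gene_set)
    return total / math.sqrt(len(gene_set))


def permutation_p_value(expr, labels, gene_set, n_perm=500, seed=1):
    """Two-sided p-value obtained by shuffling the sample labels."""
    rng = random.Random(seed)
    observed = abs(set_statistic(expr, labels, gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_statistic(expr, shuffled, gene_set)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: three genes, eight samples (four per group).
expr = {
    "g1": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "g2": [2.0, 2.1, 1.9, 2.2, 4.0, 4.1, 3.9, 4.3],
    "g3": [5.0, 5.3, 4.8, 5.1, 5.2, 4.9, 5.0, 5.1],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p = permutation_p_value(expr, labels, ["g1", "g2"])
```

In this sketch each gene's statistic depends only on that gene's own values, which is the "standard" per-gene situation the question contrasts with limma's moderated statistic.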

limma GSEABase • 1.7k views

updated 15.2 years ago by Gordon Smyth 52k • written 15.2 years ago by Javier Pérez Florido ▴ 840


Sunny Srivastava ▴ 350


Dear Javier,

I am pretty sure more experienced members will have a lot more, and deeper, things to say about your question. Here are my 25 cents:

Model-based statistics (such as the moderated t-statistic) and permutation tests are two different flavors of testing the null hypothesis. Comparing these two flavors is, in my view, like comparing apples and oranges.

Each of these methods has its own advantages. If the model suits the data well, the moderated/unmoderated t-statistic should be preferred. If you have no idea what the model is, or you are not sure whether the model assumptions hold for the data, then a permutation test would be a "wiser" (but not necessarily better) choice.

A lot more could be said on this, but in short: a permutation test is always available, though it might not give results superior to those of your model-based test statistic (the t-test is quite robust to its assumptions).

This should apply to your example as well. You are allowed to use the moderated t-statistic.

Please correct me if I am wrong. I am also learning my statistics :-)

Thanks and best regards,
S.

2009/11/23 Javier Pérez Florido:
> My question is: can we use any kind of statistic? For example, moderated
> t-statistic using limma?

15.2 years ago • Sunny Srivastava ▴ 350


Dear Sunny,

Thanks for your reply regarding the use of parametric/nonparametric statistical tests. What I meant was the use of a "global" parametric test such as limma in the context of gene set enrichment, which is useful for finding biological themes in gene sets. My question is whether limma is suitable when building groups of genes, since the eBayes function employs information from ALL genes rather than from individual genes... :-)

Javier

15.2 years ago • Javier Pérez Florido ▴ 840


Gordon Smyth 52k


WEHI, Melbourne, Australia

Hi Javier,

It's OK, as long as you repeat the whole eBayes procedure for each permutation. The smoothed standard errors are statistically independent of the moderated t-statistics, hence independent of your category inference.

You might also consider the roast() and romer() functions, which use the empirical Bayes statistics explicitly.

Best wishes
Gordon

> Date: Tue, 24 Nov 2009 10:16:46 +0100
> From: Javier Pérez Florido
> Subject: Re: [BioC] GSEAbase and limma
>
> My question is if limma is suitable when building groups of genes since
> eBayes function employs information from ALL genes, rather than
> individual genes.... :-)
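The advice to repeat the whole empirical Bayes step for each permutation can be illustrated with a small sketch. It is plain Python rather than limma code, and the shrinkage used here is a deliberately simplified stand-in for eBayes: the prior variance is taken as the mean of the per-gene pooled variances and the prior degrees of freedom `d0` is a fixed, hypothetical value, whereas limma estimates both from the data. All function names and the toy data are likewise assumptions. The point it tries to show is only that the across-gene quantity is recomputed from the permuted data every time, instead of being estimated once and reused.

```python
import math
import random
from statistics import mean, variance


def shrunken_t(expr, labels, d0=4.0):
    """Per-gene t-statistics with variances shrunk toward an across-gene value.

    d0 is a hypothetical fixed prior degrees of freedom (an assumption here).
    """
    n1 = sum(labels)
    n0 = len(labels) - n1
    df = n1 + n0 - 2
    diff, s2 = {}, {}
    for gene, values in expr.items():
        a = [v for v, g in zip(values, labels) if g == 1]
        b = [v for v, g in zip(values, labels) if g == 0]
        diff[gene] = mean(a) - mean(b)
        s2[gene] = ((n1 - 1) * variance(a) + (n0 - 1) * variance(b)) / df
    s0_sq = mean(list(s2.values()))  # prior variance: depends on ALL genes
    scale = 1 / n1 + 1 / n0
    return {
        gene: diff[gene]
        / math.sqrt((d0 * s0_sq + df * s2[gene]) / (d0 + df) * scale)
        for gene in expr
    }


def set_statistic(t_by_gene, gene_set):
    """Sum of the statistics in the set, scaled by sqrt(set size)."""
    return sum(t_by_gene[g] for g in gene_set) / math.sqrt(len(gene_set))


def permutation_p_value(expr, labels, gene_set, n_perm=200, seed=1):
    """Two-sided p-value; the shrinkage is re-run inside every permutation."""
    rng = random.Random(seed)
    observed = abs(set_statistic(shrunken_t(expr, labels), gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Whole procedure repeated on the permuted labels, shrinkage included.
        null_value = abs(set_statistic(shrunken_t(expr, shuffled), gene_set))
        if null_value >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: three genes, eight samples (four per group).
expr = {
    "g1": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "g2": [2.0, 2.1, 1.9, 2.2, 4.0, 4.1, 3.9, 4.3],
    "g3": [5.0, 5.3, 4.8, 5.1, 5.2, 4.9, 5.0, 5.1],
}
labels = [0, 0, 0, 0, 1, 1, 1, 1]
t_obs = shrunken_t(expr, labels)
p = permutation_p_value(expr, labels, ["g1", "g2"])
```

With real data one would call limma itself inside the loop; the sketch only mirrors the structure of the computation.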

15.2 years ago • Gordon Smyth 52k
