Dear list,
I'm new to GSEABase and I've been looking at the examples in the
"Bioconductor Case Studies" book. One worked example follows these
steps:
* Nonspecific filtering of the expression data object.
* Building the GeneSetCollection using KEGG (for example).
* Computing the per-gene test statistics using a t-test.
* Using a permutation test to assess which gene sets have an unusually
large absolute value of their summary statistic.
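The steps above can be sketched roughly as follows. This is written in Python rather than R, and it is not the book's code. It assumes a pooled-variance two-sample t-statistic per gene, a gene-set summary equal to the sum of member t-statistics divided by sqrt(set size), and a two-sided permutation p-value obtained by shuffling sample labels. Nonspecific filtering and the KEGG mapping are left out. The toy data and all function names are hypothetical.

```python
import math
import random
import statistics


def pooled_t(x, y):
    """Ordinary two-sample t-statistic with pooled variance (x = group 1, y = group 0)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def gene_stats(expr, labels):
    """Per-gene t-statistics; expr holds one list of values per gene, labels is 0/1 per sample."""
    out = []
    for row in expr:
        x = [v for v, g in zip(row, labels) if g == 1]
        y = [v for v, g in zip(row, labels) if g == 0]
        out.append(pooled_t(x, y))
    return out


def set_stat(tstats, members):
    """Gene-set summary: sum of member t-statistics divided by sqrt(set size)."""
    return sum(tstats[i] for i in members) / math.sqrt(len(members))


def perm_pvalue(expr, labels, members, n_perm=500, seed=1):
    """Two-sided permutation p-value for one gene set, obtained by permuting sample labels."""
    rng = random.Random(seed)
    observed = abs(set_stat(gene_stats(expr, labels), members))
    perm, hits = list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # group sizes are preserved by shuffling
        if abs(set_stat(gene_stats(expr, perm), members)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 30 genes, 6 vs 6 samples; genes 0-4 are shifted up in group 1.
rng = random.Random(0)
labels = [1] * 6 + [0] * 6
expr = [[rng.gauss(0, 1) + (3.0 if g < 5 and lab == 1 else 0.0) for lab in labels]
        for g in range(30)]
p = perm_pvalue(expr, labels, members=range(5))
print(p)
```

The permutation acts on the sample labels rather than on the genes. Correlation among genes in the same set is therefore preserved under the null distribution.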
My question is: can we use any kind of statistic, for example the
moderated t-statistic from limma? I know that limma's eBayes function
uses information from all genes to arrive at more stable estimates of
each individual gene's variance. I don't know whether, in a GSEA
context, it is correct to use this moderated statistic, which takes
all the genes into account (unlike the "standard" per-gene
t-statistic).
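For reference, this is how the moderation step in limma's empirical Bayes approach is usually written. Each gene's residual variance s_g^2, on d_g degrees of freedom, is shrunk towards a common prior value s_0^2 carrying d_0 degrees of freedom. The moderated t-statistic then uses the shrunken standard deviation. Information from the other genes enters only through the two hyperparameters d_0 and s_0^2. The symbols beta_g (estimated contrast) and v_g (its unscaled variance, e.g. 1/n_1 + 1/n_2 for two groups) are this sketch's notation.

```latex
\tilde{s}_g^2 = \frac{d_0 s_0^2 + d_g s_g^2}{d_0 + d_g},
\qquad
\tilde{t}_g = \frac{\hat{\beta}_g}{\tilde{s}_g \sqrt{v_g}}
```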
Thanks,
Javier
Dear Javier,
I am sure more experienced members will have much more, and deeper,
things to say about your question.
Here are my 25 cents:
Model-based statistics (such as the moderated t-statistic) and
permutation tests are two different flavors of testing the null
hypothesis. Comparing these two flavors is, in my view, like comparing
apples and oranges.
Each of these methods has its own advantages. If the model fits well,
the moderated (or unmoderated) t-statistic should be preferred. If you
have no idea what the model is, and/or are not sure whether the model
assumptions hold for the data, then a permutation test would be a
"wiser" (but not necessarily better) choice.
Much more could be said about this, but in short: a permutation test
is always available, yet it might not give better results than your
model-based test statistic would (the t-test is quite robust to
departures from its assumptions).
This should apply to your example as well: you are allowed to use the
moderated t-statistic.
Please correct me if I am wrong. I am also learning my statistics :-)
Thanks and Best Regards,
S.
2009/11/23 Javier Pérez Florido <jpflorido@gmail.com>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Dear Sunny,
Thanks for your reply regarding the use of parametric/nonparametric
statistical tests.
What I meant is the use of a "global" parametric test such as limma
in the context of gene set enrichment, which is useful for finding
biological themes in gene sets. My question is whether limma is
suitable when building groups of genes, since the eBayes function uses
information from ALL genes rather than from individual genes... :-)
Javier
Hi Javier,
It's OK, as long as you repeat the whole eBayes procedure for each
permutation. The smoothed standard errors are statistically
independent of the moderated t-statistics, hence independent of your
category inference.
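Gordon's advice can be sketched in Python under two simplifying assumptions. First, the prior degrees of freedom d0 is fixed at a hypothetical value of 4. Second, s0^2 is a crude moment estimate; limma estimates both hyperparameters differently. The point of the sketch is the structure: the entire moderation step, including re-estimating the prior from all genes, runs again inside the permutation loop. The permuted statistics are therefore computed exactly as the observed one is. The names and toy data are hypothetical.

```python
import math
import random
import statistics

D0 = 4.0  # hypothetical fixed prior degrees of freedom; limma estimates d0 from the data


def moderated_t_all(expr, labels, d0=D0):
    """Moderated t for every gene, with the prior variance re-estimated from all genes."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    d = n1 + n0 - 2  # residual degrees of freedom per gene
    diffs, s2 = [], []
    for row in expr:
        x = [v for v, g in zip(row, labels) if g == 1]
        y = [v for v, g in zip(row, labels) if g == 0]
        diffs.append(statistics.mean(x) - statistics.mean(y))
        s2.append(((n1 - 1) * statistics.variance(x) + (n0 - 1) * statistics.variance(y)) / d)
    # Crude moment estimate (needs d0 > 2): under the model, E[s2] = s0^2 * d0 / (d0 - 2).
    s0_2 = statistics.mean(s2) * (d0 - 2) / d0
    scale = math.sqrt(1 / n1 + 1 / n0)
    # Shrink each gene's variance towards s0^2, then form the t-like ratio.
    return [dm / (math.sqrt((d0 * s0_2 + d * v) / (d0 + d)) * scale)
            for dm, v in zip(diffs, s2)]


def set_stat(tstats, members):
    """Gene-set summary: sum of member statistics divided by sqrt(set size)."""
    return sum(tstats[i] for i in members) / math.sqrt(len(members))


def perm_pvalue_moderated(expr, labels, members, n_perm=500, seed=1):
    """Two-sided permutation p-value; the whole moderation step is redone per permutation."""
    rng = random.Random(seed)
    observed = abs(set_stat(moderated_t_all(expr, labels), members))
    perm, hits = list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if abs(set_stat(moderated_t_all(expr, perm), members)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 30 genes, 6 vs 6 samples; genes 0-4 are shifted up in group 1.
rng = random.Random(0)
labels = [1] * 6 + [0] * 6
expr = [[rng.gauss(0, 1) + (3.0 if g < 5 and lab == 1 else 0.0) for lab in labels]
        for g in range(30)]
p = perm_pvalue_moderated(expr, labels, members=range(5))
print(p)
```

If the prior were estimated once from the original labels and then reused, the permuted statistics would not be generated by the same procedure as the observed one. The loop above avoids that.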
You might also consider the roast() and romer() functions, which use
the empirical Bayes statistics explicitly.
Best wishes
Gordon
> Date: Tue, 24 Nov 2009 10:16:46 +0100
> From: Javier Pérez Florido <jpflorido@gmail.com>
> Subject: Re: [BioC] GSEAbase and limma
> To: Sunny Srivastava <research.baba@gmail.com>
> Cc: bioconductor@stat.math.ethz.ch
> Message-ID: <4B0BA47E.8010103@gmail.com>