I am in the midst of analyzing Affymetrix Drosophila GeneChip data using
RMA such that separate regression lines are estimated for each gene. It was
recommended to me that I use a p-value of .0001 as a cutoff for the effect
estimates rather than try to apply Bonferroni or other multiple test
corrections. Lately, however, I have begun to wonder if others doing this
sort of analysis use similar cutoffs and, in general, what others think
about statistical stringency in this situation. Any help will be most
appreciated; I will summarize any replies that I get that are not sent
directly to the list. Thank you.
Paul Mack, Ph.D
Department of Genetics
University of Georgia
Athens, GA
USA
706-542-1578 (w)
706-542-3910 (fax)
paulmack@arches.uga.edu
Do you have a factorial design, where you run one linear model for each
gene and then look at the p-values for the coefficients? Could you
give some more information about what you're doing? I'm not sure I
understand.
regards,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
> -----Original Message-----
> From: bioconductor-bounces@stat.math.ethz.ch
> [mailto:bioconductor-bounces@stat.math.ethz.ch]On Behalf Of Paul Mack
> Sent: 14 May 2004 16:20
> Subject: [BioC] Drosophila GeneChip analysis
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
Hi Paul,
this makes things clearer, but I still have some questions.
Please see my comments below.
> -----Original Message-----
> From: Paul Mack [mailto:paulmack@arches.uga.edu]
> Sent: 14 May 2004 18:07
> To: Muller, Arne PH/FR
> Subject: RE: [BioC] Drosophila GeneChip analysis
>
>
> Hi, Arne:
>
> Thanks for your response. Hopefully I can clarify. I have 4 classification
> variables in the model I use: gene; category (meaning treated or control; I
This means you're running a single model, i.e. if you have 10,000 genes
on the chip, the gene factor contains 10,000 levels, right?
How are you running this in R? I tried once, and it quickly ran out
of memory because I have >12k genes on the chip ... :-(
> have only one treatment and it is qualitative); array (designated as
> random); and probe (there are 14 probes per gene on each chip). It also
I'm also running linear models on the probe level (I think it gives a
good kind of pseudo-replication).
What is your model call? Something like this:
lme(intensity ~ gene*cat*probe, random = ~ 1 | array)
Or do you also include the array in the fixed effects?
I'm not sure about this call (I just received the mixed model book from
Pinheiro and Bates).
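To make the call above concrete, here is a minimal sketch on simulated data. It assumes the nlme package (which ships with standard R installations); the data frame `d`, the toy sizes (3 genes, 4 probes, 6 arrays) and the simulated effect are all made up for illustration and are not Paul's actual design. Note that the formula crosses probe with gene, exactly as written in the call above, although on a real chip the probes are specific to each gene.

```r
library(nlme)  # provides lme(); a recommended package bundled with R

set.seed(1)
# Toy layout (all sizes hypothetical): 3 genes x 4 probes on 6 arrays,
# arrays 1-3 are controls and arrays 4-6 are treated.
d <- expand.grid(probe = factor(1:4), gene = factor(1:3), array = factor(1:6))
d$cat <- factor(ifelse(as.integer(d$array) <= 3, "control", "treated"))

# Simulated log-intensity: baseline + random array shift + a treatment
# effect for gene 2 only + measurement noise.
array_effect <- rnorm(6, sd = 0.5)
d$intensity <- 8 + array_effect[as.integer(d$array)] +
  ifelse(d$gene == "2" & d$cat == "treated", 1.5, 0) +
  rnorm(nrow(d), sd = 0.3)

# Fixed effects: gene, category, probe and all their interactions;
# random effect: an intercept per array.
fit <- lme(intensity ~ gene * cat * probe, random = ~ 1 | array, data = d)
anova(fit)
```

With this formula the number of fixed-effect coefficients is genes x categories x probes, so a single model over >12k genes with 14 probes each would need on the order of several hundred thousand coefficients. That may be why a single model runs out of memory, and fitting one model per gene is a common way around it.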
> includes a category x array interaction term. The model predicts gene
> expression as a function of array, category, array, probe and the
"array" appears twice in this list; do you mean gene for one of them?
> interaction term. I then look at the estimated category coefficients gene
I'm currently doing a similar thing. Apart from the trouble of deciding
on a method to correct for multiple testing (I'm using
p.adjust(pvalue, 'fdr')) and on the actual p-value cutoff, I found that
the residuals of the models are not normally distributed (see one of my
last postings to the Bioconductor list). I think one really needs to
check the model quality, otherwise the p-values don't mean much anyway.
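As a rough illustration of how the three options in this thread compare (a fixed cutoff of .0001, Bonferroni, and FDR via p.adjust), here is a small sketch on simulated p-values. The chip size of 12,000 genes, the 300 "truly changed" genes and the beta distribution used for their p-values are assumptions made up for the example; only p.adjust() itself is taken from the discussion above.

```r
set.seed(42)
n_genes <- 12000   # hypothetical number of genes on the chip
n_true  <- 300     # hypothetical number of truly changed genes

# Null genes give uniform p-values; changed genes give p-values
# concentrated near zero (the beta shape is an arbitrary choice).
p <- c(rbeta(n_true, 0.1, 10), runif(n_genes - n_true))

calls <- c(
  fixed_cutoff = sum(p < 1e-4),
  bonferroni   = sum(p.adjust(p, method = "bonferroni") < 0.05),
  fdr_bh       = sum(p.adjust(p, method = "fdr") < 0.05)  # "fdr" is Benjamini-Hochberg
)
print(calls)
```

With 12,000 tests, Bonferroni at 0.05 corresponds to a per-gene threshold of about 4.2e-6, so a fixed cutoff of .0001 is less strict than Bonferroni and would let through about 1.2 false positives on average if no gene changed at all. The FDR adjustment never calls fewer genes than Bonferroni at the same level.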
Kerr and Churchill (2002) have reported this problem and argue that
one actually needs to use bootstrapping to calculate confidence
intervals (since the distribution of residuals has extreme tails).
This is rather discouraging, since bootstrapping will take too long for
my analysis (MG-U74Av2 chip with >12k genes).
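To give an idea of what bootstrapping a confidence interval involves, here is a minimal residual-bootstrap sketch for the treatment coefficient of a single gene, using only base R. It is not necessarily the procedure Kerr and Churchill describe; the toy data (6 arrays, heavy-tailed noise), the plain lm() model and the number of resamples B are all assumptions for illustration.

```r
set.seed(7)
# Toy data for ONE gene (made up): 3 control and 3 treated arrays, with
# heavy-tailed noise (t distribution, 3 df) to mimic extreme residuals.
cat_f <- factor(rep(c("control", "treated"), each = 3))
y <- 8 + 1.0 * (cat_f == "treated") + 0.3 * rt(6, df = 3)

fit      <- lm(y ~ cat_f)
res      <- residuals(fit)
fitted_y <- fitted(fit)

# Residual bootstrap: resample the residuals with replacement, add them
# back to the fitted values, refit, and keep the treatment coefficient.
B <- 2000
boot_coef <- replicate(B, {
  y_star <- fitted_y + sample(res, replace = TRUE)
  coef(lm(y_star ~ cat_f))[["cat_ftreated"]]
})

# Percentile interval for the treatment effect.
ci <- quantile(boot_coef, c(0.025, 0.975))
ci
```

Each gene needs B refits, so the cost scales as B times the number of genes, which is where the run time for a >12k gene chip comes from. With only a handful of arrays per gene the resampled residuals come from a very small pool, so the resulting interval should be read with caution.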
Have you tried the maanova package from Kerr & Churchill
(http://www.jax.org/staff/churchill/labsite/software/anova/)? I'm not
sure it works for Affymetrix chips.
regards,
Arne
> by gene. Hope this makes more sense.
>
> Paul
>
> At 04:28 PM 5/14/2004 +0200, you wrote: