At 09:00 PM 28/10/2003, Luke Whitaker wrote:
>Hello all,
>
>I have a number of Agilent-based experiments where I have been asked to
>find up- and down-regulated genes, and later on to do some sort of
>clustering of gene profiles across multiple experiments. Currently I am
>only concerned with finding the most highly regulated genes within a
>single (multi-array) experiment.
>
>After spending a while knocking my data into shape for analysis by
>"limma" (or so I thought) I calculated the top 30 regulated genes
>for a couple of experiments and noticed the same gene appearing
>more than once in the top 30 list. Then I read this post...
>
>On Fri, 17 Oct 2003, Gordon Smyth wrote:
> > At 11:53 PM 16/10/2003, Jason Skelton wrote:
> > >On a different note
> > >The arrays I have tested LIMMA on have 2 duplicates which are spaced evenly
> > >throughout the array and so have no problems running your functions.
> > >
> > >Someone else at the Sanger Institute would like to be able to use LIMMA but
> > >the number of duplicates for each gene differs on their array, e.g. for some
> > >genes there are two copies and for others there would be four copies or
> > >more, which in turn obviously affects spacing etc. between replicates.
> > >I'm not sure why they would want differing numbers of copies of genes but
> > >they would like to be able to estimate the correlation between these genes
> > >anyway and obviously see the results as one data point per merged gene.
> >
> > I haven't implemented this in limma because it seems to me that it might
> > invalidate the assumptions behind the duplicate correlation approach. See
> > the earlier post:
> >
> > https://stat.ethz.ch/pipermail/bioconductor/2003-August/002224.html
> >
> > >I've tried to think of how this can be done but it seems overly complex
> > >and I'm not sure if it is at all possible in R or Limma.
> > >
> > >I'm guessing there is no way of carrying out the correlation, series model
> > >fits etc. based simply on the "Name" specified in the GAL files?
> >
> > No.
> >
> > Cheers
> > Gordon
>
>Obviously I hadn't read the documentation carefully enough, because none of
>the arrays I have been asked to analyse have evenly spaced duplicates.
>After a bit of wailing, gnashing of teeth, and banging my head against the
>desk, I was wondering if there is a rational way of combining the multiple
>estimates for a single gene? In particular, could Bayes rule be used to
>combine multiple P value estimates for a single gene?
No.
> What about the M
>estimates - could a simple arithmetic or geometric mean be used ?
Yes - simple arithmetic mean is fine.
Gordon
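
As an illustration of the arithmetic-mean suggestion above, here is a minimal sketch in Python (not limma code). It assumes the M-values for one array or one fitted coefficient are available as a plain list alongside the gene names from the GAL file; the function name `average_m_by_gene` and the example data are hypothetical, and missing spots are assumed to be coded as `None`.

```python
from collections import defaultdict
from statistics import mean


def average_m_by_gene(names, m_values):
    """Return {gene name: arithmetic mean of its M-values}.

    Spots whose M-value is None (assumed to mean "missing") are skipped,
    and genes with no usable spots are left out of the result.
    """
    grouped = defaultdict(list)
    for name, m in zip(names, m_values):
        if m is not None:
            grouped[name].append(m)
    return {name: mean(values) for name, values in grouped.items()}


# Hypothetical data: geneA is spotted twice, geneB four times (one missing).
names = ["geneA", "geneB", "geneA", "geneC", "geneB", "geneB", "geneB"]
m_values = [1.0, -2.0, 3.0, 0.5, -1.0, None, -3.0]

averaged = average_m_by_gene(names, m_values)
for gene, m in sorted(averaged.items()):
    print(gene, m)
```

Averaging this way gives one M estimate per gene regardless of how many copies were spotted or where they sit on the array. It says nothing about how to combine the P-values.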
>Note that these estimates do NOT have to be theoretically perfect - almost
>any rough and ready method that has some sort of validity will do. After
>all, the basic experimental assumptions are fairly approximate, and some
>sort of approximate estimate will be much better than no estimate at all.
>
>Or is Limma the wrong package for my analysis, in which case, what package
>should I be using? I asked this here before, but didn't specify that I had
>irregularly spaced duplicates as I hadn't realised that was an issue.
I do not know of any package which handles irregularly spaced duplicates, or
for that matter any research material which would provide a reliable
methodology to do so. (Apart from limma, I don't know of any package which
automatically handles duplicates at all, although that doesn't mean such a
package doesn't exist.)
Gordon
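
Before choosing a method, it may help to check whether an array's duplicates are regularly spaced at all. The following is a rough sketch in Python, assuming the gene names are listed in print order as in the GAL file. The function name `regular_layout` is hypothetical, and the check is only an approximation of the layout that limma's `ndups` and `spacing` arguments describe: every gene must appear the same number of times, with the same gap between consecutive copies.

```python
from collections import defaultdict


def regular_layout(names):
    """Return (ndups, spacing) if duplicates are regularly spaced, else None.

    "Regular" here means: every name occurs the same number of times
    (at least twice), and the gap between consecutive copies of a name
    is the same for all names.
    """
    positions = defaultdict(list)
    for index, name in enumerate(names):
        positions[name].append(index)

    counts = {len(p) for p in positions.values()}
    if len(counts) != 1:
        return None  # differing numbers of copies per gene
    ndups = counts.pop()
    if ndups < 2:
        return None  # no duplicates at all

    gaps = {b - a for p in positions.values() for a, b in zip(p, p[1:])}
    if len(gaps) != 1:
        return None  # copies are not evenly spaced
    return ndups, gaps.pop()


# Hypothetical layouts:
print(regular_layout(["a", "b", "a", "b"]))            # evenly spaced pairs
print(regular_layout(["a", "a", "b", "b"]))            # adjacent pairs
print(regular_layout(["a", "b", "a", "c", "b", "b"]))  # irregular
```

If this returns `None`, the array is of the irregular kind discussed in this thread, and averaging the M estimates per gene is the fallback.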
>Are there any other likely "gotchas" in terms of assumptions that packages
>will make about data layouts?
>Thanks,
>
>Luke Whitaker