I am a bit late to this discussion, but here is my input:
We are using EST arrays, which have similar problems. We use a few
spike-ins (3-10), each spotted many times. (We use 50 spots per
spike-in, but I think 25 would be sufficient.) We spike them in as a
titration series, which helps us determine how well the loess
normalization works across the range of "A" (average log intensity).
Having many replicates is very valuable: it gives us an idea of the
natural variation in the system, and of the within-spot correlation
between the 2 channels. Needing only a few spike-ins also means that we
can create our own oligos, which is a good thing, given the cost of
oligos.
In one of our experiments, ordinary loess seems to be fine, i.e. the
spike-ins end up where they should be. In the other experiment, the
spike-ins "move" dramatically under one condition. We are investigating
whether this was due to lab error, or whether it indicates a need for a
different normalization.
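One kind of "different normalization" that fits the plan in the quoted message below (adding 100-150 genes not expected to change) is to fit the M-versus-A curve on those control spots, or on the spike-ins, only and subtract it from every spot. The sketch below assumes M and A are already computed per spot. It uses a single tricube-weighted local-linear pass as a simplified stand-in for loess (no robustness iterations), and all names are hypothetical.

```python
import math
from statistics import mean


def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear(x0, xs, ys, frac=0.5):
    """Tricube-weighted local linear fit evaluated at x0.

    A simplified stand-in for one loess evaluation: no robustness iterations.
    """
    n = len(xs)
    k = max(2, min(n, math.ceil(frac * n)))  # number of nearest controls in the window
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # bandwidth
    w = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(w)
    if sw == 0:
        return mean(ys)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    if sxx < 1e-12:
        return ybar  # degenerate window: fall back to the weighted mean
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def normalize_with_controls(a_all, m_all, is_control, frac=0.5):
    """Fit the M-vs-A trend on control spots only, then subtract it from every spot."""
    a_ctl = [a for a, c in zip(a_all, is_control) if c]
    m_ctl = [m for m, c in zip(m_all, is_control) if c]
    return [m - local_linear(a, a_ctl, m_ctl, frac) for a, m in zip(a_all, m_all)]
```

Because the curve is fitted only to spots assumed not to change, genes that really do change keep their shift instead of being pulled back toward zero, which is what happens when the fit uses all spots and most of them change.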
If most genes are differentially expressed, you should abandon q-values.
The reason is obvious if you think about it: the q-value estimates the
percentage of false detections among the genes declared significant. If
90% of the genes are actually differentially expressed, the maximum
q-value is going to be about 10%, even if you declare all the genes
significant. In that case, you should be controlling the FNR (false
non-detection rate). Storey's 2003 paper also discusses estimating the
FNR, but this has not been so popular. In any case, my simplistic
solution is that if the q-value routine indicates a small pi0 (the
estimated proportion of genes that are not differentially expressed),
then I don't do multiple comparisons corrections. What do I mean by
small? To date, in the studies I have been involved in, we have always
had either pi0 > 90% or pi0 < 25%, so I have not had to worry too much
about "small"; 25% is certainly small enough.
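The q-value argument above can be checked with a short sketch. It estimates pi0 at a single fixed lambda = 0.5, as pi0 = #{p > lambda} / (m (1 - lambda)), which is a simplification of the estimators in the qvalue package. It then computes q-values with the step-up formula q_(i) = min over j >= i of pi0 * m * p_(j) / j. The FNR function is a rough plug-in estimate added for illustration, not Storey's FNR estimator. Because the largest p-value gets q = pi0 * p_max, no q-value can exceed pi0.

```python
def estimate_pi0(pvals, lam=0.5):
    """Estimate the proportion of true nulls at one fixed lambda (simplified Storey approach)."""
    m = len(pvals)
    pi0 = sum(p > lam for p in pvals) / (m * (1 - lam))
    return min(pi0, 1.0)


def qvalues(pvals, pi0):
    """q_(i) = min over j >= i of pi0 * m * p_(j) / j, capped at 1 (returned in input order)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p-value
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvals[i] / rank)
        q[i] = running
    return q


def plugin_fnr(pvals, pi0, t):
    """Rough plug-in FNR when calling p <= t significant (illustrative only).

    Null p-values are Uniform(0,1), so about pi0 * m * (1 - t) of them land above t;
    the rest of the non-calls are estimated to be missed true alternatives.
    """
    m = len(pvals)
    not_called = sum(p > t for p in pvals)
    if not_called == 0:
        return 0.0
    expected_null = pi0 * m * (1 - t)
    return max(0.0, 1 - expected_null / not_called)
```

With 90% of genes differentially expressed, `estimate_pi0` returns about 0.1, and even the gene with the largest p-value gets a q-value below 0.1. This is the situation in which the q-value stops being informative and the FNR is the more useful quantity to look at.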
--Naomi
At 09:33 AM 6/7/2005, Mike Schaffer wrote:
>Hi,
>
>The lab I work with has used "whole genome" human arrays (~18,000 genes)
>for a couple of years, and I have helped with the analysis using Limma.
>Now, due to costs, they are considering switching from whole genome
>arrays to focused arrays with ~400 genes of interest (selected from the
>whole-genome array results).
>
>The obvious analysis problems with a focused array where most genes are
>changing are:
>
>1. LOESS normalization assumes most genes are not changing. If most of
>the genes are expected to change, there is no basis to recenter the data
>around zero. The response from the lab was that they would be willing to
>include 100-150 genes that are not expected to change.
>
>2. The B-statistic in Limma requires a parameter specifying the fraction
>of genes that are changing. The corresponding moderated t-statistic uses
>the data from all genes to moderate the standard error in the t
>calculation. Both of these could change dramatically if most of the genes
>on the array are changing.
>
>
>My questions are:
>
>1. Are my concerns valid, and are there ways around them? Are there
>other analysis pitfalls with this scenario?
>
>2. Can Limma handle situations where most of an array is expected to
>change? What modifications, if any, need to be made to the Limma
>analysis to account for this?
>
>3. Alternatively, is there a more appropriate statistical package to use
>in this case?
>
>
>Thanks.
>
>--
>Mike
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111