Limma analysis of focused arrays vs. whole genome arrays

Naomi Altman ★ 6.0k

@naomi-altman-380

United States

I am a bit late on this discussion, but here is my input:

We are using EST arrays, which have similar problems. We use a few spike-ins (3-10), spotted many times each. (We are using 50 spots/spike-in, but I think 25 would be sufficient.) We spike them in a titration series, which helps us determine how well the loess works (versus "A"). Having many replicates is very valuable - it gives us an idea of the natural variation in the system, and the within-spot correlation for the 2 channels. It also means that we can create our own oligos, which is a good thing, given the cost of oligos.

In one of our experiments, ordinary loess seems to be fine - i.e. the spike-ins end up where they should be. In the other experiment, the spike-ins "move" dramatically under one condition. We are investigating whether this was lab error, or a need for a different normalization.

If most genes differentially express, you should abandon q-values. The reason is obvious if you think about it - the q-value is the estimated proportion of false detections among the genes declared significant. If 90% of the genes actually differentially express, the maximum q-value is going to be 10%, even if you declare all the genes significant. In that case, you should be controlling the FNR - false non-detect rate. Storey's 2003 paper also discusses estimating FNR - but this has not been so popular.

In any case, my simplistic solution is that if the q-value routine indicates small pi0, then I don't do multiple comparisons corrections. What do I mean by small? To date in the studies I have been involved in, we have always had either pi0>90% or pi0<25%. So I have not had to worry too much about "small" - 25% is certainly small enough.

--Naomi

At 09:33 AM 6/7/2005, Mike Schaffer wrote:
>Hi,
>
>The lab I work with has used "whole genome" human arrays (~18,000 genes)
>for a couple years and I have helped with the analysis using Limma. Now,
>due to costs, they are considering switching from whole genome arrays
>to focused arrays with ~400 genes of interest (selected from the
>whole-genome array results).
>
>The obvious analysis problems with a focused array where most genes are
>changing are:
>
>1. LOESS normalization assumes most genes are not changing. If most of
>the genes are expected to change, there is no basis to recenter the data
>around zero. The response from the lab was that they would be willing to
>include 100-150 genes that are not expected to change.
>
>2. The B-statistic in Limma requires a parameter indicating a certain
>fraction of genes are changing. The corresponding moderated t-statistic
>uses the data from all genes to moderate the standard error in the t
>calculation. Both of these could change dramatically if most of the genes
>on the array are changing.
>
>
>My questions are:
>
>1. Are my concerns valid and are there ways around them? Are there
>other analysis pitfalls with this scenario?
>
>2. Can Limma handle situations where most of an array is expected to
>change? What modifications, if any, need to be made to the Limma analysis
>to account for this?
>
>3. Alternatively, is there a more appropriate statistical package to use
>in this case?
>
>
>Thanks.
>
>--
>Mike
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)
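
The q-value ceiling mentioned in the reply above can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not code from the thread: the function name `q_values`, the example p-values, and the choice to treat pi0 as known are all assumptions made for the illustration (a real q-value routine estimates pi0 from the data). It uses the usual step-up form, q(i) = min over j >= i of pi0 * m * p(j) / j, on the sorted p-values.

```python
def q_values(p_values, pi0):
    """Step-up q-values for a list of p-values, with pi0 assumed known.

    Hypothetical helper for illustration only; pi0 is the assumed
    proportion of genes that are NOT differentially expressed.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum
    # so that the q-values are monotone in the p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


# Made-up p-values; pi0 = 0.10 means 90% of genes truly change.
example_p = [0.001, 0.01, 0.2, 0.5, 0.9]
example_q = q_values(example_p, pi0=0.10)

# No q-value can exceed pi0, however large the p-value is.
print(max(example_q) <= 0.10)
```

The largest q-value is pi0 times the largest p-value, so with pi0 = 0.10 it can never rise above 10%. Declaring every gene significant therefore still looks acceptable by q-value, which is why the q-value stops being a useful filter in this setting.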

Normalization limma
