Limma analysis of focused arrays vs. whole genome arrays
Naomi Altman
I am a bit late to this discussion, but here is my input.

We are using EST arrays, which have similar problems. We use a few spike-ins (3-10), each spotted many times. (We are using 50 spots per spike-in, but I think 25 would be sufficient.) We spike them in as a titration series, which helps us determine how well the loess works (versus "A"). Having many replicates is very valuable - it gives us an idea of the natural variation in the system, and of the within-spot correlation between the 2 channels. It also means that we can create our own oligos, which is a good thing, given the cost of oligos.

In one of our experiments, ordinary loess seems to be fine - i.e. the spike-ins end up where they should be. In the other experiment, the spike-ins "move" dramatically under one condition. We are investigating whether this was lab error or a need for a different normalization.

If most genes are differentially expressed, you should abandon q-values. The reason is obvious if you think about it - the q-value is the estimated proportion of false detections among the genes declared significant. If 90% of the genes are actually differentially expressed, the max. q-value is going to be about 10%, even if you declare all the genes significant. In that case, you should be controlling the FNR - the false non-detect rate. Storey's 2003 paper also discusses estimating FNR - but this has not been so popular.

In any case, my simplistic solution is that if the q-value routine indicates a small pi0, then I don't do multiple comparisons corrections. What do I mean by small? To date, in the studies I have been involved in, we have always had either pi0>90% or pi0<25%. So I have not had to worry too much about "small" - 25% is certainly small enough.

--Naomi

At 09:33 AM 6/7/2005, Mike Schaffer wrote:
>Hi,
>
>The lab I work with has used "whole genome" human arrays (~18,000 genes)
>for a couple years and I have helped with the analysis using Limma. Due to
>costs, they are now considering switching from whole genome arrays
>to focused arrays with ~400 genes of interest (selected from the
>whole-genome array results).
>
>The obvious analysis problems with a focused array where most genes are
>changing are:
>
>1. LOESS normalization assumes most genes are not changing. If most of
>the genes are expected to change, there is no basis to recenter the data
>around zero. The response from the lab was that they would be willing to
>include 100-150 genes that are not expected to change.
>
>2. The B-statistic in Limma requires a parameter indicating a certain
>fraction of genes are changing. The corresponding moderated t-statistic
>uses the data from all genes to moderate the standard error in the t
>calculation. Both of these could change dramatically if most of the genes
>on the array are changing.
>
>
>My questions are:
>
>1. Are my concerns valid and are there ways around them? Are there
>other analysis pitfalls with this scenario?
>
>2. Can Limma handle situations where most of an array is expected to
>change? What modifications, if any, need to be made to the Limma analysis
>to account for this?
>
>3. Alternatively, is there a more appropriate statistical package to use
>in this case?
>
>
>Thanks.
>
>--
>Mike
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman                  814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics              814-863-7114 (fax)
Penn State University            814-865-1348 (Statistics)
University Park, PA 16802-2111
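The q-value argument in the reply can be checked with a small simulation. The sketch below is not limma or the qvalue package (both are R); it is a plain-Python illustration under stated assumptions: a 400-gene focused array, a true pi0 of 10%, null p-values drawn as Uniform(0,1), non-null p-values drawn as u**50 so they sit near zero, and a Storey-style pi0 estimate with a fixed lambda of 0.5. The function names are hypothetical. The point it shows is that the largest q-value cannot exceed the estimated pi0, so with small pi0 every gene can be called significant at a modest q-value cutoff.

```python
import random


def storey_pi0(pvalues, lam=0.5):
    """Estimate the null proportion as #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    tail = sum(1 for p in pvalues if p > lam)
    return min(1.0, tail / (m * (1.0 - lam)))


def q_values(pvalues, pi0):
    """Step-up q-values: q_(i) = min over j >= i of pi0 * m * p_(j) / j."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum enforces monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return q


def simulate_pvalues(m, true_pi0, rng):
    """Assumed toy model: nulls are Uniform(0,1); non-nulls are u**50 (piled up near 0)."""
    n_null = round(m * true_pi0)
    pvals = [rng.random() for _ in range(n_null)]
    pvals += [rng.random() ** 50 for _ in range(m - n_null)]
    return pvals


rng = random.Random(0)
pvals = simulate_pvalues(400, 0.10, rng)  # 90% of genes truly changing
pi0_hat = storey_pi0(pvals)
qvals = q_values(pvals, pi0_hat)
max_q = max(qvals)

print(f"estimated pi0 = {pi0_hat:.3f}, largest q-value = {max_q:.3f}")
```

Because the largest q-value equals the estimated pi0 times the largest p-value, it is bounded by the estimated pi0 regardless of the data, which is why a q-value cutoff stops being informative when most genes change.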