I wondered if anyone could comment on what is generally considered the most appropriate method of calculating fold change values from Affy data. I have a data set from a test vs. control experiment (n=3 in each group) performed on the MOE430_2 array. I used the Affy package to read in the CEL file data and GCRMA to normalize. Finally I used limma to analyze (lmFit and eBayes). Now the FC coefficients I get back from limma appear to be log2(2^(mean(test replicates in log2))/(2^mean(control replicates in log2)). For example:
> exprs(eset)[18731,1:6] TA2.CEL TA5.CEL TA7.CEL TA1.CEL TA8.CEL TA9.CEL 3.294215 4.851795 4.403851 4.782934 7.716893 9.284909 > fit$coefficients[18731,1:2] #returns the mean of the above log2 values Fed_MT Fed_WT 4.183287 7.261578 > fit2$coefficients[18731,1]  -3.078291 > -1/2^fit2$coefficients[18731,1]  -8.446136
So we have a probe that is reporting a -8.4 fold downregulation.
The problem is that SAM does not calculate the fold change values in the same manner. It appears to take the average of the unlogged data and then use those values to calculate fold changes. Thus:
> 2^exprs(eset)[18731,1:6] TA2.CEL TA5.CEL TA7.CEL TA1.CEL TA8.CEL TA9.CEL 9.809738 28.875927 21.168555 27.530016 210.385700 623.786694 > tmp <- c(mean(2^exprs(eset)[18731,1:3]),mean(2^exprs(eset)[18731,4:6])) > tmp  19.95141 287.23414 > tmp/tmp  0.06946043 > -1/(tmp/tmp)  -14.39669
So now we have -14.4 fold downregulation.
The example given is of course an extreme, but using the limma method there are 282 probes with a fold change >2, while using the sam method there are only 159, with 141 probes in common. The probes that were called uniquely by limma were almost all downregulated (-2 to -5 fold).
MY QUESTION IS WHICH METHOD IS THE CORRECT WAY?
Peter White, Ph.D.
Functional Genomics Core
Department of Genetics
University of Pennsylvania
570 Clinical Research Building
415 Curie Boulevard
Philadelphia, PA 19104-6145
Tel: +1 (215) 898-0773
Fax: +1 (215) 573-2326
E-mail: pwhite at mail.med.upenn.edu