I'm running LIMMA on a study based on HGU133plus2 chips I dug out of GEO. I normalized using RMA, GCRMA and MAS5 as usual, and ran AffyQC to look for outliers, of which I found none. There are 20 normals and 61 patients, and they cluster well in PCA. I decided to move forward with the GCRMA normalized set. I removed the 68 Affy control probes, and discarded the 13,875 / 54,613 probes (25.4% of all probes) which had an average log2 intensity < 2.27. Intensities for all arrays range from 2.268579 to 16.02096, so my belief is that these probes are uninformative. I finally calling LIMMA using the following:
eset_filtered$cohort <- factor(eset_filtered$cohort) design <- model.matrix(~0 + eset_filtered$cohort) colnames(design) <- levels(eset_filtered$cohort) fit <- lmFit(eset_filtered, design) contrast.matrix <- makeContrasts(SLE-CTL,levels = design) fit2 <- contrasts.fit(fit, contrast.matrix) fit2 <- eBayes(fit2) topTable.filtered <- topTable(fit2, coef=1, adjust="fdr", n=nrow(eset_filtered) ) rm( contrast.matrix, design, fit, fit2) index <- which(topTable.filtered$adj.P.Val <= 0.05) # 11,763 nrow(eset_filtered) # 32,054
As you can see, I'm getting 11,763 (out of a total of 32,054) probes at an FDR of 0.05. The topmost gene has an adjusted pval of 5.360915e-19 (the last gene is 0.9999355). What am I missing here? Why am I getting so many significant results? Everything normalized well, the MA plots look good, there weren't any wild outliers, the cohorts are labeled properly because they cluster together in PCA. I'm really wracking myself on this one as I've had (I believe) success with LIMMA in the past. Any help our community can provide is PROFOUNDLY appreciated.
Robert D. Robl