I am trying to write an analysis of the differential expression of genes for the experiment geod19804. After normalizing the data with RMA and building the ExpressionSet, I have all the samples grouped in two conditions for $Lung.tissue: lung_cancer and paired_normal_adjacent (cancer vs. normal). The t-test with equal variance
cancer = pData(geod19804)[,"Lung.tissue"] tt = genefilter::rowttests(geod19804, cancer)
already gives me some very wrong results (with alpha=0.05, +27000 significant genes and 6411 after adjusting with Bonferroni).
The bigger problem comes from trying to do a moderated t-test.
cancer = pData(geod19804)[,"Lung.tissue"] #Model matrix design = model.matrix(~0+cancer) colnames(design) = c("lung_cancer","paired_normal_adjacent") #Adjust lmadjus = lmFit(geod19804,design) contrast.matrix = makeContrasts(dif = "lung_cancer - paired_normal_adjacent",levels = design) #Estimate fitbayes = contrasts.fit(lmadjus,contrast.matrix) fitbayes = eBayes(lmadjus)
With this, I get 81684 significant genes! (and 60681 with Bonferroni and 80126 with BH). Now, it may seem like either the normalization of the data or the creation of the ExpressionSet were wrong somewhere, but
head(fitbayes$p.value) lung_cancer paired_normal_adjacent 1007_s_at 1.311383e-142 5.935409e-139 1053_at 6.818996e-130 2.330951e-127 117_at 1.109553e-108 7.159564e-110
shows two columns for p-values, where there should only be one, which (I guess) means that the test is not actually contrasting between cancer and normal, but I have no idea why, and I have seen code similar to mine working just fine.