My question concerns working with 450k data: correcting for batch effects and testing for significant differences in methylation between two groups. I'm rerunning code that my colleague originally wrote and ran in 2012. Originally, 379 sites were identified as significantly different between groups (at the 0.05 level of significance). However, when I run the code now, I identify 5,711 sites. The sites originally identified are included, and overall they are among the more significant sites: most fall within the top 10% of significant sites. I am using the current versions of R and Bioconductor, but the code was originally run on an older version (likely 2.10). Might the difference in the number of significant sites be due to changes in the sva package, or in packages that it relies on?
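My code is below (mscoreset is an ExpressionSet):

    library(Biobase)  # pData(), exprs()
    library(sva)      # ComBat(), f.pvalue()

    pheno = pData(mscoreset)
    edata = exprs(mscoreset)
    # Full model (with Group) and null model (without Group)
    mod = model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno)
    mod0 = model.matrix(~as.factor(var1) + as.factor(var2), data=pheno)
    batch = pheno$Batch
    # Batch correction, then per-site F-test and BH adjustment
    combat_edata = ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE)
    pValuesComBat = f.pvalue(combat_edata, mod, mod0)
    qValuesComBat = p.adjust(pValuesComBat, method="BH")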
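The comparison between the old and new results described above could be sketched roughly as follows. This is only an illustration in base R: the helper `compare_runs`, the argument names `new_p` and `old_sites`, and the simulated p-values are all assumptions made for the example, not my actual data or results.

```r
# Hypothetical helper (not part of sva): compare an earlier list of
# significant sites with the p-values from a new run.
#   new_p     - named numeric vector of raw p-values from the new run
#   old_sites - character vector of site IDs called significant before
compare_runs <- function(new_p, old_sites, alpha = 0.05) {
  q <- p.adjust(new_p, method = "BH")            # BH adjustment, as in my code
  new_sites <- names(new_p)[q < alpha]           # sites significant in the new run
  overlap <- intersect(old_sites, new_sites)     # old sites recovered by the new run

  # Rank sites by raw p-value (1 = most significant); BH preserves this order
  ranks <- rank(new_p, ties.method = "min")
  top_cut <- ceiling(0.10 * length(new_sites))   # size of the top 10% of new hits
  in_top10 <- ranks[overlap] <= top_cut

  list(
    n_old = length(old_sites),
    n_new = length(new_sites),
    n_overlap = length(overlap),
    frac_old_in_top10 = if (length(overlap) > 0) mean(in_top10) else NA_real_
  )
}

# Simulated example (assumed data): 10,000 sites, the first 500 carry signal
set.seed(1)
ids <- sprintf("cg%05d", 1:10000)
p <- runif(10000)
p[1:500] <- runif(500, min = 0, max = 1e-4)
names(p) <- ids
old <- ids[1:100]                                # pretend these were the earlier hits

res <- compare_runs(p, old)
str(res)
```

Running `sessionInfo()` in both the old and the new environment, where the old one is still available, would show which package versions differ between the two runs.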
Thanks for your help!