Reproducibility of results with ComBat
1
0
Entering edit mode
@annebozack-8680
Last seen 21 months ago
United States

Hi,

My question regards working with 450k data - correcting for batch effects and testing for significant differences in methylation between two groups.  I'm rerunning code that my colleague originally wrote and ran in 2012.  Originally, 379 sites were identified as significantly different between groups (0.05 level of significance).  However, when I run the code now, I identify 5,711 sites.  The sites originally identified are included, and overall they are the more significant sites - most are included in the top 10% of significant sites.  I am using the current version of R and bioconductor, but the code was originally run on an older version (likely 2.10).  Might the difference in significant sites be due to changes in the SVA package, or packages that it relies on?

My code is: (mscoreset is an expression set)

pheno=pData(mscoreset)
edata=exprs(mscoreset)
mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno)
mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno)
batch=pheno$Batch combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) pValuesComBat=f.pvalue(combat_edata,mod,mod0) qValuesComBat=p.adjust(pValuesComBat,method="BH") Thanks for your help! Anne sva combat • 920 views ADD COMMENT 0 Entering edit mode Jeff Leek ▴ 610 @jeff-leek-5015 Last seen 7 months ago United States Hi Anne, The code for Combat has definitely been updated since then and it is possible this is causing the change. Can you try rerunning your code without Group in the mod variable you pass to Combat? Best Jeff On Mon, Aug 31, 2015 at 3:04 PM anne.bozack [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User anne.bozack <https: support.bioconductor.org="" u="" 8680=""/> wrote Question: > Reproducibility of results with Comb > <https: support.bioconductor.org="" p="" 71668=""/>: > > Hi, > > My question regards working with 450k data - correcting for batch effects > and testing for significant differences in methylation between two groups. > I'm rerunning code that my colleague originally wrote and ran in 2012. > Originally, 379 sites were identified as significantly different between > groups (0.05 level of significance). However, when I run the code now, I > identify 5,711 sites. The sites originally identified are included, and > overall they are the more significant sites - most are included in the top > 10% of significant sites. I am using the current version of R and > bioconductor, but the code was originally run on an older version (likely > 2.10). Might the difference in significant sites be due to changes in the > SVA package, or packages that it relies on? > > My code is: (mscoreset is an expression set) > > pheno=pData(mscoreset) > edata=exprs(mscoreset) > mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno) > mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno) > batch=pheno$Batch > combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) > > pValuesComBat=f.pvalue(combat_edata,mod,mod0) > qValuesComBat=p.adjust(pValuesComBat,method="BH") > > Thanks for your help! > > Anne > > > > > ------------------------------ > > Post tags: sva, combat > > You may reply via email or visit Reproducibility of results with ComBat >
0
Entering edit mode
0
Entering edit mode

Hi Evan,

Yes, I can share my data with you.  Can I contact you by email once I figure out the best method to send you the data?

Thanks!

0
Entering edit mode

Hi Jeff,

Thanks for your response.  To clarify, do you mean running ComBat with the null model (mod0), and then testing for differences between groups (the Group variable) by running f.pvalue and p.adjust as written?  However, the documentation for SVA states that the model passed used for ComBat should include all variables of interest.  If I do try running ComBat with the null model I only get 9 significant sites.