Reproducibility of results with ComBat
1
0
Entering edit mode
@annebozack-8680
Last seen 5.0 years ago
United States

Hi,

My question regards working with 450k data - correcting for batch effects and testing for significant differences in methylation between two groups.  I'm rerunning code that my colleague originally wrote and ran in 2012.  Originally, 379 sites were identified as significantly different between groups (0.05 level of significance).  However, when I run the code now, I identify 5,711 sites.  The sites originally identified are included, and overall they are the more significant sites - most are included in the top 10% of significant sites.  I am using the current version of R and bioconductor, but the code was originally run on an older version (likely 2.10).  Might the difference in significant sites be due to changes in the SVA package, or packages that it relies on?

My code is: (mscoreset is an expression set)

pheno=pData(mscoreset)
edata=exprs(mscoreset)
mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno)
mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno)
batch=pheno$Batch
combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE)

pValuesComBat=f.pvalue(combat_edata,mod,mod0)
qValuesComBat=p.adjust(pValuesComBat,method="BH")

Thanks for your help!

Anne

 

 

sva combat • 1.6k views
ADD COMMENT
0
Entering edit mode
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.8 years ago
United States
Hi Anne, The code for Combat has definitely been updated since then and it is possible this is causing the change. Can you try rerunning your code without Group in the mod variable you pass to Combat? Best Jeff On Mon, Aug 31, 2015 at 3:04 PM anne.bozack [bioc] <noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User anne.bozack <https: support.bioconductor.org="" u="" 8680=""/> wrote Question: > Reproducibility of results with Comb > <https: support.bioconductor.org="" p="" 71668=""/>: > > Hi, > > My question regards working with 450k data - correcting for batch effects > and testing for significant differences in methylation between two groups. > I'm rerunning code that my colleague originally wrote and ran in 2012. > Originally, 379 sites were identified as significantly different between > groups (0.05 level of significance). However, when I run the code now, I > identify 5,711 sites. The sites originally identified are included, and > overall they are the more significant sites - most are included in the top > 10% of significant sites. I am using the current version of R and > bioconductor, but the code was originally run on an older version (likely > 2.10). Might the difference in significant sites be due to changes in the > SVA package, or packages that it relies on? > > My code is: (mscoreset is an expression set) > > pheno=pData(mscoreset) > edata=exprs(mscoreset) > mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno) > mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno) > batch=pheno$Batch > combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) > > pValuesComBat=f.pvalue(combat_edata,mod,mod0) > qValuesComBat=p.adjust(pValuesComBat,method="BH") > > Thanks for your help! > > Anne > > > > > ------------------------------ > > Post tags: sva, combat > > You may reply via email or visit Reproducibility of results with ComBat >
ADD COMMENT
0
Entering edit mode
Hey Anne, This is a bit surprising. I would be very interested in looking more into this. Although we have changed a significant amount of the code, it has all been on the data handling, warnings, and adding additional options. So it is possible that the input handling can lead to different results, however, the code for the base algorithm of ComBat has not changed for a long time. So you should be getting the same results with equivalent input--so I am surprised its different. Would you be able to share some of your data? I would be happy to take a look and see if there is a problem. Evan On Aug 31, 2015, at 7:50 PM, Jeff Leek [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""/> User Jeff Leek<https: support.bioconductor.org="" u="" 5015=""/> wrote Answer: Reproducibility of results with ComBat<https: support.bioconductor.org="" p="" 71668="" #71679="">: Hi Anne, The code for Combat has definitely been updated since then and it is possible this is causing the change. Can you try rerunning your code without Group in the mod variable you pass to Combat? Best Jeff On Mon, Aug 31, 2015 at 3:04 PM anne.bozack [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: > Activity on a post you are following on support.bioconductor.org<http: support.bioconductor.org=""> > > User anne.bozack <https: support.bioconductor.org<http:="" support.bioconductor.org="">="" u="" 8680=""/> wrote Question: > Reproducibility of results with Comb > <https: support.bioconductor.org<http:="" support.bioconductor.org="">="" p="" 71668=""/>: > > Hi, > > My question regards working with 450k data - correcting for batch effects > and testing for significant differences in methylation between two groups. > I'm rerunning code that my colleague originally wrote and ran in 2012. > Originally, 379 sites were identified as significantly different between > groups (0.05 level of significance). However, when I run the code now, I > identify 5,711 sites. The sites originally identified are included, and > overall they are the more significant sites - most are included in the top > 10% of significant sites. I am using the current version of R and > bioconductor, but the code was originally run on an older version (likely > 2.10). Might the difference in significant sites be due to changes in the > SVA package, or packages that it relies on? > > My code is: (mscoreset is an expression set) > > pheno=pData(mscoreset) > edata=exprs(mscoreset) > mod=model.matrix(~as.factor(Group) + as.factor(var1) + as.factor(var2), data=pheno) > mod0=model.matrix(~as.factor(var1) + as.factor(var2), data=pheno) > batch=pheno$Batch > combat_edata=ComBat(dat=edata, batch=batch, mod=mod, par.prior=TRUE) > > pValuesComBat=f.pvalue(combat_edata,mod,mod0) > qValuesComBat=p.adjust(pValuesComBat,method="BH") > > Thanks for your help! > > Anne > > > > > ------------------------------ > > Post tags: sva, combat > > You may reply via email or visit Reproducibility of results with ComBat<https: support.bioconductor.org="" p="" 71668=""/> > ________________________________ Post tags: sva, combat You may reply via email or visit A: Reproducibility of results with ComBat
ADD REPLY
0
Entering edit mode

Hi Evan,

Yes, I can share my data with you.  Can I contact you by email once I figure out the best method to send you the data?

Thanks!

ADD REPLY
0
Entering edit mode

Hi Jeff,

Thanks for your response.  To clarify, do you mean running ComBat with the null model (mod0), and then testing for differences between groups (the Group variable) by running f.pvalue and p.adjust as written?  However, the documentation for SVA states that the model passed used for ComBat should include all variables of interest.  If I do try running ComBat with the null model I only get 9 significant sites. 

ADD REPLY

Login before adding your answer.

Traffic: 977 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6