Diffbind : use of controlBam
Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK
Hello Thomas-

This should work fine (using the same bam files as both ChIP and control). I don't think I've actually tested this scenario, so if there is a problem it would be a bug that would get fixed.

Cheers-
Rory

From: Thomas Nalpathamkalam <thomas.nalpathamkalam@sickkids.ca>
Date: Fri, 26 Jul 2013 17:32:48 +0000
To: Rory Stark <rory.stark@cruk.cam.ac.uk>
Cc: Daniele Merico <daniele.merico@sickkids.ca>
Subject: Diffbind : use of controlBam

Dear Rory Stark,

We used DiffBind with MACS peaks.

In our experiment, we had 3 biological replicates x 2 experimental conditions (tumor, control). We did not have input samples, so when we did peak calling in MACS we paired samples as follows: rep1-ctrl vs rep1-tumor, rep2-ctrl vs rep2-tumor, rep3-ctrl vs rep3-tumor, rep1-tumor vs rep1-ctrl, rep2-ctrl vs rep2-tumor and rep3-ctrl vs rep3-tumor. This gave us the peaks for each control and tumor replicate. This is approved by the MACS protocol.

When we load the MACS peaks in DiffBind, we are not sure if we can use the same replicate twice: as a "bamReads" for its condition, and as a "controlBam" for the other condition. This follows the sample matching we have used in MACS, but it may not work right in DiffBind.

What is your opinion on this? Do you prefer to do the analysis without "controlBam" in this scenario?

Thank you in advance,

Thomas Nalpathamkalam
The Centre for Applied Genomics (TCAG)
The Hospital for Sick Children
MaRS Building - East Tower
101 College St., Room 14-701
Toronto, ON M5G 1L7
thomas.nalpathamkalam@sickkids.ca
(416)-813-7032
www.tcag.ca
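For readers who want to see the cross-pairing spelled out, here is a minimal sketch in R of a sample sheet where each replicate's BAM is the read file for its own condition and the control for the other. The file paths and sample IDs are hypothetical (none appear in the thread), and the column names (SampleID, Condition, Replicate, bamReads, bamControl, Peaks, PeakCaller) are assumed from the DiffBind sample sheet format; the thread itself says "controlBam", so check the column name against your installed version. The DiffBind calls only run if the package and the files are actually present.

```r
# Hypothetical file layout; none of these paths come from the thread.
grid <- expand.grid(Replicate = 1:3,
                    Condition = c("tumor", "ctrl"),
                    stringsAsFactors = FALSE)

# Helpers: BAM path for a sample, and the opposite condition.
bam_for <- function(cond, rep) sprintf("bam/rep%d_%s.bam", rep, cond)
other   <- function(cond) ifelse(cond == "tumor", "ctrl", "tumor")

samples <- data.frame(
  SampleID   = sprintf("rep%d_%s", grid$Replicate, grid$Condition),
  Condition  = grid$Condition,
  Replicate  = grid$Replicate,
  bamReads   = bam_for(grid$Condition, grid$Replicate),
  # Control is the same replicate from the other condition (column name assumed).
  bamControl = bam_for(other(grid$Condition), grid$Replicate),
  Peaks      = sprintf("peaks/rep%d_%s_peaks.xls", grid$Replicate, grid$Condition),
  PeakCaller = "macs",
  stringsAsFactors = FALSE
)

# Every BAM should appear once as reads and once as control.
stopifnot(setequal(samples$bamReads, samples$bamControl))

# Only attempt the DiffBind steps when the package and files exist.
if (requireNamespace("DiffBind", quietly = TRUE) &&
    all(file.exists(samples$bamReads, samples$Peaks))) {
  tc <- DiffBind::dba(sampleSheet = samples)
  tc <- DiffBind::dba.count(tc)
}
```

Building the sheet programmatically rather than by hand makes it harder to mispair a replicate with the wrong control.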
@daniele-merico-6063
Thanks Rory.

We find quite different results when using bamControls, with heatmaps looking much cleaner (no residual batch effects between replicates).

We will look with more care into differences. Luckily we have a few loci for which we know the true differences, and since this is a methyl-seq experiment we expect an enrichment of significantly differential peaks overlapping CpG islands.

Best,
Daniele and Thomas

Daniele Merico
--
PhD, Molecular And Cellular Biology
Informatics Core Facility Manager
The Centre for Applied Genomics (TCAG)
Toronto (ON), Canada
daniele.merico@sickkids.ca
daniele.merico@gmail.com
Hi Daniele and Thomas-

Given that the controls are actually ChIPs with signal, you are certainly correct to take them into account, and it is not surprising that this has an effect on clustering.

One thing I'd suggest to get a better handle on this effect is to compare different counting/normalization scores. You can try these by using the "score" parameter to dba.plotHeatmap (or simply plot) and dba.plotPCA (I'd definitely be looking at PCA plots in this case, esp. if you are seeing batch effects in some cases).

For example, you can compare scores that do or don't take the control into account:

* DBA_SCORE_READS vs. DBA_SCORE_READS_MINUS or DBA_SCORE_READS_FOLD
* DBA_SCORE_RPKM vs. DBA_SCORE_RPKM_FOLD
* DBA_SCORE_TMM_READS_FULL or DBA_SCORE_TMM_READS_EFFECTIVE vs. DBA_SCORE_TMM_MINUS_FULL or DBA_SCORE_TMM_MINUS_EFFECTIVE

The DBA_SCORE_RPKM scores would be a good place to start!

Cheers-
Rory
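The comparison suggested above could be scripted roughly as follows. This is only a sketch: `counted` is a hypothetical name for a DBA object that has already been through dba.count(), and `plot_score_pair` is a helper invented here, not part of DiffBind. The score constants are the ones named in this thread; some of them may not exist in other DiffBind releases, so the helper looks each one up by name and skips any that are missing rather than failing.

```r
# Score pairs named in the thread: without control vs. control-aware.
score_pairs <- list(
  reads = c("DBA_SCORE_READS", "DBA_SCORE_READS_MINUS"),
  rpkm  = c("DBA_SCORE_RPKM", "DBA_SCORE_RPKM_FOLD"),
  tmm   = c("DBA_SCORE_TMM_READS_FULL", "DBA_SCORE_TMM_MINUS_FULL")
)

# Hypothetical helper: draw one PCA per score name in `pair`.
# `ns` is where the constants are looked up (the DiffBind namespace by default).
plot_score_pair <- function(counted, pair, ns = asNamespace("DiffBind")) {
  plotted <- character(0)
  for (name in pair) {
    if (!exists(name, envir = ns, inherits = FALSE)) {
      message("skipping ", name, ": not defined in this DiffBind version")
      next
    }
    DiffBind::dba.plotPCA(counted,
                          attributes = get("DBA_CONDITION", envir = ns),
                          score = get(name, envir = ns))
    plotted <- c(plotted, name)
  }
  invisible(plotted)  # names of the scores that were actually plotted
}

# Runs only if DiffBind is installed and a counted DBA object exists.
if (requireNamespace("DiffBind", quietly = TRUE) && exists("counted")) {
  for (pair in score_pairs) plot_score_pair(counted, pair)
}
```

Looking at each pair side by side shows whether the batch effect between replicates comes from the raw counts or only disappears once the control reads are taken into account.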
Hi Rory, thanks.

As far as normalization goes, I think we should go for TMM_MINUS_FULL:

* TMM has been shown superior to other methods
* since we have more methylation in one condition, we have an issue with normalization reducing that true global difference, and I think TMM_..._FULL should be the best way to address that.

Best,
Daniele
Rory Stark ★ 5.2k
@rory-stark-5741
Cambridge, UK
I agree with your reasoning on which normalization method to use for the differential analysis. It is particularly good to see you appreciate the difference between _FULL and _EFFECTIVE and why _FULL is more appropriate in this case!

The RPKM scores are only used for plotting, and even then only for unanalyzed data, but can be useful for revealing batch effects.

Cheers-
Rory
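To make the _FULL vs. _EFFECTIVE point concrete, here is a toy illustration in base R. It is a deliberately simplified sketch: it uses plain library-size scaling rather than the TMM calculation DiffBind performs, the counts are made up, and it assumes "full" means total reads in the BAM file and "effective" means reads falling within peaks. The tumor sample is given twice the signal at every peak at equal sequencing depth, mimicking a true global difference.

```r
# Made-up counts: same sequencing depth, tumor has 2x signal at every peak.
peak_counts <- cbind(ctrl = c(100, 200, 300), tumor = c(200, 400, 600))

full_lib      <- c(ctrl = 1e6, tumor = 1e6)  # total reads per BAM (assumed)
effective_lib <- colSums(peak_counts)        # reads falling in peaks

# Divide each sample's counts by its library size relative to the mean.
scale_by <- function(counts, lib) sweep(counts, 2, lib / mean(lib), "/")

full_norm <- scale_by(peak_counts, full_lib)
eff_norm  <- scale_by(peak_counts, effective_lib)

# Tumor/control ratio per peak under each choice of library size.
full_ratio <- full_norm[, "tumor"] / full_norm[, "ctrl"]
eff_ratio  <- eff_norm[, "tumor"] / eff_norm[, "ctrl"]

print(full_ratio)  # global 2x difference is preserved
print(eff_ratio)   # global difference is normalized away
```

With the effective library size the global increase is absorbed into the scaling factor, which is the effect Daniele wanted to avoid for a condition with genuinely more methylation.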