ChIPpeakAnno - slightly different results between makeVennDiagram and findOverlappingPeaks

@antonio-miguel-de-jesus-domingues-5182

Germany

I've been trying to generate a set of high-confidence peaks that are common to my ChIP-seq replicates using ChIPpeakAnno. The problem is that the number of overlapping peaks shown on the Venn diagram produced by

    makeVennDiagram(RangedDataList(peaks1, peaks2), NameOfPeaks=c("TF1","TF2"),
                    totalTest=(Npeaks1 + Npeaks2), useFeature=FALSE, minoverlap = 100,
                    select= "first")

does not match the number of peaks in $MergedPeaks returned by

    findOverlappingPeaks(peaks1, peaks2, minoverlap = 100, select= "first",
                         NameOfPeaks1="TF1", NameOfPeaks2="TF2")

I believe the difference arises because some peaks in peaks2 overlap more than one peak in peaks1. Comparing peaks2 vs peaks1 does not solve the problem, and select= "first" is already being used. Also, the $MergedPeaks data output by makeVennDiagram does not match the number of overlaps:

    $MergedPeaks
    RangedData with 18650 rows and 0 value columns across 24 spaces

    [1] 19039
    [1] 21061
    $p.value
    [1] 0

    $vennCounts
         Replicate1 Replicate2 Counts
    [1,]          0          0  17300
    [2,]          0          1   3761
    [3,]          1          0   1739
    [4,]          1          1  17300
    attr(,"class")
    [1] "VennCounts"

I would like to understand where this difference comes from so that I ultimately have consistent results in visual and table format.

Cheers,
António

--
António Miguel de Jesus Domingues, PhD
Neugebauer group
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden
Pfotenhauerstrasse 108
01307 Dresden
Germany

e-mail: domingue@mpi-cbg.de
tel. +49 351 210 2481
The Unbearable Lightness of Molecular Biology
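The numbers in the output above can be cross-checked with a little arithmetic. The base R sketch below is only a sanity check on the posted figures, not part of the original question. It assumes that the two bare `[1]` values are the per-replicate peak totals (Npeaks1 and Npeaks2), which the thread does not state explicitly; the variable names are made up for illustration.

```r
# Counts copied from the posted $vennCounts table
only_rep2 <- 3761    # row [2,]: present in Replicate2 only
only_rep1 <- 1739    # row [3,]: present in Replicate1 only
both      <- 17300   # row [4,]: present in both replicates
merged    <- 18650   # rows reported for $MergedPeaks

# Per-replicate totals implied by the Venn counts
rep1_total <- only_rep1 + both
rep2_total <- only_rep2 + both

# ASSUMPTION: totalTest = Npeaks1 + Npeaks2 with Npeaks equal to the totals above.
# Under that assumption the "0 0" row is simply what is left of totalTest.
total_test <- rep1_total + rep2_total
neither    <- total_test - (only_rep1 + only_rep2 + both)

# Size of the mismatch the question is about
gap <- merged - both

print(c(rep1_total = rep1_total, rep2_total = rep2_total,
        neither = neither, gap = gap))
```

Under that assumption the implied totals agree with the two printed values (19039 and 21061), and the mismatch between $MergedPeaks and the Venn overlap is 1350 peaks.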

Genetics ChIPpeakAnno • 1.4k views

updated 12.5 years ago by Ou, Jianhong ★ 1.3k • written 12.5 years ago by António Miguel de Jesus Domingues ▴ 510

Ou, Jianhong ★ 1.3k

@ou-jianhong-4539

United States

Hi Antonio,

May I know which version of ChIPpeakAnno you are using?

Yours sincerely,
Jianhong Ou
jianhong.ou at umassmed.edu

12.5 years ago Ou, Jianhong ★ 1.3k

My apologies Jianhong, I forgot to attach the session info. I am using ChIPpeakAnno_2.5.12.

One extra piece of information: with the example from the vignette, it works as it should, but that might simply be because the overlaps there are more straightforward. That is, no peak in peaks1 overlaps with more than one peak in peaks2, and vice versa.

    sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      grDevices datasets  graphics  utils     stats     methods
    [8] base

    other attached packages:
     [1] ChIPpeakAnno_2.5.12   limma_3.12.3
     [3] org.Hs.eg.db_2.7.1    GO.db_2.7.1
     [5] RSQLite_0.11.2        DBI_0.2-5
     [7] AnnotationDbi_1.18.4  BSgenome.Ecoli.NCBI.20080805_1.3.17
     [9] BSgenome_1.24.0       GenomicRanges_1.8.13
    [11] Biostrings_2.24.1     IRanges_1.14.4
    [13] multtest_2.12.0       biomaRt_2.12.0
    [15] VennDiagram_1.5.1     ggplot2_0.9.2.1
    [17] Biobase_2.16.0        BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
     [1] amap_0.8-7         colorspace_1.1-1   dichromat_1.2-4    DiffBind_1.2.4
     [5] digest_0.5.2       edgeR_2.6.12       gdata_2.12.0       gplots_2.11.0
     [9] gtable_0.1.1       gtools_2.7.0       labeling_0.1       MASS_7.3-21
    [13] memoise_0.1        munsell_0.4        plyr_1.7.1         proto_0.3-9.2
    [17] RColorBrewer_1.0-5 RCurl_1.91-1       reshape2_1.2.1     scales_0.2.2
    [21] splines_2.15.1     stats4_2.15.1      stringr_0.6.1      survival_2.36-14
    [25] tools_2.15.1       XML_3.9-4          zlibbioc_1.2.0

António

12.5 years ago António Miguel de Jesus Domingues ▴ 510

Hi Antonio,

> I believe the difference is because some of peaks 2 overlap more than peaks
> in peaks1.

Yes, this is why the merged peaks from findOverlappingPeaks differ from the makeVennDiagram results. As you know, some peaks in peaks2 may overlap more than one peak in peaks1, and vice versa.

findOverlappingPeaks returns MergedPeaks (the overlapping peaks of peaks1 and peaks2 merged together), Peaks1withOverlaps and Peaks2withOverlaps. makeVennDiagram takes the smaller of Peaks1withOverlaps and Peaks2withOverlaps as the overlap count. Both of these are no smaller than MergedPeaks, because they do not merge small overlapping peaks into one bigger peak.

A more complicated case arises when making a Venn diagram for three or more groups: multiple peaks in peaks1 can be merged with multiple peaks in peaks2 into one big peak.

I would appreciate it if you could send me your data as a training dataset for developing a new version of makeVennDiagram.

Yours sincerely,
Jianhong Ou
jianhong.ou@umassmed.edu
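The counting difference described in this reply can be illustrated with a toy example. The sketch below uses base R only and made-up coordinates. It is not the ChIPpeakAnno implementation, and it ignores `minoverlap` and `select`. The "take the smaller count" rule is the one described in the reply, and all variable names are hypothetical.

```r
# Toy peak sets on one chromosome (made-up coordinates)
peaks1 <- data.frame(start = c(100, 350, 1000), end = c(300, 600, 1200))
peaks2 <- data.frame(start = c(250, 550, 2000), end = c(400, 700, 2200))

# Overlap matrix: rows index peaks1, columns index peaks2
hits <- outer(seq_len(nrow(peaks1)), seq_len(nrow(peaks2)),
              function(i, j) peaks1$start[i] <= peaks2$end[j] &
                             peaks2$start[j] <= peaks1$end[i])

# Peaks in each set that overlap at least one peak in the other set
n1_with <- sum(rowSums(hits) > 0)
n2_with <- sum(colSums(hits) > 0)

# Merge all overlapping peaks from both sets into non-overlapping regions
ov <- rbind(peaks1[rowSums(hits) > 0, ], peaks2[colSums(hits) > 0, ])
ov <- ov[order(ov$start), ]
running_end <- cummax(ov$end)
# A new merged region begins where a start lies beyond every earlier end
new_region <- c(TRUE, ov$start[-1] > running_end[-nrow(ov)])
n_merged <- sum(new_region)

# Overlap count under the rule described in the reply (smaller of the two)
venn_overlap <- min(n1_with, n2_with)

print(c(n1_with = n1_with, n2_with = n2_with,
        n_merged = n_merged, venn_overlap = venn_overlap))
```

Here two peaks from each set chain together into a single merged region, so the per-set overlap counts (2 and 2) exceed the merged count (1). This is the kind of many-to-many overlap that makes the two numbers diverge.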

ADD REPLY • link 12.5 years ago Ou, Jianhong ★ 1.3k
