Question

Inconsistent peak width with DiffBind "summit" function

0

Entering edit mode

meisan406 • 0

@meisan406-7061

Last seen 8.1 years ago

United States

This is a question on the "summit" function in DiffBind.

I would like to generate a set of consensus peaks with width of 201bp, centered around the peak summit. The codes I used were:

dba.object <- dba( sampleSheet = metadata, peakCaller="bed", scoreCol=5, bRemoveM=TRUE, bLowerScoreBetter=TRUE, config=data.frame(bUsePval=TRUE), bCorPlot=FALSE)

count.data <- dba.count( dba.object, minOverlap=1, score=DBA_SCORE_READS, bCorPlot=FALSE, summits=100,
bUseSummarizeOverlaps=F )

To retrieve the count matrix, along with the peak coordinates, I used:

count.matrix <- dba.peakset(count.data, bRetrieve=T, DataType=DBA_DATA_FRAME)

However, I realized that there were a very small set of peaks larger than 201bp in the final consensus peak set (approximately 5% of the consensus peak set). I am unable to figure out why this would be so and would appreciate any assistance on this.

My saved dba.object can be retrieved from here: https://www.dropbox.com/s/eh3qpanm0lgdn6f/dba_object_debug.rds?dl=0

The DiffBind version I used was 2.2.9

Thank you,

Mei San

sessionInfo()

R version 3.3.2 (2016-10-31)

Platform: x86_64-centos-linux-gnu (64-bit)

Running under: CentOS Linux 7 (Core)


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

[9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base     


other attached packages:

[1] DiffBind_2.2.9             SummarizedExperiment_1.4.0

[3] Biobase_2.34.0             GenomicRanges_1.26.2      

[5] GenomeInfoDb_1.10.2        IRanges_2.8.0             

[7] S4Vectors_0.14.4           BiocGenerics_0.22.0       


loaded via a namespace (and not attached):

[1] Rcpp_0.12.11             locfit_1.5-9.1           lattice_0.20-34         

[4] GO.db_3.4.0              Rsamtools_1.26.1         Biostrings_2.42.1       

[7] gtools_3.5.0             assertthat_0.2.0         digest_0.6.12           

[10] R6_2.2.2                 plyr_1.8.4               BatchJobs_1.6           

[13] backports_1.0.5          ShortRead_1.32.1         RSQLite_1.0.0           

[16] ggplot2_2.2.1            gplots_3.0.1             zlibbioc_1.20.0         

[19] rlang_0.1.1              GenomicFeatures_1.26.2   lazyeval_0.2.0          

[22] annotate_1.52.1          gdata_2.18.0             Matrix_1.2-7.1          

[25] checkmate_1.8.2          systemPipeR_1.8.1        GOstats_2.40.0          

[28] splines_3.3.2            BiocParallel_1.8.1       stringr_1.2.0           

[31] pheatmap_1.0.8           RCurl_1.95-4.8           biomaRt_2.30.0          

[34] munsell_0.4.3            sendmailR_1.2-1          rtracklayer_1.34.1      

[37] pkgconfig_2.0.1          base64enc_0.1-3          BBmisc_1.11             

[40] fail_1.3                 tibble_1.3.3             edgeR_3.16.5            

[43] XML_3.98-1.5             AnnotationForge_1.16.0   dplyr_0.7.1             

[46] GenomicAlignments_1.10.0 bitops_1.0-6             grid_3.3.2              

[49] RBGL_1.50.0              xtable_1.8-2             GSEABase_1.36.0         

[52] gtable_0.2.0             DBI_0.5-1                magrittr_1.5            

[55] scales_0.4.1             graph_1.52.0             KernSmooth_2.23-15      

[58] amap_0.8-14              stringi_1.1.5            XVector_0.14.0          

[61] hwriter_1.3.2            genefilter_1.56.0        bindrcpp_0.2            

[64] limma_3.30.8             latticeExtra_0.6-28      brew_1.0-6              

[67] rjson_0.2.15             RColorBrewer_1.1-2       tools_3.3.2             

[70] glue_1.1.1               Category_2.40.0          survival_2.39-5         

[73] AnnotationDbi_1.36.0     colorspace_1.3-2         caTools_1.17.1          

[76] bindr_0.1

diffbind • 1.8k views

ADD COMMENT • link updated 8.1 years ago by Rory Stark ★ 5.2k • written 8.1 years ago by meisan406 • 0

score 0 · Answer 1 · 2017-11-27

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 11 months ago

Cambridge, UK

If two or more peaks overlap by at least one base after being re-centered around the summit, they will be merged into a single wider peak, consistent with how DiffBind generally handles overlapping peaks. That is why a subset of your peaks are wider than expected.

This issue has come up before, and I have always viewed it as a "feature". However upon further reflection I'm thinking it is actually a "bug", and intend to fix it in the next release. The new behavior will be to to check for merged peaks after re-centering (wider than expected), and re-center again over the highest summit, so the peak intervals are all uniform size.

-Rory

ADD COMMENT • link 8.1 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory,

Thank you for the clarification. I think the fix you are planning would fit in better with the goal of the "summit" function. I look forward to the new release.

Thanks,

Mei San

ADD REPLY • link 8.1 years ago meisan406 • 0

0

Entering edit mode

Sometime, when we have some spare time (haha!) we should consider using PeakSplitter or similar to split distinct but nearby peaks, rather than discarding all but one. It (probably) wouldn't be that hard to write a wrapper R package around PeakSplitter's C++ core... I'll see what Mali (the author) thinks of the idea.

Cheers, - Gord

ADD REPLY • link 8.1 years ago Gord Brown ▴ 670