Question: Inconsistent peak width with DiffBind "summit" function
0
gravatar for meisan406
24 months ago by
meisan4060
United States
meisan4060 wrote:

This is a question on the "summit" function in DiffBind.

I would like to generate a set of consensus peaks with width of 201bp, centered around the peak summit. The codes I used were:

dba.object <- dba( sampleSheet = metadata, peakCaller="bed", scoreCol=5, bRemoveM=TRUE, bLowerScoreBetter=TRUE, config=data.frame(bUsePval=TRUE), bCorPlot=FALSE)

count.data <- dba.count( dba.object, minOverlap=1, score=DBA_SCORE_READS, bCorPlot=FALSE, summits=100,
bUseSummarizeOverlaps=F )

 

 

To retrieve the count matrix, along with the peak coordinates, I used:

count.matrix <- dba.peakset(count.data, bRetrieve=T, DataType=DBA_DATA_FRAME)

 

However, I realized that there were a very small set of peaks larger than 201bp in the final consensus peak set (approximately 5% of the consensus peak set). I am unable to figure out why this would be so and would appreciate any assistance on this. 

My saved dba.object can be retrieved from here: https://www.dropbox.com/s/eh3qpanm0lgdn6f/dba_object_debug.rds?dl=0

The DiffBind version I used was 2.2.9

Thank you,

Mei San

 

sessionInfo()

R version 3.3.2 (2016-10-31)

Platform: x86_64-centos-linux-gnu (64-bit)

Running under: CentOS Linux 7 (Core)


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 

[9] LC_ADDRESS=C               LC_TELEPHONE=C            

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base     


other attached packages:

[1] DiffBind_2.2.9             SummarizedExperiment_1.4.0

[3] Biobase_2.34.0             GenomicRanges_1.26.2      

[5] GenomeInfoDb_1.10.2        IRanges_2.8.0             

[7] S4Vectors_0.14.4           BiocGenerics_0.22.0       


loaded via a namespace (and not attached):

[1] Rcpp_0.12.11             locfit_1.5-9.1           lattice_0.20-34         

[4] GO.db_3.4.0              Rsamtools_1.26.1         Biostrings_2.42.1       

[7] gtools_3.5.0             assertthat_0.2.0         digest_0.6.12           

[10] R6_2.2.2                 plyr_1.8.4               BatchJobs_1.6           

[13] backports_1.0.5          ShortRead_1.32.1         RSQLite_1.0.0           

[16] ggplot2_2.2.1            gplots_3.0.1             zlibbioc_1.20.0         

[19] rlang_0.1.1              GenomicFeatures_1.26.2   lazyeval_0.2.0          

[22] annotate_1.52.1          gdata_2.18.0             Matrix_1.2-7.1          

[25] checkmate_1.8.2          systemPipeR_1.8.1        GOstats_2.40.0          

[28] splines_3.3.2            BiocParallel_1.8.1       stringr_1.2.0           

[31] pheatmap_1.0.8           RCurl_1.95-4.8           biomaRt_2.30.0          

[34] munsell_0.4.3            sendmailR_1.2-1          rtracklayer_1.34.1      

[37] pkgconfig_2.0.1          base64enc_0.1-3          BBmisc_1.11             

[40] fail_1.3                 tibble_1.3.3             edgeR_3.16.5            

[43] XML_3.98-1.5             AnnotationForge_1.16.0   dplyr_0.7.1             

[46] GenomicAlignments_1.10.0 bitops_1.0-6             grid_3.3.2              

[49] RBGL_1.50.0              xtable_1.8-2             GSEABase_1.36.0         

[52] gtable_0.2.0             DBI_0.5-1                magrittr_1.5            

[55] scales_0.4.1             graph_1.52.0             KernSmooth_2.23-15      

[58] amap_0.8-14              stringi_1.1.5            XVector_0.14.0          

[61] hwriter_1.3.2            genefilter_1.56.0        bindrcpp_0.2            

[64] limma_3.30.8             latticeExtra_0.6-28      brew_1.0-6              

[67] rjson_0.2.15             RColorBrewer_1.1-2       tools_3.3.2             

[70] glue_1.1.1               Category_2.40.0          survival_2.39-5         

[73] AnnotationDbi_1.36.0     colorspace_1.3-2         caTools_1.17.1          

[76] bindr_0.1               
diffbind • 438 views
ADD COMMENTlink modified 24 months ago by Rory Stark2.9k • written 24 months ago by meisan4060
Answer: Inconsistent peak width with DiffBind "summit" function
0
gravatar for Rory Stark
24 months ago by
Rory Stark2.9k
CRUK, Cambridge, UK
Rory Stark2.9k wrote:

If two or more peaks overlap by at least one base after being re-centered around the summit, they will be merged into a single wider peak, consistent with how DiffBind generally handles overlapping peaks. That is why a subset of your peaks are wider than expected.

This issue has come up before, and I have always viewed it as a "feature". However upon further reflection I'm thinking it is actually a "bug", and intend to fix it in the next release. The new behavior will be to to check for merged peaks after re-centering (wider than expected), and re-center again over the highest summit, so the peak intervals are all uniform size.

-Rory

ADD COMMENTlink written 24 months ago by Rory Stark2.9k

Hi Rory,

Thank you for the clarification. I think the fix you are planning would fit in better with the goal of the "summit" function. I look forward to the new release.

Thanks,

Mei San

ADD REPLYlink written 24 months ago by meisan4060

Sometime, when we have some spare time (haha!) we should consider using PeakSplitter or similar to split distinct but nearby peaks, rather than discarding all but one.  It (probably) wouldn't be that hard to write a wrapper R package around PeakSplitter's C++ core... I'll see what Mali (the author) thinks of the idea.

 Cheers, - Gord

ADD REPLYlink written 24 months ago by Gord Brown590
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 190 users visited in the last hour