Hi DiffBind Team,
I've been looking to use your package to analyse some ChIP seq data and have been following the vignette for guidance.
My data has no replicates and I'm fully aware that this is not a good study design, but it's the data we have to work with. I have been working with two peak callers (MACS and SPP) and was hoping to treat these two sets of peaks as "replicates" in DiffBind. I'm convinced I've read a suggestion for this in the past but can't find the statement anywhere in the DiffBind documentation.
I've set up the sample sheet as follows, using the same Sample ID/Bam file but different peak callers:
SampleID,Tissue,Factor,Treatment,Replicate,bamReads,ControlID,bamControl,Peaks,PeakCaller
WT_HU,WT,Rif1,HU,1,../2016_020B_3_WT_HU_sorted_alignment.bam,WT_HU_1,../2016_020B_1_WT_HU_sorted_alignment.bam,../MACS2/WT_HU_peaks.narrowPeak,macs
MT_HU,MT,Rif1,HU,1,../2016_020B_4_MT_HU_sorted_alignment.bam,MT_HU_2,../2016_020B_2_MT_HU_sorted_alignment.bam,../MACS2/MT_HU_peaks.narrowPeak,macs
WT_G1,WT,Rif1,G1,1,../2016_024B_3_WT_G1_sorted_alignment.bam,WT_G1_1,../2016_024B_1_WT_G1_sorted_alignment.bam,../MACS2/WT_G1_peaks.narrowPeak,macs
MT_G1,MT,Rif1,G1,1,../2016_024B_4_MT_G1_sorted_alignment.bam,MT_G1_2,../2016_024B_2_MT_G1_sorted_alignment.bam,../MACS2/MT_G1_peaks.narrowPeak,macs
WT_S60,WT,Rif1,S60,1,../2017_006B_S1_WT_S60_sorted_alignment.bam,WT_S60_1,../2017_006B_1_WT_S60_sorted_alignment.bam,../MACS2/WT_S60_peaks.narrowPeak,macs
MT_S60,MT,Rif1,S60,1,../2017_006B_S5_MT_S60_sorted_alignment.bam,MT_S60_5,../2017_006B_5_MT_S60_sorted_alignment.bam,../MACS2/MT_S60_peaks.narrowPeak,macs
WT_S90,WT,Rif1,S90,1,../2017_006B_S4_WT_S90_sorted_alignment.bam,WT_S90_4,../2017_006B_4_WT_S90_sorted_alignment.bam,../MACS2/WT_S90_peaks.narrowPeak,macs
MT_S90,MT,Rif1,S90,1,../2017_006B_S8_MT_S90_sorted_alignment.bam,MT_S90_8,../2017_006B_8_MT_S90_sorted_alignment.bam,../MACS2/MT_S90_peaks.narrowPeak,macs
WT_HU,WT,Rif1,HU,2,../2016_020B_3_WT_HU_sorted_alignment.bam,WT_HU_1,../2016_020B_1_WT_HU_sorted_alignment.bam,../SPP_Peaks/2016_020B_1_WT_HU_sorted_alignment_VS_2016_020B_3_WT_HU_sorted_alignment.narrowPeak,narrow
MT_HU,MT,Rif1,HU,2,../2016_020B_4_MT_HU_sorted_alignment.bam,MT_HU_2,../2016_020B_2_MT_HU_sorted_alignment.bam,../SPP_Peaks/2016_020B_2_MT_HU_sorted_alignment_VS_2016_020B_4_MT_HU_sorted_alignment.narrowPeak,narrow
WT_G1,WT,Rif1,G1,2,../2016_024B_3_WT_G1_sorted_alignment.bam,WT_G1_1,../2016_024B_1_WT_G1_sorted_alignment.bam,../SPP_Peaks/2016_024B_1_WT_G1_sorted_alignment_VS_2016_024B_3_WT_G1_sorted_alignment.narrowPeak,narrow
MT_G1,MT,Rif1,G1,2,../2016_024B_4_MT_G1_sorted_alignment.bam,MT_G1_2,../2016_024B_2_MT_G1_sorted_alignment.bam,../SPP_Peaks/2016_024B_2_MT_G1_sorted_alignment_VS_2016_024B_4_MT_G1_sorted_alignment.narrowPeak,narrow
WT_S60,WT,Rif1,S60,2,../2017_006B_S1_WT_S60_sorted_alignment.bam,WT_S60_1,../2017_006B_1_WT_S60_sorted_alignment.bam,../SPP_Peaks/2017_006B_1_WT_S60_sorted_alignment_VS_2017_006B_S1_WT_S60_sorted_alignment.narrowPeak,narrow
MT_S60,MT,Rif1,S60,2,../2017_006B_S5_MT_S60_sorted_alignment.bam,MT_S60_5,../2017_006B_5_MT_S60_sorted_alignment.bam,../SPP_Peaks/2017_006B_5_MT_S60_sorted_alignment_VS_2017_006B_S5_MT_S60_sorted_alignment.narrowPeak,narrow
WT_S90,WT,Rif1,S90,2,../2017_006B_S4_WT_S90_sorted_alignment.bam,WT_S90_4,../2017_006B_4_WT_S90_sorted_alignment.bam,../SPP_Peaks/2017_006B_4_WT_S90_sorted_alignment_VS_2017_006B_S4_WT_S90_sorted_alignment.narrowPeak,narrow
MT_S90,MT,Rif1,S90,2,../2017_006B_S8_MT_S90_sorted_alignment.bam,MT_S90_8,../2017_006B_8_MT_S90_sorted_alignment.bam,../SPP_Peaks/2017_006B_8_MT_S90_sorted_alignment_VS_2017_006B_S8_MT_S90_sorted_alignment.narrowPeak,narrow
Then ran:
data<- dba(sampleSheet = samples)
Giving:
data 16 Samples, 1079 sites in matrix (1896 total): ID Tissue Factor Treatment Replicate Caller Intervals 1 WT_HU WT Rif1 HU 1 macs 1055 2 MT_HU MT Rif1 HU 1 macs 1461 3 WT_G1 WT Rif1 G1 1 macs 631 4 MT_G1 MT Rif1 G1 1 macs 772 5 WT_S60 WT Rif1 S60 1 macs 289 6 MT_S60 MT Rif1 S60 1 macs 371 7 WT_S90 WT Rif1 S90 1 macs 443 8 MT_S90 MT Rif1 S90 1 macs 517 9 WT_HU WT Rif1 HU 2 narrow 215 10 MT_HU MT Rif1 HU 2 narrow 515 11 WT_G1 WT Rif1 G1 2 narrow 106 12 MT_G1 MT Rif1 G1 2 narrow 171 13 WT_S60 WT Rif1 S60 2 narrow 116 14 MT_S60 MT Rif1 S60 2 narrow 48 15 WT_S90 WT Rif1 S90 2 narrow 115 16 MT_S90 MT Rif1 S90 2 narrow 57
After running the dba.count function, I'm losing all of the information for Replicate 2?
data<- dba.count(data, summits=T) data 8 Samples, 1079 sites in matrix: ID Tissue Factor Treatment Replicate Caller Intervals FRiP 1 WT_HU WT Rif1 HU 1 counts 1079 0.11 2 MT_HU MT Rif1 HU 1 counts 1079 0.11 3 WT_G1 WT Rif1 G1 1 counts 1079 0.11 4 MT_G1 MT Rif1 G1 1 counts 1079 0.11 5 WT_S60 WT Rif1 S60 1 counts 1079 0.11 6 MT_S60 MT Rif1 S60 1 counts 1079 0.11 7 WT_S90 WT Rif1 S90 1 counts 1079 0.11 8 MT_S90 MT Rif1 S90 1 counts 1079 0.12
This isn't the case in the vignette? After dba.count using the tamoxifen data the replicate information is kept? I've tried doing this giving each of my samples different IDs (suffixed _m or _s) and I'm still seeing the same, with only replicate one being kept.
Why is this the case and should I be approaching this differently? Happy to send over any files if needed.
Thanks!
Sophie
FYI:
sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X El Capitan 10.11.6 locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods [9] base other attached packages: [1] DiffBind_2.2.12 SummarizedExperiment_1.4.0 Biobase_2.34.0 [4] GenomicRanges_1.26.4 GenomeInfoDb_1.10.3 IRanges_2.8.2 [7] S4Vectors_0.12.2 BiocGenerics_0.20.0 loaded via a namespace (and not attached): [1] Category_2.40.0 bitops_1.0-6 bit64_0.9-7 [4] RColorBrewer_1.1-2 tools_3.3.2 backports_1.1.0 [7] R6_2.2.2 rpart_4.1-11 KernSmooth_2.23-15 [10] Hmisc_4.0-3 DBI_0.7 lazyeval_0.2.0 [13] colorspace_1.3-2 nnet_7.3-12 gridExtra_2.2.1 [16] DESeq2_1.14.1 bit_1.1-12 sendmailR_1.2-1 [19] graph_1.52.0 htmlTable_1.9 rtracklayer_1.34.2 [22] caTools_1.17.1 scales_0.4.1 checkmate_1.8.2 [25] BatchJobs_1.6 genefilter_1.56.0 RBGL_1.50.0 [28] stringr_1.2.0 digest_0.6.12 Rsamtools_1.26.2 [31] foreign_0.8-69 AnnotationForge_1.16.1 XVector_0.14.1 [34] htmltools_0.3.6 base64enc_0.1-3 pkgconfig_2.0.1 [37] limma_3.30.13 htmlwidgets_0.8 rlang_0.1.1 [40] RSQLite_2.0 BBmisc_1.11 bindr_0.1 [43] GOstats_2.40.0 hwriter_1.3.2 BiocParallel_1.8.2 [46] gtools_3.5.0 acepack_1.4.1 dplyr_0.7.1 [49] RCurl_1.95-4.8 magrittr_1.5 GO.db_3.4.0 [52] Formula_1.2-1 Matrix_1.2-10 Rcpp_0.12.11 [55] munsell_0.4.3 stringi_1.1.5 edgeR_3.16.5 [58] zlibbioc_1.20.0 gplots_3.0.1 fail_1.3 [61] plyr_1.8.4 grid_3.3.2 blob_1.1.0 [64] gdata_2.18.0 lattice_0.20-35 Biostrings_2.42.1 [67] splines_3.3.2 GenomicFeatures_1.26.4 annotate_1.52.1 [70] locfit_1.5-9.1 knitr_1.16 rjson_0.2.15 [73] systemPipeR_1.8.1 geneplotter_1.52.0 biomaRt_2.30.0 [76] XML_3.98-1.9 glue_1.1.1 ShortRead_1.32.1 [79] latticeExtra_0.6-28 data.table_1.10.4 gtable_0.2.0 [82] amap_0.8-14 assertthat_0.2.0 ggplot2_2.2.1 [85] xtable_1.8-2 survival_2.41-3 tibble_1.3.3 [88] pheatmap_1.0.8 GenomicAlignments_1.10.1 AnnotationDbi_1.36.2 [91] memoise_1.1.0 bindrcpp_0.2 cluster_2.0.6 [94] brew_1.0-6 GSEABase_1.36.0
Thanks very much Rory for such a quick response - makes perfect sense!