Question

Low Differentially Bound Sites found with DiffBind

0

Entering edit mode

gannoncottone • 0

@73c917c6

Last seen 3.4 years ago

United States

Hello,

I am currently running an analysis using DiffBind on Cut&Run data to identify differentially expressed sites. On several occasions, after running DiffBind with default parameters as outlined in the vignette we are getting very low differential peak output from both edgeR and DESeq2 implementations. What could account for not seeing any significantly differentially bound sites?

Below is the output of my command session running DiffBind:

> library(DiffBind)
LTxding required package: GenomicRanges
LTxding required package: stats4
LTxding required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated,
    eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce,
    rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max, which.min

LTxding required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    expand.grid, I, unname

LTxding required package: IRanges
LTxding required package: GenomeInfoDb
LTxding required package: SummarizedExperiment
LTxding required package: MatrixGenerics
LTxding required package: matrixStats

Attaching package: ‘MatrixGenerics’

The following objects are masked from ‘package:matrixStats’:

    colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs, colCummins,
    colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs, colMads,
    colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds, colQuantiles, colRanges,
    colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
    colWeightedMeans, colWeightedMedians, colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs,
    rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
    rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs,
    rowMeans2, rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
    rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads,
    rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars

LTxding required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
    see 'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: ‘Biobase’

The following object is masked from ‘package:MatrixGenerics’:

    rowMedians

The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians

 >>> DiffBind 3.4.11

> library(tidyverse)
── Attaching packages ────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::collapse()   masks IRanges::collapse()
✖ dplyr::combine()    masks Biobase::combine(), BiocGenerics::combine()
✖ dplyr::count()      masks matrixStats::count()
✖ dplyr::desc()       masks IRanges::desc()
✖ tidyr::expand()     masks S4Vectors::expand()
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::first()      masks S4Vectors::first()
✖ dplyr::lag()        masks stats::lag()
✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
✖ purrr::reduce()     masks GenomicRanges::reduce(), IRanges::reduce()
✖ dplyr::rename()     masks S4Vectors::rename()
✖ dplyr::slice()      masks IRanges::slice()
> library(rtracklayer)

> #Check column names______________
> names(h3k27me3broad_unique)
 [1] "SampleID"   "Tissue"     "Factor"     "Condition"  "Treatment"  "Replicate"  "bamReads"   "ControlID" 
 [9] "bamControl" "Peaks"      "PeakCaller"

> #Read in peaksets______________
> h3k27me3broad_unique <- dba(sampleSheet=h3k27me3broad_unique)
veh.rep1 cellLine H3K27me3 vehicle Control 1 narrow
veh.rep2 cellLine H3K27me3 vehicle Control 2 narrow
veh.rep3 cellLine H3K27me3 vehicle Control 3 narrow
Tx.rep1 cellLine H3K27me3 Tx treatment 1 narrow
Tx.rep2 cellLine H3K27me3 Tx treatment 2 narrow
Tx.rep3 cellLine H3K27me3 Tx treatment 3 narrow

> #Check that peaks were identified. # of matrix sites should be > 0 and Intervals column gernerated
> h3k27me3broad_unique
6 Samples, 32872 sites in matrix (98986 total):
           ID Tissue   Factor Condition Treatment Replicate Intervals
1 veh.rep1 cellLine H3K27me3   vehicle       Control         1     19001
2 veh.rep2 cellLine H3K27me3   vehicle       Control         2     47836
3 veh.rep3 cellLine H3K27me3   vehicle       Control         3     29014
4  Tx.rep1 cellLine H3K27me3        Tx     treatment         1     32643
5  Tx.rep2 cellLine H3K27me3        Tx     treatment         2     25650
6  Tx.rep3 cellLine H3K27me3        Tx     treatment         3      9167

> #Compute count information for each region (Affinity binding matrix):______________
> #bUseSummarizeOverlaps: to use a more standard counting procedure than the built-in one by default.
> h3k27me3broad_unique <- dba.count(h3k27me3broad_unique, bUseSummarizeOverlaps=TRUE)
Computing summits...
Re-centering peaks...


> #FRiP value:______________
> h3k27me3broad_unique
6 Samples, 30833 sites in matrix:
           ID Tissue   Factor Condition Treatment Replicate    Reads FRiP
1 veh.rep1 cellLine H3K27me3   vehicle       Control         1 11137455 0.05
2 veh.rep2 cellLine H3K27me3   vehicle       Control         2 11515996 0.06
3 veh.rep3 cellLine H3K27me3   vehicle       Control         3 10348575 0.06
4  Tx.rep1 cellLine H3K27me3        Tx     treatment         1 11752746 0.04
5  Tx.rep2 cellLine H3K27me3        Tx     treatment         2 10714697 0.05
6  Tx.rep3 cellLine H3K27me3        Tx     treatment         3  6604986 0.05

> #Establish comparisons to make (Contrast):______________
> #categories changed based on which column of metadata will be "compared:
> h3k27me3broad_unique <- dba.contrast(h3k27me3broad_unique,categories=DBA_CONDITION)
Computing results names...

> #Below prevents BPPARAM error during dba.analyze______________
> library(BiocParallel)
> register(SerialParam())
'BiocParallel' did not register default BiocParallelParam:
  missing value where TRUE/FALSE needed

> #Differential Enrichment Analysis:______________
> h3k27me3broad_unique <- dba.analyze(h3k27me3broad_unique, method=DBA_ALL_METHODS)
Applying Blacklist/Greylists...
Genome detected: Hsapiens.UCSC.hg38
Applying blacklist...
Removed: 45 of 30833 intervals.
Counting control reads for greylist...
Building greylist: /projects/b1122/Mariana/RAW_data/Cut_and_Run/data/bams/2-NE_S10.bam
coverage: 10942705 bp (0.35%)
Master greylist: 596 ranges, 10942705 bases
Removed: 80 of 30788 intervals.
Removed 125 (of 30833) consensus peaks.
Normalize edgeR with defaults...
Normalize DESeq2 with defaults...
Analyzing...

warning: solve(): system is singular (rcond: 5.44457e-17); attempting approx solution

warning: solve(): system is singular (rcond: 6.74445e-17); attempting approx solution

warning: solve(): system is singular (rcond: 8.3574e-17); attempting approx solution

warning: solve(): system is singular (rcond: 1.22132e-16); attempting approx solution
> #Summary of results for each tool (edgeR often is more stringent --> fewer peaks)______________

> dba.show(h3k27me3broad_unique, bContrasts=T)
     Factor Group Samples  Group2 Samples2 DB.edgeR DB.DESeq2 
1 Condition    Tx       3 vehicle        3        0         1


sessionInfo( )

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default
BLAS:   /hpc/software/R/4.1.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/software/R/4.1.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rsubread_2.8.2              mixsqp_0.3-47               rtracklayer_1.54.0         
 [4] forcats_0.5.1               stringr_1.4.0               dplyr_1.0.9                
 [7] purrr_0.3.4                 readr_2.1.2                 tidyr_1.2.0                
[10] tibble_3.1.6                ggplot2_3.3.6               tidyverse_1.3.1            
[13] DiffBind_3.4.11             SummarizedExperiment_1.24.0 Biobase_2.54.0             
[16] MatrixGenerics_1.6.0        matrixStats_0.62.0          GenomicRanges_1.46.1       
[19] GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4           
[22] BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] amap_0.8-18              colorspace_2.0-2         rjson_0.2.21            
  [4] deldir_1.0-6             hwriter_1.3.2.1          ellipsis_0.3.2          
  [7] XVector_0.34.0           fs_1.5.2                 rstudioapi_0.13         
 [10] ggrepel_0.9.1            bit64_4.0.5              lubridate_1.8.0         
 [13] AnnotationDbi_1.56.2     fansi_1.0.2              mvtnorm_1.1-3           
 [16] apeglm_1.16.0            xml2_1.3.3               splines_4.1.1           
 [19] cachem_1.0.6             geneplotter_1.72.0       jsonlite_1.8.0          
 [22] Rsamtools_2.10.0         broom_1.0.0              annotate_1.72.0         
 [25] dbplyr_2.2.1             ashr_2.2-54              png_0.1-7               
 [28] GreyListChIP_1.26.0      compiler_4.1.1           httr_1.4.3              
 [31] backports_1.4.1          assertthat_0.2.1         Matrix_1.3-4            
 [34] fastmap_1.1.0            limma_3.50.3             cli_3.3.0               
 [37] htmltools_0.5.2          tools_4.1.1              coda_0.19-4             
 [40] gtable_0.3.0             glue_1.6.2               GenomeInfoDbData_1.2.7  
 [43] systemPipeR_2.0.8        ShortRead_1.52.0         Rcpp_1.0.8.3            
 [46] bbmle_1.0.25             cellranger_1.1.0         vctrs_0.4.1             
 [49] Biostrings_2.62.0        rvest_1.0.2              lifecycle_1.0.1         
 [52] irlba_2.3.5              restfulr_0.0.15          gtools_3.9.2.2          
 [55] XML_3.99-0.10            zlibbioc_1.40.0          MASS_7.3-54             
 [58] scales_1.1.1             BSgenome_1.62.0          hms_1.1.1               
 [61] parallel_4.1.1           RColorBrewer_1.1-2       yaml_2.3.5              
 [64] memoise_2.0.1            emdbook_1.3.12           bdsmatrix_1.3-6         
 [67] latticeExtra_0.6-30      stringi_1.7.6            RSQLite_2.2.14          
 [70] SQUAREM_2021.1           genefilter_1.76.0        BiocIO_1.4.0            
 [73] caTools_1.18.2           BiocParallel_1.28.3      truncnorm_1.0-8         
 [76] rlang_1.0.3              pkgconfig_2.0.3          bitops_1.0-7            
 [79] lattice_0.20-44          invgamma_1.1             GenomicAlignments_1.30.0
 [82] htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.2        
 [85] plyr_1.8.7               magrittr_2.0.3           DESeq2_1.34.0           
 [88] R6_2.5.1                 gplots_3.1.3             generics_0.1.2          
 [91] DelayedArray_0.20.0      DBI_1.1.3                withr_2.5.0             
 [94] haven_2.5.0              pillar_1.7.0             survival_3.2-11         
 [97] KEGGREST_1.34.0          RCurl_1.98-1.7           modelr_0.1.8            
[100] crayon_1.4.2             interp_1.1-2             KernSmooth_2.23-20      
[103] utf8_1.2.2               tzdb_0.3.0               jpeg_0.1-9              
[106] readxl_1.4.0             locfit_1.5-9.5           grid_4.1.1              
[109] blob_1.2.3               reprex_2.0.1             digest_0.6.29           
[112] xtable_1.8-4             numDeriv_2016.8-1.1      munsell_0.5.0

Cut&Run DiffBind • 1.4k views

ADD COMMENT • link 3.5 years ago gannoncottone • 0

score 1 · Answer 1 · 2022-07-06

1

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

The script looks fine for a basic analysis.

There are a number of possible reasons you may not be getting any significantly differentially bound sites, including:

There is a batch effect that needs to be modelled
There are not enough replicates to capture the variance between experiments, especially if the variance between replicates is greater than the biological variance between sample groups
There are one or more "failed" samples. (I notice that general your FRiP numbers are on the low side for good quality ChIPs of H3K27me3)
The data are over-normalized, dampening the biological signal (I don't think that is the case here)
There actually are aren't significant biological differences in H3K27me3 levels between the sample groups

I would suggest doing some more exploratory plots of the data:

MA plots would show systematic differences (or lack thereof) between the sample groups
Venn diagrams would show agreement between sample replicates
Boxplots would show variance between replicates in each sample group
PCA (and correlation heatmaps) would show batch effects

ADD COMMENT • link 3.5 years ago Rory Stark ★ 5.2k

0

Entering edit mode

I will look into plots of the data more thoroughly and see if any of the trends fit the reasons for low differential binding sites! Thank you for your insight.

ADD REPLY • link 3.5 years ago gannoncottone • 0

0

Entering edit mode

Dr. Stark, thank you for your insight several days ago! I went back to check the quality of our Cut&Run peaks (called with MACS2) using ChIPQC as well as ran a number of the exploratory plots you suggested in the original post. We are specifically interested in looking at the broad peak file generated from MACS2. Additionally, we went back and checked our library qualities as well as bam mapping QC, both of which appeared normal. Below are the results from ChIPQC & exploratory plots:

Results from H3K27 ChIPQC - showing low RiP%. Notably, in our other samples looking at H3K4, RiP% was significantly higher (in the 50%-60% range)
PCA Results based on condition & replicates
dba.show results after dba.count. Note: I set summits to fit the mean peak size, 442. Using default parameters for dba.count yielded only 1 differential site. Could this be because for H3K27 methylation sites are more broad?
MA plot comparing treatment and control. Based on this plot, how would one determine if the result due to technical variance or biological variance?

I appreciate any insight you may have and can also provide more information as needed! My session info is attached below. Also, I apologize for having to input data from the command line as an image. The code on the comment box would not format it properly so it would be readable.

> sessionInfo()

R version 4.1.1 (2021-08-10) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Red Hat Enterprise Linux Server 7.5 (Maipo)

Matrix products: default BLAS: /hpc/software/R/4.1.1/lib64/R/lib/libRblas.so LAPACK: /hpc/software/R/4.1.1/lib64/R/lib/libRlapack.so

locale: 1 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
5 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: 1 stats4 stats graphics grDevices utils datasets methods base

other attached packages: 1 BSgenome.Hsapiens.UCSC.hg38_1.4.4 BiocManager_1.30.18 GreyListChIP_1.26.0
4 BSgenome_1.62.0 Biostrings_2.62.0 XVector_0.34.0
[7] edgeR_3.36.0 limma_3.50.3 rtracklayer_1.54.0
[10] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[13] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[16] tibble_3.1.6 tidyverse_1.3.1 BiocParallel_1.28.3
[19] TxDb.Hsapiens.UCSC.hg38.knownGene_3.14.0 GenomicFeatures_1.46.5 AnnotationDbi_1.56.2
[22] ChIPQC_1.30.0 DiffBind_3.4.11 SummarizedExperiment_1.24.0
[25] Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.62.0
[28] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0
[31] S4Vectors_0.32.4 BiocGenerics_0.40.0 ggplot2_3.3.6

loaded via a namespace (and not attached): 1 readxl_1.4.0 backports_1.4.1
3 BiocFileCache_2.2.1 plyr_1.8.7
5 splines_4.1.1 amap_0.8-18
[7] digest_0.6.29 invgamma_1.1
[9] htmltools_0.5.2 SQUAREM_2021.1
[11] fansi_1.0.2 magrittr_2.0.3
[13] memoise_2.0.1 tzdb_0.3.0
[15] annotate_1.72.0 Nozzle.R1_1.1-1.1
[17] modelr_0.1.8 systemPipeR_2.0.8
[19] bdsmatrix_1.3-6 prettyunits_1.1.1
[21] jpeg_0.1-9 colorspace_2.0-2
[23] rvest_1.0.2 blob_1.2.3
[25] rappdirs_0.3.3 apeglm_1.16.0
[27] ggrepel_0.9.1 haven_2.5.0
[29] jsonlite_1.8.0 crayon_1.4.2
[31] RCurl_1.98-1.7 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[33] chipseq_1.44.0 genefilter_1.76.0
[35] survival_3.2-11 glue_1.6.2
[37] gtable_0.3.0 zlibbioc_1.40.0
[39] DelayedArray_0.20.0 scales_1.1.1
[41] mvtnorm_1.1-3 DBI_1.1.3
[43] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 Rcpp_1.0.8.3
[45] xtable_1.8-4 progress_1.2.2
[47] emdbook_1.3.12 bit_4.0.4
[49] truncnorm_1.0-8 htmlwidgets_1.5.4
[51] httr_1.4.3 gplots_3.1.3
[53] RColorBrewer_1.1-2 ellipsis_0.3.2
[55] pkgconfig_2.0.3 XML_3.99-0.10
[57] dbplyr_2.2.1 deldir_1.0-6
[59] locfit_1.5-9.5 utf8_1.2.2
[61] tidyselect_1.1.2 rlang_1.0.3
[63] reshape2_1.4.4 cellranger_1.1.0
[65] munsell_0.5.0 tools_4.1.1
[67] cachem_1.0.6 cli_3.3.0
[69] generics_0.1.3 RSQLite_2.2.14
[71] broom_1.0.0 fastmap_1.1.0
[73] yaml_2.3.5 fs_1.5.2
[75] bit64_4.0.5 caTools_1.18.2
[77] KEGGREST_1.34.0 TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
[79] xml2_1.3.3 biomaRt_2.50.3
[81] compiler_4.1.1 rstudioapi_0.13
[83] filelock_1.0.2 curl_4.3.2
[85] png_0.1-7 reprex_2.0.1
[87] geneplotter_1.72.0 stringi_1.7.6
[89] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2 lattice_0.20-44
[91] Matrix_1.3-4 vctrs_0.4.1
[93] pillar_1.7.0 lifecycle_1.0.1
[95] bitops_1.0-7 irlba_2.3.5
[97] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 TxDb.Celegans.UCSC.ce6.ensGene_3.2.2
[99] R6_2.5.1 BiocIO_1.4.0
[101] latticeExtra_0.6-30 hwriter_1.3.2.1
[103] ShortRead_1.52.0 KernSmooth_2.23-20
[105] MASS_7.3-54 gtools_3.9.2.2
[107] assertthat_0.2.1 DESeq2_1.34.0
[109] rjson_0.2.21 withr_2.5.0
[111] GenomicAlignments_1.30.0 Rsamtools_2.10.0
[113] GenomeInfoDbData_1.2.7 parallel_4.1.1
[115] hms_1.1.1 grid_4.1.1
[117] TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 coda_0.19-4
[119] ashr_2.2-54 mixsqp_0.3-47
[121] bbmle_1.0.25 lubridate_1.8.0
[123] numDeriv_2016.8-1.1 interp_1.1-2
[125] restfulr_0.0.15

ADD REPLY • link 3.5 years ago gannoncottone • 0