When I was using DiffBind, I got the error message below from dba.plotBox():
Error in wilcox.test.default(toplot[[i]], toplot[[j]], paired = FALSE) :
not enough 'y' observations
Calls: runDiffBind ... pv.plotBoxplot -> pvalMethod -> wilcox.test.default
Could someone tell me the possible reason of this? Thanks in advance.
The code is shown as below:
samples = read.csv(input)
# read the peaks into a DBA object.
# 1) plot
# by default, it will plots a correlation heatmap
#Correlation heatmap using occupacy (peak caller score) data
#For the peaks of SICER, I used fold change values here
tamoxifen = dba(sampleSheet=input)
#calculate a binding matrix with cores based on read counts for every sample.
# 2) plot
# correlation heatmap using read count data by default
tamoxifen = dba.count(tamoxifen, minOverlap=3, bRemoveDuplicates = FALSE)
# Establishing a contrast.
tamoxifen<-dba.contrast(tamoxifen, minMembers=2)
# dba.analyze is the main differential analysis function
# 3) plot
# by default, it will plots a correlation heatmap if diff peaks are found
# correlation heatmap using read count data of diff peaks.
cat("-dba.analyze", "\n")
tamoxifen<-dba.analyze(tamoxifen, bSubControl = FALSE)
cat("-dba.report", "\n")
tamoxifen.DB<-dba.report(tamoxifen, th = thrd)
# MA plot of comparison: Sites identified as significantly differentially bound shown in red.
# 4) plot:
cat("dba.plotMA", "\n")
dba.plotMA(tamoxifen)
cat("dba.fold", "\n")
tamoxifen.DB[tamoxifen.DB$Fold<0,]
length(tamoxifen.DB[tamoxifen.DB$Fold<0,])
length(tamoxifen.DB[tamoxifen.DB$Fold>0,])
# 4) plot PCA
cat("dba.plotPCA", "\n")
dba.plotPCA(tamoxifen, label = DBA_REPLICATE)
cat("dba.plotPCA", "\n")
# 5) plot PCA using count data for only differentailly bound sites.
dba.plotPCA(tamoxifen, contrast=1, th = thrd, label = DBA_REPLICATE)
#cat("dba.plotBox", "\n")
# 6) Boxplot of read distribution for differential peaks
#dba.plotBox(tamoxifen)
cat("dba.plotHeatmap", "\n")
# 7) Heatmap showing binding affinities for differentially peaks
dba.plotHeatmap(tamoxifen, contrast=1, correlations=FALSE)
## we need elementMetadata to
library("GenomicRanges")
cat("Write the table", "\n")
df <- data.frame(seqnames=seqnames(tamoxifen.DB), starts=start(tamoxifen.DB), ends=end(tamoxifen.DB),elementMetadata(tamoxifen.DB))
write.table(df, file=out, quote=F, sep="\t", row.names = F)
Here is the session information:
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] Biobase_2.32.0 edgeR_3.14.0
[3] bit64_0.9-7 splines_3.3.0
[5] gtools_3.5.0 assertthat_0.2.0
[7] stats4_3.3.0 latticeExtra_0.6-28
[9] amap_0.8-14 RBGL_1.48.1
[11] blob_1.1.0 Rsamtools_1.24.0
[13] Category_2.38.0 RSQLite_2.0
[15] backports_1.1.0 lattice_0.20-35
[17] glue_1.1.1 limma_3.28.21
[19] digest_0.6.12 GenomicRanges_1.24.3
[21] RColorBrewer_1.1-2 XVector_0.12.1
[23] checkmate_1.7.4 colorspace_1.3-2
[25] Matrix_1.2-6 plyr_1.8.4
[27] GSEABase_1.34.1 XML_3.98-1.9
[29] pkgconfig_2.0.1 pheatmap_1.0.8
[31] ShortRead_1.30.0 biomaRt_2.28.0
[33] genefilter_1.54.2 zlibbioc_1.18.0
[35] xtable_1.8-2 GO.db_3.3.0
[37] scales_0.5.0 brew_1.0-6
[39] gdata_2.18.0 BiocParallel_1.6.6
[41] tibble_1.3.4 annotate_1.50.1
[43] IRanges_2.6.1 ggplot2_2.2.1.9000
[45] SummarizedExperiment_1.2.3 GenomicFeatures_1.24.2
[47] BiocGenerics_0.18.0 lazyeval_0.2.0
[49] survival_2.41-3 magrittr_1.5
[51] memoise_1.1.0 systemPipeR_1.6.2
[53] fail_1.3 gplots_3.0.1
[55] hwriter_1.3.2 DiffBind_2.0.2
[57] GOstats_2.38.0 graph_1.50.0
[59] tools_3.3.0 BBmisc_1.9
[61] stringr_1.2.0 sendmailR_1.2-1
[63] S4Vectors_0.10.3 munsell_0.4.3
[65] locfit_1.5-9.1 bindrcpp_0.2
[67] AnnotationDbi_1.34.4 Biostrings_2.40.2
[69] GenomeInfoDb_1.8.7 caTools_1.17.1
[71] rlang_0.1.2 grid_3.3.0
[73] RCurl_1.95-4.8 rjson_0.2.15
[75] AnnotationForge_1.14.2 bitops_1.0-6
[77] base64enc_0.1-3 gtable_0.2.0
[79] DBI_0.7 R6_2.2.2
[81] GenomicAlignments_1.8.4 dplyr_0.7.2
[83] rtracklayer_1.32.0 bit_1.1-12
[85] bindr_0.1 KernSmooth_2.23-15
[87] stringi_1.1.5 parallel_3.3.0
[89] BatchJobs_1.6 Rcpp_0.12.12
Thanks for your reply. thrd is the threshold. I tried the latest version and I still got the message.
I sent you an email with the DBA object
tamoxifen.
<font face="monospace">Shaojun </font>
OK, I see what's going on. All of the differentially bound sites are + (fold change > 0; "gained"), and there are zero - ("loss") sites.
DiffBind
generates the plot just fine, but is failing to detect this condition properly and generates an error when computing the p-values of the differences.You can turn off the calculation of the p-values by setting
pvalMethod=NU
LL
, however there is another bug that will generate a different error. I'll fix both these issues soon. In the mean time, you are actually getting the plot, if not the p-values (which aren't very interesting). With all the differences occurring in one direction (seen indba.polotMA()
, or checking the signs of the fold changes in the report), the box plot isn't very illuminating.Cheers-
Rory
Thank you.