Hi,
I'm working with the DiffBind R package to perform differential analysis on H3K27ac peak data. In my case I have to generate a new DBA object containing custom read counts across a number of patient samples. Following the DiffBind instructions, I tried to create the DBA object with the 'dba' command, using a sample sheet with all the required metadata (SampleID, Condition, Treatment, ...), where the 'Counts' column gives the path to a tab-delimited file formatted as 'chrom', 'start', 'end', 'count' for each sample. However, when I try to create the DBA object:
```newdba <- dba(sampleSheet=datasheet.temp)```
The following error message appears:
"Error in peaks[, 1:3] : incorrect number of dimensions"
It seems to be caused by something in the data.frame format, but from this message alone it's really hard for me to understand what's wrong.
This is my 'datasheet.temp' object:
ls.str(datasheet.temp)
bamControl : chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/alignments/Input.bam"| __truncated__ ...
bamReads : chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/alignments/hlu1171.bam"| __truncated__ ...
Condition : chr [1:79] "AML" "LR" "AML" "AML" "AML" "AML" "LR" "LR" "LR" "LR" "LR" ...
ControlID : chr [1:79] "CD34_control_PE" "CD34_control_PE" "CD34_control_PE" ...
Counts : chr [1:79] "/analysis_of_h3k27ac_peaks/diffbind/hlu1171.counts.tsv" ...
Factor : chr [1:79] "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" ...
PeakCaller : chr [1:79] "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" ...
Peaks : chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/peak_calling/hlu1171.cleanedPeaks.bed"| __truncated__ ...
Replicate : chr [1:79] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" ...
SampleID : chr [1:79] "hlu1171" "hlu1455" "hlu1520" "hlu642" "hlu685" "hlu719" ...
Tissue : chr [1:79] "Human Bone Marrow CD34+" "Human Bone Marrow CD34+" ...
Treatment : chr [1:79] "five" "five" "five" "five" "five" "five" "eight" "eight" ...
And a single tab-delimited count file (as indicated in the 'Counts' column) looks like this:
chr1 825646 828042 71
chr1 843236 844972 18
chr1 869535 870364 16
chr1 902147 933503 351
chr1 935089 946583 140
chr1 950841 969118 293
chr1 970366 980728 119
sessionInfo( )
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] sva_3.42.0 BiocParallel_1.28.3
[3] genefilter_1.76.0 mgcv_1.8-41
[5] nlme_3.1-160 writexl_1.4.1
[7] ggplot2_3.3.6 RColorBrewer_1.1-3
[9] gplots_3.1.3 DiffBind_3.4.11
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0
[13] MatrixGenerics_1.6.0 matrixStats_0.62.0
[15] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[17] IRanges_2.28.0 S4Vectors_0.32.4
[19] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5 httr_1.4.4
[4] numDeriv_2016.8-1.1 tools_4.1.3 utf8_1.2.2
[7] R6_2.5.1 irlba_2.3.5.1 KernSmooth_2.23-20
[10] DBI_1.1.3 colorspace_2.0-3 apeglm_1.16.0
[13] withr_2.5.0 tidyselect_1.2.0 bit_4.0.4
[16] compiler_4.1.3 cli_3.4.1 DelayedArray_0.20.0
[19] rtracklayer_1.54.0 caTools_1.18.2 scales_1.2.1
[22] SQUAREM_2021.1 mvtnorm_1.1-3 mixsqp_0.3-43
[25] stringr_1.4.1 digest_0.6.30 Rsamtools_2.10.0
[28] XVector_0.34.0 jpeg_0.1-9 pkgconfig_2.0.3
[31] htmltools_0.5.3 fastmap_1.1.0 invgamma_1.1
[34] bbmle_1.0.25 limma_3.50.3 BSgenome_1.62.0
[37] htmlwidgets_1.5.4 rlang_1.0.6 RSQLite_2.2.18
[40] BiocIO_1.4.0 generics_0.1.3 hwriter_1.3.2.1
[43] gtools_3.9.3 dplyr_1.0.10 RCurl_1.98-1.9
[46] magrittr_2.0.3 GenomeInfoDbData_1.2.7 interp_1.1-3
[49] Matrix_1.5-1 Rcpp_1.0.9 munsell_0.5.0
[52] fansi_1.0.3 lifecycle_1.0.3 edgeR_3.36.0
[55] stringi_1.7.8 yaml_2.3.6 MASS_7.3-58.1
[58] zlibbioc_1.40.0 plyr_1.8.7 blob_1.2.3
[61] grid_4.1.3 parallel_4.1.3 ggrepel_0.9.1
[64] bdsmatrix_1.3-6 crayon_1.5.2 deldir_1.0-6
[67] lattice_0.20-45 splines_4.1.3 Biostrings_2.62.0
[70] annotate_1.72.0 KEGGREST_1.34.0 locfit_1.5-9.6
[73] pillar_1.8.1 rjson_0.2.21 systemPipeR_2.0.0
[76] XML_3.99-0.12 glue_1.6.2 ShortRead_1.52.0
[79] GreyListChIP_1.26.0 latticeExtra_0.6-30 png_0.1-7
[82] vctrs_0.5.0 gtable_0.3.1 amap_0.8-19
[85] assertthat_0.2.1 cachem_1.0.6 ashr_2.2-54
[88] emdbook_1.3.12 xtable_1.8-4 restfulr_0.0.15
[91] coda_0.19-4 survival_3.4-0 truncnorm_1.0-8
[94] tibble_3.1.8 memoise_2.0.1 AnnotationDbi_1.56.2
[97] GenomicAlignments_1.30.0
Thanks a lot in advance if somebody can help me.
Cheers,
Francesco
I've tracked this down to a quirk/bug. The documentation says that if `Counts` is specified, then `Peaks` is ignored (it uses the peakset defined in the `Counts` files). Right now it is getting confused by having both `Peaks` and `Counts` specified. I'll check in a fix, but in the meantime you can just set `datasheet.temp$Peaks <- NULL` and it should work.
Thanks so much Rory, completely clear. I followed your suggestion and the error no longer appears. Unfortunately, the 'dba' function then stopped with another error message:
Error in if (nrow(pv$merged) != nrow(pv$binding)) { : argument is of length zero
which, strangely, occurs after reading the second sample listed in datasheet.temp. I already searched the forum for similar cases, but unfortunately I didn't find anything.
Any idea about the cause of this error message?
Thanks a lot!
Francesco
That issue was addressed from version `DiffBind_3.8.2` onwards; it should work in the current version.
Yes, I replaced my DiffBind with version 3.8.4 and it works. Thank you so much Rory! Francesco
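For anyone landing here later, the two fixes from this thread can be sketched roughly as follows. This is only an illustration under assumptions: the toy `sheet` data frame and its file names are hypothetical stand-ins for `datasheet.temp`, `needs_upgrade` is a helper made up for this example, and the actual `dba()` call is left commented out because it needs real count files on disk.

```r
# Toy sample sheet standing in for datasheet.temp (file names are hypothetical).
sheet <- data.frame(
  SampleID  = c("s1", "s2"),
  Condition = c("AML", "LR"),
  Peaks     = c("s1.bed", "s2.bed"),
  Counts    = c("s1.counts.tsv", "s2.counts.tsv"),
  stringsAsFactors = FALSE
)

# Workaround from the thread: when Counts is supplied, drop the Peaks column
# so that dba() is not confused by having both.
if ("Counts" %in% names(sheet)) {
  sheet$Peaks <- NULL
}

# The thread reports the second error as fixed from DiffBind 3.8.2 onwards.
fixed_in <- package_version("3.8.2")

# Hypothetical helper: TRUE if a version string predates the reported fix.
needs_upgrade <- function(version) package_version(version) < fixed_in

if (requireNamespace("DiffBind", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("DiffBind"))
  if (needs_upgrade(installed)) {
    message("DiffBind ", installed, " predates 3.8.2; consider upgrading.")
  }
  # With real files in place, the original call would then be:
  # newdba <- DiffBind::dba(sampleSheet = sheet)
} else {
  message("DiffBind is not installed; skipping the version check.")
}
```

With the versions mentioned above, `needs_upgrade("3.4.11")` is `TRUE` (the version in the posted sessionInfo) and `needs_upgrade("3.8.4")` is `FALSE`.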