Question

Issue in generating a new DBA object with custom counts


francesco.gandolfi • 0

@6266f0e5

Last seen 19 months ago

Italy

Hi,

I'm working with the DiffBind R package to perform differential analysis on H3K27ac peak data. In my case, I need to generate a new DBA object containing custom read counts across a number of patient samples. Following the DiffBind instructions, I tried to create the DBA object with the 'dba' function, using a sample sheet that contains all the required metadata (SampleID, Condition, Treatment, ...), where the 'Counts' column gives the path to a tab-delimited file with the columns 'chrom', 'start', 'end', 'count' for each sample. However, when I try to create the DBA object with

```
newdba <- dba(sampleSheet=datasheet.temp)
```

the following error message appears:

"Error in peaks[, 1:3] : incorrect number of dimensions"

It seems to be caused by something in the data.frame format, but from this message alone it's really hard for me to understand what's wrong.

This is my 'datasheet.temp' object:

```
ls.str(datasheet.temp)
bamControl :  chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/alignments/Input.bam"| __truncated__ ...
bamReads :  chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/alignments/hlu1171.bam"| __truncated__ ...
Condition :  chr [1:79] "AML" "LR" "AML" "AML" "AML" "AML" "LR" "LR" "LR" "LR" "LR" ...
ControlID :  chr [1:79] "CD34_control_PE" "CD34_control_PE" "CD34_control_PE" ...
Counts :  chr [1:79] "/analysis_of_h3k27ac_peaks/diffbind/hlu1171.counts.tsv" ...
Factor :  chr [1:79] "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" "h3k27ac" ...
PeakCaller :  chr [1:79] "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" "bed" ...
Peaks :  chr [1:79] "/datasets/MDS_five_patients_ReseqTo20Mn_22_10_07/peak_calling/hlu1171.cleanedPeaks.bed"| __truncated__ ...
Replicate :  chr [1:79] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" ...
SampleID :  chr [1:79] "hlu1171" "hlu1455" "hlu1520" "hlu642" "hlu685" "hlu719" ...
Tissue :  chr [1:79] "Human Bone Marrow CD34+" "Human Bone Marrow CD34+" ...
Treatment :  chr [1:79] "five" "five" "five" "five" "five" "five" "eight" "eight" ...
```

A single tab-delimited count file (as referenced in the 'Counts' column) looks like this:

```
chr1    825646  828042  71
chr1    843236  844972  18
chr1    869535  870364  16
chr1    902147  933503  351
chr1    935089  946583  140
chr1    950841  969118  293
chr1    970366  980728  119
```

```
sessionInfo( )
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] sva_3.42.0                  BiocParallel_1.28.3        
 [3] genefilter_1.76.0           mgcv_1.8-41                
 [5] nlme_3.1-160                writexl_1.4.1              
 [7] ggplot2_3.3.6               RColorBrewer_1.1-3         
 [9] gplots_3.1.3                DiffBind_3.4.11            
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[13] MatrixGenerics_1.6.0        matrixStats_0.62.0         
[15] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[17] IRanges_2.28.0              S4Vectors_0.32.4           
[19] BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             bit64_4.0.5              httr_1.4.4              
 [4] numDeriv_2016.8-1.1      tools_4.1.3              utf8_1.2.2              
 [7] R6_2.5.1                 irlba_2.3.5.1            KernSmooth_2.23-20      
[10] DBI_1.1.3                colorspace_2.0-3         apeglm_1.16.0           
[13] withr_2.5.0              tidyselect_1.2.0         bit_4.0.4               
[16] compiler_4.1.3           cli_3.4.1                DelayedArray_0.20.0     
[19] rtracklayer_1.54.0       caTools_1.18.2           scales_1.2.1            
[22] SQUAREM_2021.1           mvtnorm_1.1-3            mixsqp_0.3-43           
[25] stringr_1.4.1            digest_0.6.30            Rsamtools_2.10.0        
[28] XVector_0.34.0           jpeg_0.1-9               pkgconfig_2.0.3         
[31] htmltools_0.5.3          fastmap_1.1.0            invgamma_1.1            
[34] bbmle_1.0.25             limma_3.50.3             BSgenome_1.62.0         
[37] htmlwidgets_1.5.4        rlang_1.0.6              RSQLite_2.2.18          
[40] BiocIO_1.4.0             generics_0.1.3           hwriter_1.3.2.1         
[43] gtools_3.9.3             dplyr_1.0.10             RCurl_1.98-1.9          
[46] magrittr_2.0.3           GenomeInfoDbData_1.2.7   interp_1.1-3            
[49] Matrix_1.5-1             Rcpp_1.0.9               munsell_0.5.0           
[52] fansi_1.0.3              lifecycle_1.0.3          edgeR_3.36.0            
[55] stringi_1.7.8            yaml_2.3.6               MASS_7.3-58.1           
[58] zlibbioc_1.40.0          plyr_1.8.7               blob_1.2.3              
[61] grid_4.1.3               parallel_4.1.3           ggrepel_0.9.1           
[64] bdsmatrix_1.3-6          crayon_1.5.2             deldir_1.0-6            
[67] lattice_0.20-45          splines_4.1.3            Biostrings_2.62.0       
[70] annotate_1.72.0          KEGGREST_1.34.0          locfit_1.5-9.6          
[73] pillar_1.8.1             rjson_0.2.21             systemPipeR_2.0.0       
[76] XML_3.99-0.12            glue_1.6.2               ShortRead_1.52.0        
[79] GreyListChIP_1.26.0      latticeExtra_0.6-30      png_0.1-7               
[82] vctrs_0.5.0              gtable_0.3.1             amap_0.8-19             
[85] assertthat_0.2.1         cachem_1.0.6             ashr_2.2-54             
[88] emdbook_1.3.12           xtable_1.8-4             restfulr_0.0.15         
[91] coda_0.19-4              survival_3.4-0           truncnorm_1.0-8         
[94] tibble_3.1.8             memoise_2.0.1            AnnotationDbi_1.56.2    
[97] GenomicAlignments_1.30.0
```

Thanks a lot in advance if somebody can help me.

Cheers,

Francesco

DiffBind dbaobject • 1.4k views

ADD COMMENT • link 23 months ago • updated 22 months ago francesco.gandolfi • 0

score 0 · Answer 1 · 2023-01-06


Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 5 weeks ago

Cambridge, UK

This should be fixed in the current version (DiffBind_3.8.3).

ADD COMMENT • link 23 months ago Rory Stark ★ 5.2k

score 0 · Answer 2 · 2023-01-09


francesco.gandolfi • 0


Ok, thanks a lot Rory. Francesco

ADD COMMENT • link 23 months ago francesco.gandolfi • 0

score 0 · Answer 3 · 2023-01-11


francesco.gandolfi • 0


Hi Rory,

After trying the newer DiffBind versions (3.8.0 and 3.8.3), the error still appears. I guess something is wrong in the DBA object structure imported from the sample sheet. Indeed, when I run the same dba command with a sample sheet that lacks the (last) 'Counts' column, the function works. I'm now checking the format of the counts.tsv files: if I understood correctly, these are tab-delimited files with chromosome, start, end and count, right? That is the format I followed. Regions are also sorted by chromosome and position. Is there anything else to consider when creating these count files? Thanks a lot. Francesco

ADD COMMENT • link 23 months ago francesco.gandolfi • 0
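A quick way to rule out formatting problems in the count files is to check them before calling dba. The following is only a minimal base-R sketch, assuming the four-column layout described above (chromosome, start, end, count, tab-separated, no header line). `check_counts_file` is a hypothetical helper written for illustration, not a DiffBind function, and it does not reproduce whatever checks DiffBind itself performs.

```r
# Hypothetical helper (not part of DiffBind): sanity-check one counts file.
# Assumes the layout from the thread: chrom, start, end, count; tab-separated;
# no header line. Returns a character vector describing any problems found.
check_counts_file <- function(path) {
  x <- read.table(path, sep = "\t", header = FALSE, quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
  if (ncol(x) != 4) {
    return(sprintf("expected 4 tab-separated columns, found %d", ncol(x)))
  }
  names(x) <- c("chrom", "start", "end", "count")
  problems <- character(0)
  if (anyNA(x)) {
    problems <- c(problems, "missing values present")
  }
  if (!is.numeric(x$start) || !is.numeric(x$end)) {
    problems <- c(problems, "start/end are not numeric (header line present?)")
  } else if (any(x$end < x$start, na.rm = TRUE)) {
    problems <- c(problems, "some intervals have end < start")
  }
  if (!is.numeric(x$count)) {
    problems <- c(problems, "count column is not numeric")
  } else if (any(x$count < 0, na.rm = TRUE)) {
    problems <- c(problems, "negative counts present")
  }
  problems
}

# Demo on a temporary file holding the first three rows shown in the question.
demo_path <- tempfile(fileext = ".tsv")
writeLines(c("chr1\t825646\t828042\t71",
             "chr1\t843236\t844972\t18",
             "chr1\t869535\t870364\t16"), demo_path)
check_counts_file(demo_path)  # an empty character vector means no problems found
```

To check every file in the sample sheet, something like `lapply(datasheet.temp$Counts, check_counts_file)` should work, assuming the paths in the 'Counts' column are readable from the R session.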


I've tracked this down to a quirk/bug. The documentation says that if Counts is specified, then Peaks is ignored (the peakset defined in the Counts files is used instead). Right now dba is getting confused when both Peaks and Counts are specified. I'll check in a fix, but in the meantime you can just set datasheet.temp$Peaks <- NULL and it should work.

ADD REPLY • link 22 months ago Rory Stark ★ 5.2k
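The workaround above can be sketched on a toy sample sheet. This is an illustration only: the two-row data frame and its file names are placeholders standing in for the real 79-sample sheet, and the dba call is left commented out because it needs DiffBind and real count files.

```r
# Toy stand-in for the sample sheet from the question.
# The file names below are placeholders, not the real paths.
datasheet.temp <- data.frame(
  SampleID  = c("hlu1171", "hlu1455"),
  Condition = c("AML", "LR"),
  Peaks     = c("hlu1171.bed", "hlu1455.bed"),
  Counts    = c("hlu1171.counts.tsv", "hlu1455.counts.tsv"),
  stringsAsFactors = FALSE
)

# Workaround from the reply above: when a Counts column is present,
# drop the Peaks column so that only the counts files are used.
if ("Counts" %in% names(datasheet.temp)) {
  datasheet.temp$Peaks <- NULL
}
names(datasheet.temp)

# With DiffBind installed and the real files in place, the original call
# can then be retried:
# library(DiffBind)
# newdba <- dba(sampleSheet = datasheet.temp)
```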


Thanks so much Rory, that's completely clear. I followed your suggestion and the error no longer appears. Unfortunately, the 'dba' function now stops with a different error message:

Error in if (nrow(pv$merged) != nrow(pv$binding)) { : argument is of length zero

Strangely, it occurs after reading the second sample listed in datasheet.temp. I searched the forum for similar cases but didn't find anything.

Any idea about the cause of this error message?

Thanks a lot!

Francesco

ADD REPLY • link 22 months ago francesco.gandolfi • 0


That issue was fixed in DiffBind_3.8.2, so it should work in the current version.

ADD REPLY • link 22 months ago Rory Stark ★ 5.2k
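Whether an installation predates that fix can be checked by comparing version strings. A small sketch using base R's `compareVersion`; the "installed" value here is simply the DiffBind version listed in the sessionInfo output of the question, hard-coded for illustration.

```r
# Version in which, per the reply above, the second error was fixed.
fixed_in <- "3.8.2"

# Version from the sessionInfo() in the question (hard-coded for illustration).
installed <- "3.4.11"

# compareVersion() returns -1, 0 or 1; a negative value means 'installed'
# is older than 'fixed_in', i.e. an update is needed.
needs_update <- utils::compareVersion(installed, fixed_in) < 0
needs_update

# On a machine with DiffBind installed, the real version can be queried instead:
if (requireNamespace("DiffBind", quietly = TRUE)) {
  print(utils::packageVersion("DiffBind"))
}
```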


Yes, I updated DiffBind to version 3.8.4 and it works. Thank you so much Rory! Francesco

ADD REPLY • link 22 months ago francesco.gandolfi • 0