Search
Question: DiffBind: failed to run dba.count in SSH
0
gravatar for zhaolin20042008
6 months ago by
zhaolin200420080 wrote:

Hi, I have tried to run R package- DiffBind in SSH (Linux). I have R script works will with windows R 3.3.3 but failed at SSH sever linux R. It stops at dba.count step. Could anyone help me? Does it mean the bam file is required instead of bed file?  Thank you. 

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
library(DiffBind)
setwd("/net/wonderland/home/zhaolinz/practice/batch8_macs2")
#list.files()
#list.files("diff")
diff <- dba(sampleSheet="CLA050417.csv",minOverlap=2,config=data.frame(RunParallel=TRUE, reportInit="diff",AnalysisMethod=DBA_DESEQ2,minQCth=5,fragmentSize=300,bCorPlot=FALSE,th=0.1, bUsePval=FALSE))
diff_count <- dba.count(diff, minOverlap=2,score=DBA_SCORE_TMM_READS_FULL, fragmentSize=diff$config$fragmentSize, filter=0, bRemoveDuplicates=FALSE, filterFun=max,bCorPlot=diff$config$bCorPlot,
bUseSummarizeOverlaps=FALSE, readFormat=DBA_READS_DEFAULT,bParallel=diff$config$RunParallel)

.....

<font face="monospace">The error is here  </font>

 *** caught segfault ***

address (nil), cause 'unknown'

Traceback:
 1: .Call("croi_count_reads", bamfile, as.integer(insertLength),     as.integer(fileType), as.integer(bufferSize), as.integer(minMappingQual),     as.character(intervals[[1]]), as.integer(intervals[[2]]),     as.integer(intervals[[3]]), as.integer(icount), as.logical(bWithoutDupes),     as.logical(bSummits), counts, summits.vec, heights.vec)
 2: cpp_count_reads(bamfile, insertLength, fileType, bufferSize,     intervals, bWithoutDupes, summits, minMappingQuality)
 3: pv.getCounts(bamfile = countrec$bamfile, intervals = intervals,     insertLength = countrec$insert, bWithoutDupes = bWithoutDupes,     bLowMem = bLowMem, yieldSize = yieldSize, mode = mode, singleEnd = singleEnd,     scanbamparam = scanbamparam, fileType = fileType, summits = summits,     fragments = fragments, minMappingQuality = minMappingQuality)
 4: FUN(X[[i]], ...)
 5: lapply(X = S, FUN = FUN, ...)
 6: doTryCatch(return(expr), name, parentenv, handler)
 7: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 8: tryCatchList(expr, classes, parentenv, handlers)
 9: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call)[1L]        pr
efix <- paste("Error in", dcall, ": ")        LONG <- 75L        msg <- conditionMessage(e)        sm <- strsplit(msg, "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if is.na(w))  
           w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditio
nMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && identical(getOption("show.error.messages"),         TRUE)) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
10: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
11: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
12: FUN(X[[i]], ...)
13: lapply(seq_len(cores), inner.do)
14: mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE)
15: config$lapplyFun(config, params, arglist, fn, ...)
16: dba.parallel.lapply(pv$config, params, todorecs, pv.do_getCounts,     bed, bWithoutDupes = bWithoutDupes, bLowMem, yieldSize, mode,     singleEnd, scanbamparam, readFormat, summits, fragments,     minMappingQuality)
17: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,     bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T,     bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel,     bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,     filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat,     summits = summits, minMappingQuality = mapQCth)
18: dba.count(diff, minOverlap = 3)

ADD COMMENTlink modified 6 months ago by Gord Brown540 • written 6 months ago by zhaolin200420080

Hi,

Could you post the output of "sessionInfo()" so we can see what version you're running?  There were bugs in earlier versions of DiffBind when using bed files for counts.  Just to clarify your comment, are your reads in .bed (or .bed.gz) format instead of .bam?  We used to support .bed for reads as well as peaks, but we decided to remove it (though I can't off-hand recall whether or not I actually did take the code for it).  And if your reads are indeed .bed, post the first few lines of one of them.

(As a side note, if you tag your post with 'diffbind' we'll get email when you post, so you might get a faster response.)

ADD REPLYlink written 6 months ago by Gord Brown540
SampleID Tissue Factor Condition Treatment Replicate bamReads Peaks PeakCaller
4471_64502 CD4 CLA Positive 0h 1 64502_macs2_summits.bed  64502_macs2_peaks.narrowPeak  narrow
4471_64503 CD4 CLA Negative 0h 1 64503_macs2_summits.bed  64503_macs2_peaks.narrowPeak   narrow
4471_64504 CD8 CLA Positive 0h 1 64504_macs2_summits.bed  64504_macs2_peaks.narrowPeak  narrow
ADD REPLYlink written 6 months ago by zhaolin200420080

Your sample sheet looks reasonable.  (Normally we'd store all the bam or bed or peak files in subdirectories called bam or bed or peak, so the sample sheet would have to include the full (or relative) paths to the files, but you would get a different error if it couldn't find the files at all.)

Please post the output of the sessionInfo() command, and the first few lines of one of your .bed files, so I can see whether the old bug is the problem.

(It turns out that we do still support .bed as a format for reads.)

 - Gord

ADD REPLYlink written 6 months ago by Gord Brown540

It works when DBA was built but error stopped at dba.count(). I had ran the same script using R program in PC and it worked fine. However, short of memory space led me to follow the SSH linux system. 

300073_57756 CD4 CLA Positive 0h 9 narrow
300073_57757 CD4 CLA Negative 0h 9 narrow
300073_57758 CD8 CLA Positive 0h 9 narrow
300073_57759 CD8 CLA Negative 0h 9 narrow

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: .Call("croi_count_reads", bamfile, as.integer(insertLength),     as.integer(fileType), as.integer(bufferSize), as.integer(minMappingQual),     as.character(intervals[[1]]), as.integer(intervals[[2]]),     as.integer(intervals[[3]]), as.integer(icount), as.logical(bWithoutDupes),     as.logical(bSummits), counts, summits.vec, heights.vec)

ADD REPLYlink written 6 months ago by zhaolin200420080

There is a command in R called "sessionInfo()" that shows what versions of R and the currently-loaded packages you are running.  For example, on a slightly out-of-date version R, if I execute the command "sessionInfo()" at the R prompt, I get these results:

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=C             
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DiffBind_1.16.3            RSQLite_1.0.0             
 [3] DBI_0.3.1                  locfit_1.5-9.1            
 [5] GenomicAlignments_1.6.3    Rsamtools_1.22.0          
 [7] Biostrings_2.38.4          XVector_0.10.0            
 [9] limma_3.26.8               SummarizedExperiment_1.0.2
[11] Biobase_2.30.0             GenomicRanges_1.22.4      
[13] GenomeInfoDb_1.6.3         IRanges_2.4.8             
[15] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3             lattice_0.20-33         GO.db_3.2.2            
 [4] gtools_3.5.0            digest_0.6.9            plyr_1.8.3             
 [7] futile.options_1.0.0    BatchJobs_1.6           backports_1.0.1        
[10] ShortRead_1.28.0        ggplot2_2.1.0           gplots_2.17.0          
[13] zlibbioc_1.16.0         GenomicFeatures_1.22.13 annotate_1.48.0        
[16] gdata_2.17.0            Matrix_1.2-4            checkmate_1.7.3        
[19] systemPipeR_1.4.8       GOstats_2.36.0          splines_3.2.4          
[22] BiocParallel_1.4.3      stringr_1.0.0           pheatmap_1.0.8         
[25] RCurl_1.95-4.8          biomaRt_2.26.1          munsell_0.4.3          
[28] sendmailR_1.2-1         rtracklayer_1.30.2      base64enc_0.1-3        
[31] BBmisc_1.9              fail_1.3                edgeR_3.12.0           
[34] XML_3.98-1.4            AnnotationForge_1.12.2  bitops_1.0-6           
[37] grid_3.2.4              RBGL_1.46.0             xtable_1.8-2           
[40] GSEABase_1.32.0         gtable_0.2.0            magrittr_1.5           
[43] scales_0.4.0            graph_1.48.0            KernSmooth_2.23-15     
[46] amap_0.8-14             stringi_1.0-1           hwriter_1.3.2          
[49] genefilter_1.52.1       latticeExtra_0.6-28     futile.logger_1.4.1    
[52] brew_1.0-6              rjson_0.2.15            lambda.r_1.1.7         
[55] RColorBrewer_1.1-2      tools_3.2.4             Category_2.36.0        
[58] survival_2.38-3         AnnotationDbi_1.32.3    colorspace_1.2-6       
[61] caTools_1.17.1         

Could you (after running the command "library(DiffBind)" ) please type the command "sessionInfo()" at the R command prompt and post the results?

After you've done that, could you, at the SSH command prompt, i.e. the shell command line, type

head 64502_macs2_peaks.narrowPeak

or within an R session, try

system("head 64502_macs2_peaks.narrowPeak")

so that I can see the first few lines of one of your read files?

I've asked you for these things twice before; this is the third time.  I can't diagnose the problem without this information.  

ADD REPLYlink modified 6 months ago • written 6 months ago by gdbzork0

> sessionInfo()

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DiffBind_1.16.3            RSQLite_1.1-2             
 [3] locfit_1.5-9.1             GenomicAlignments_1.6.3   
 [5] Rsamtools_1.22.0           Biostrings_2.38.4         
 [7] XVector_0.10.0             limma_3.26.9              
 [9] SummarizedExperiment_1.0.2 Biobase_2.30.0            
[11] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[13] IRanges_2.4.8              S4Vectors_0.8.11          
[15] BiocGenerics_0.16.1       

      

ADD REPLYlink written 6 months ago by zhaolin200420080

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10            lattice_0.20-35         GO.db_3.2.2            
 [4] gtools_3.5.0            digest_0.6.12           plyr_1.8.4             
 [7] backports_1.0.5         futile.options_1.0.0    BatchJobs_1.6          
[10] ShortRead_1.28.0        ggplot2_2.2.1           gplots_3.0.1           
[13] zlibbioc_1.16.0         GenomicFeatures_1.22.13 lazyeval_0.2.0         
[16] annotate_1.48.0         gdata_2.17.0            Matrix_1.2-10          
[19] checkmate_1.8.2         systemPipeR_1.4.8       GOstats_2.36.0         
[22] splines_3.4.0           BiocParallel_1.4.3      stringr_1.2.0          
[25] pheatmap_1.0.8          RCurl_1.95-4.8          biomaRt_2.26.1         
[28] munsell_0.4.3           sendmailR_1.2-1         compiler_3.4.0         
[31] rtracklayer_1.30.4      base64enc_0.1-3         BBmisc_1.11            
[34] fail_1.3                tibble_1.3.0            edgeR_3.12.1           
[37] XML_3.98-1.7            AnnotationForge_1.12.2  bitops_1.0-6           
[40] grid_3.4.0              RBGL_1.46.0             xtable_1.8-2           
[43] GSEABase_1.32.0         gtable_0.2.0            DBI_0.6-1              
[46] magrittr_1.5            scales_0.4.1            graph_1.48.0           
[49] KernSmooth_2.23-15      amap_0.8-14             stringi_1.1.5          
[52] hwriter_1.3.2           genefilter_1.52.1       latticeExtra_0.6-28    
[55] futile.logger_1.4.3     brew_1.0-6              rjson_0.2.15           
[58] lambda.r_1.1.9          RColorBrewer_1.1-2      tools_3.4.0            
[61] Category_2.36.0         survival_2.41-3         AnnotationDbi_1.32.3   
[64] colorspace_1.3-2        caTools_1.17.1          memoise_1.1.0  

ADD REPLYlink written 6 months ago by zhaolin200420080

> system("head 64502_macs2_peaks.narrowPeak")
1       10340   10634   64502_macs2_peak_1      59      .       4.99269 8.081125.94287  187
1       565190  565464  64502_macs2_peak_2      759     .       8.94809 79.14370
        75.99088        155
1       566790  567199  64502_macs2_peak_3      80      .       3.00136 10.24183
        8.04778 272
1       569261  569564  64502_macs2_peak_4      937     .       10.13605       97.11760 93.74061        188
1       713661  714645  64502_macs2_peak_5      1063    .       16.07595       109.91768        106.36751       549
1       752597  752887  64502_macs2_peak_6      34      .       3.92283 5.496043.44459  118
1       762453  763263  64502_macs2_peak_7      578     .       14.14343       60.75053 57.81964        410
1       779676  780263  64502_macs2_peak_8      290     .       11.01695       31.59573 29.03273        391
1       793444  793891  64502_macs2_peak_9      77      .       5.70594 9.955587.76720  111
1       794057  794275  64502_macs2_peak_10     59      .       4.99269 8.081125.94287  133

 

ADD REPLYlink written 6 months ago by zhaolin200420080

Sorry for the late response. We are updating all the internet system within the department. 

ADD REPLYlink written 6 months ago by zhaolin200420080
0
gravatar for Gord Brown
6 months ago by
Gord Brown540
United Kingdom
Gord Brown540 wrote:

Your sessionInfo() tells me that you're running an extremely old version of DiffBind, and R.  Your read bed files probably have 5 columns, which is not legal bed format (although it is the format some aligners will give you).  You need either 3 columns, or 6 or more.  Ideally, upgrade to the current R (3.4, as of a few days ago), or ensure your bed files have exactly 3 or at least 6 columns.  Either one should fix the problem, though upgrading is preferable.  There are many bug fixes and improvements between DiffBind 1.16.3 and 2.5.1.

[edit] I also just noticed that your "read" files have names like "64502_macs2_summits.bed" which is an unlikely name for the output of an aligner.  You have to give the actual reads from the aligner in the "bamReads" column.  The workflow should be something like:

1) from Fastq, align to the reference genome to get bam (or bed) files.

2) With the aligned reads, call peaks using (in your case) MACS2, to get peaks (and summits).

3) Give both the *aligned reads* (not the summits) and the *peaks* to DiffBind for differential analysis.

Since your reads files are named "something_macs2_summits.bed", they are probably the summits from MACS, not the aligned reads from the aligner. 

 - Gord

ADD COMMENTlink modified 6 months ago • written 6 months ago by Gord Brown540

Many thanks. 

1. I will update DIffBind 2.5. 

2. Then I will change aligner bam file at the "bamReds" column.

Z Zhang

ADD REPLYlink modified 6 months ago • written 6 months ago by zhaolin200420080
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 153 users visited in the last hour