Dear Developers,
Thank you for the nice software! I am using DiffBind to merge peaks and re-count reads in the consensus regions, and I have a few questions about DiffBind.
- My first question concerns the last column of the peaks after loading them with dba() (with PeakCaller set to 'bed'). It seems that DiffBind extracted the first three columns and the fifth column; however, the values in the fifth column differ from those in the original file. For example:
nova = dba(sampleSheet=samples)
head(nova$peaks[[1]])
V1 V2 V3 V5
1 chr1 835381 835586 0.09638554
2 chr1 842890 843095 0.10843373
3 chr1 853018 853247 0.15060241
4 chr1 866363 866568 0.08433735
5 chr1 882661 882887 0.21084337
6 chr1 894995 895521 0.09638554
colnames(samples)
[1] "SampleID" "Tissue" "bamReads" "ControlID" "bamControl"
[6] "Peaks" "PeakCaller"
In the original file:
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 chr1 835381 835586 10_10589255_q0.01_peak_1 16 . 2.97909 3.23418 1.62779
2 chr1 842890 843095 10_10589255_q0.01_peak_2 18 . 3.13408 3.50458 1.83233
3 chr1 853018 853247 10_10589255_q0.01_peak_3 25 . 3.36780 4.39197 2.51693
4 chr1 866363 866568 10_10589255_q0.01_peak_4 14 . 2.61507 2.97812 1.44712
5 chr1 882661 882887 10_10589255_q0.01_peak_5 35 . 4.17826 5.78446 3.59553
6 chr1 894995 895521 10_10589255_q0.01_peak_6 16 . 2.86883 3.23247 1.63777
Could you please explain what the values in the DBA peaks mean (e.g. 0.09638554) and how they were calculated?
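To show why I am puzzled: each value looks like the original fifth column divided by a single constant (16/166 = 0.09638554, 18/166 = 0.10843373, 25/166 = 0.15060241), so my guess is that the scores are rescaled, perhaps by the maximum score in the file. A quick check along these lines would confirm it (the file name is just a placeholder for my actual peak file):
raw <- read.table("peaks.bed")   # placeholder path: the original peak file
head(raw$V5 / max(raw$V5))       # if my guess is right, this matches nova$peaks[[1]]$V5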
- I have more than 1000 peak/BAM files. When I re-count the reads in the consensus regions, I keep running into errors. Can I re-count the same regions for one sample at a time? If so, could you advise me on how to do it? (I list what I plan to try next after the traceback below.)
nova2 <- dba.count(nova, minOverlap=50, summits=FALSE, bScaleControl=TRUE, score=DBA_SCORE_RPKM, bParallel=4)
After running for quite a while, the following errors appear:
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
...
[W::bam_hdr_read] bgzf_check_EOF: Cannot allocate memory
[E::bam_hdr_read] Invalid BAM binary header
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bam_hdr_read] Error reading BGZF stream
*** caught segfault ***
address 0x10, cause 'memory not mapped'
Traceback:
1: cpp_count_reads(bamfile, insertLength, fileType, bufferSize, intervals, bWithoutDupes, summits, minMappingQuality)
2: pv.getCounts(bamfile = countrec$bamfile, intervals = intervals, insertLength = countrec$insert, bWithoutDupes = bWithoutDupes, bLowMem = bLowMem, yieldSize = yieldSize, mode = mode, singleEnd = singleEnd, scanbamparam = scanbamparam, fileType = fileType, summits = summits, fragments = fragments, minMappingQuality = minMappingQuality)
3: FUN(X[[i]], ...)
4: lapply(X = S, FUN = FUN, ...)
5: doTryCatch(return(expr), name, parentenv, handler)
6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
7: tryCatchList(expr, classes, parentenv, handlers)
8: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L sm <- strsplit(conditionMessage(e), "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && isTRUE(getOption("show.error.messages"))) { cat(msg, file = outFile) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
9: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
10: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
11: FUN(X[[i]], ...)
12: lapply(seq_len(cores), inner.do)
13: mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE)
14: config$lapplyFun(config, params, arglist, fn, ...)
15: dba.parallel.lapply(pv$config, params, todorecs, pv.do_getCounts, bed, bWithoutDupes = bWithoutDupes, bLowMem, yieldSize, mode, singleEnd, scanbamparam, readFormat, summits, fragments, minMappingQuality)
16: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score, bLog = bLog, insertLength = fragmentSize, bOnlyCounts = T, bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel, bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl, filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat, summits = summits, minMappingQuality = mapQCth)
17: dba.count(nova, minOverlap = 50, summits = FALSE, bScaleControl = TRUE, score = DBA_SCORE_RPKM, bParallel = 4)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
*** caught segfault ***
address 0x10, cause 'memory not mapped'
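In case it helps to show what I have in mind, here is what I plan to try next. This is only a sketch based on my reading of the dba.count help page, and I have not verified it:
# 1) Serial counting, to limit open BAM file handles and per-worker memory:
nova2 <- dba.count(nova, minOverlap=50, summits=FALSE,
                   bScaleControl=TRUE, bParallel=FALSE)
# 2) Lower-memory counting via summarizeOverlaps:
nova2 <- dba.count(nova, minOverlap=50, summits=FALSE,
                   bUseSummarizeOverlaps=TRUE, bParallel=FALSE)
# 3) For one sample at a time, I imagine passing the same consensus intervals
#    to each sample via the peaks= argument, but I am not sure how to combine
#    the per-sample results afterwards.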
- For broad peaks, if I use the summits parameter in dba.count, will a summit be computed for each merged peak in each sample separately? Or can the summit be computed from the distribution of reads across all samples?
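For example, if I understand the summits parameter correctly (an assumption on my part), something like the following would re-center each consensus peak on a fixed-width window around its summit:
nova2 <- dba.count(nova, minOverlap=50, summits=250,   # re-center each consensus peak on a
                   bScaleControl=TRUE)                 # 501-bp window around its summit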
Thank you very much!
Tongjun
Thank you! That helps a lot!
Hi Dr. Stark, I have questions about the read numbers counted by DiffBind's dba.count function. I used DiffBind_3.0.15 under R/4.0.5.
My input is a mix of paired-end (read length 150 or 151) and single-end (read length 51) samples. Are the reads counted by dba.count doubled for the paired-end samples (i.e., both mates counted)? You explained earlier that by setting bUseSummarizeOverlaps=FALSE, paired-end samples can be treated as single-end samples. However, in the output of info$Reads, the read numbers are far higher than expected for the paired-end samples, and also a little higher for the single-end samples.
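To illustrate how I compared the numbers (just a sketch; "sample.bam" is a placeholder for one of my BAM files, and dbObj stands for my DBA object after dba.count), I counted mapped records directly with Rsamtools:
library(Rsamtools)
# each mate of a paired-end read is a separate record here
param <- ScanBamParam(flag=scanBamFlag(isUnmappedQuery=FALSE))
countBam("sample.bam", param=param)$records   # mapped records in the BAM file
info <- dba.show(dbObj)                       # dbObj: placeholder for my counted DBA object
info$Reads                                    # read numbers reported by DiffBind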
What is the default mapping-quality threshold (the mapQCth parameter) used for filtering reads?
Will a mix of paired-end and single-end samples affect the read counting?
Thank you very much!