Question

DiffBind Error when dba.count on a pre-defined peak set

0

Entering edit mode

tangming2005 ▴ 190

@tangming2005-6754

Last seen 4 days ago

United States

Hi,

I have run Diffbind on a set of ChIP-seq samples and generated a dba.count object. Now, I want to count reads on a set of peaks (promoters/enhancers) that I pre-defined and got this error:

H3K27ac.dba<- dba(sampleSheet="my_full_diffbind.csv", scoreCol= 7, filter=80, peakFormat = "macs")

## using RPKM for normalization (reads number in the bam files)
H3K27ac_RPKM<- dba.count(H3K27ac.dba, minOverlap=2, 

                      fragmentSize = 200, bParallel = T,
                      score = DBA_SCORE_RPKM)

mypeaks<- import("my_enhancers.bed", format="BED")

mypeaks.count <- dba.count(H3K27ac_RPKM, peaks= mypeaks, bParallel = T,

+                                      fragmentSize = 200, score = DBA_SCORE_RPKM, minOverlap = 0)

Error in `$<-.data.frame`(`*tmp*`, "chrmap", value = c("chr1", "chr10",  : 

  replacement has 24 rows, data has 45728

Thanks very much.

Ming

diffbind chipseq • 1.2k views

ADD COMMENT • link updated 7.6 years ago by Rory Stark ★ 5.2k • written 7.6 years ago by tangming2005 ▴ 190

score 0 · Answer 1 · 2016-10-14

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 4 days ago

Cambridge, UK

Hi Ming-

There appears to be a bug here. We'll have to wait a short while to fix it in the new release as the current release is closed to new fixes.

I do have a workaround for you however: if you set bUseSummarizeOverlaps=TRUE when calling dba.count(), this should work (although it won't support the fragmentSize=200 parameter).

I am a little curious about what you are trying to do, specifically why you are calling dba.count() two times, and passing in the result of the first call to the second call. If you want to count based on your predefined enhancers, you can pass them into the first call to dba.count(). Also, whenever you pass in pre-defined peaks, the minOverlap parameter is ignored as it only applies when a new set of consensus peaks are being derived.

Apologies for the inconvenience, I'll post here when we are able to check in a fix.

-Rory

ADD COMMENT • link 7.6 years ago Rory Stark ★ 5.2k

0

Entering edit mode

I was trying to reuse the precomputed `dba.count()` to save some time (I saved it as an r object, and later loaded it) .

The first `dba.count` is for the peaks called by MACS, the second `dba.count` is for the same samples but using pre-defined regions.

I later use dba.count directly with my enhancer regions. only thing I need to do is to modify the samplesheet .csv file to specify the enhancer peak format as bed, which initially I was lazy to do.

Thanks very much!

Tommy

ADD REPLY • link 7.6 years ago tangming2005 ▴ 190

1

Entering edit mode

There's no real benefit to "precomputing" (counting). When you call dba.count() with a different peakset, it re-does all of the counting from scratch.

ADD REPLY • link 7.6 years ago Rory Stark ★ 5.2k