Question

How does DiffBind deal with single-end and paired-end data together?

Xiaojie Cheng

Hello, I'm currently trying to perform a differential binding analysis of ChIP-seq data using DiffBind. I have two groups of samples, one sequenced single-end and the other paired-end, and I want to detect differential peaks between the two groups. To do that, I first created the dba object by adding all the samples and then ran the dba.count() function. However, I found that all FRiP values for the single-end samples were zero, because the reads were automatically detected and counted as paired-end (the paired-end samples come first in the sampleSheet). I then reordered the sampleSheet with the single-end samples listed first. This time the counts were correct for the single-end samples but doubled for the paired-end samples, which might affect the outcome.
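The doubled paired-end counts are consistent with each mate of a pair being counted as its own read once the whole sheet is treated as single-end. That is an assumption; the thread does not describe DiffBind's internals. The toy base-R sketch below uses hypothetical read names and is not DiffBind code:

```r
# Hypothetical reads overlapping a single peak. In this toy example the two
# mates of one paired-end fragment share a read name (assumption).
reads <- data.frame(
  name = c("frag1", "frag1", "frag2", "frag2", "frag3", "frag3"),
  mate = c(1L, 2L, 1L, 2L, 1L, 2L)
)

# Single-end style counting: every aligned mate is an independent read
se_style_count <- nrow(reads)

# Paired-end style counting: one count per fragment (unique read name)
pe_style_count <- length(unique(reads$name))

cat("counted as single-end:", se_style_count, "\n")
cat("counted as paired-end:", pe_style_count, "\n")
cat("ratio:", se_style_count / pe_style_count, "\n")
```

Three fragments give six reads in single-end mode, i.e. exactly twice the paired-end count.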

I also tried running dba.count() separately for the paired-end and single-end groups and then combining them with the dba.peakset function, but I got an error when running dba.analyze(). So, can DiffBind handle single-end and paired-end data together? If so, how should I do it? Thank you very much!

Best, Xiaojie Cheng


Answer 1 · 2021-01-14

Indeed, DiffBind does not currently support experiments that mix paired-end and single-end sequencing data. It's on the feature list for a future release.

You are on the right track for a workaround, which is to count the two groups separately and then combine them. There are a number of details to get right to ensure everything matches up. Here is a script using the vignette data that divides the samples into two experiments, counts them, and combines them:

```r
library(DiffBind)

# The sample sheet path is relative: run from the directory holding the vignette data
tamoxifen <- dba(sampleSheet="tamoxifen.csv")

tam <- dba.count(tamoxifen, filter=0, score=DBA_SCORE_READS) # Count ALL consensus peaks (re-centered around summits)
conspeaks <- dba.peakset(tam, bRetrieve=TRUE) # Extract consensus peaks

tam1 <- dba(tamoxifen, mask=1:5) # Separate first subgroup and count without summits or filter
tam1 <- dba.count(tam1, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tam2 <- dba(tamoxifen, mask=6:11) # Separate second subgroup and count without summits or filter
tam2 <- dba.count(tam2, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tamoxifen.counts <- dba.peakset(tam1, peaks=tam2) # Combine separate subgroups
tamoxifen.counts <- dba.count(tamoxifen.counts, peaks=NULL, filter=1, score=DBA_SCORE_NORMALIZED) # Apply filter

dba.analyze(tamoxifen.counts, bBlacklist=FALSE, bGreylist=FALSE) # Analyze
```

(NB: from the next update, DiffBind_3.0.11, you can remove all the score= parameters).
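Counting both subgroups against the same consensus peakset (conspeaks) is what makes the separately counted results combinable, and it is the step the original poster later reports having missed. The idea can be sketched in base R with hypothetical toy count matrices; the helper combine_counts() is illustrative only and is not part of DiffBind:

```r
# Hypothetical toy consensus peakset shared by both subgroups
peaks <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                    start = c(100L, 500L, 300L),
                    end   = c(400L, 800L, 650L))
peak_ids <- paste(peaks$chr, peaks$start, peaks$end, sep = ":")

# Hypothetical read counts from counting each subgroup separately
counts_se <- matrix(c(10L, 3L, 7L, 12L, 4L, 6L), nrow = 3,
                    dimnames = list(peak_ids, c("SE_1", "SE_2")))
counts_pe <- matrix(c(20L, 5L, 9L, 18L, 6L, 11L), nrow = 3,
                    dimnames = list(peak_ids, c("PE_1", "PE_2")))

# Hypothetical helper: combine column-wise only if every row is the same interval
combine_counts <- function(a, b) {
  if (!identical(rownames(a), rownames(b))) {
    stop("count matrices were not computed over the same peakset")
  }
  cbind(a, b)
}

combined <- combine_counts(counts_se, counts_pe)
print(combined)
```

Had the two subsets been counted over different peaksets (for example, each re-centered on its own summits), the row names would differ and the check would stop before the data were combined.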

Answer 2 · 2021-01-18

Thank you very much for your help! Following your script, I got the correct read counts for both the single-end and paired-end data and ran dba.analyze() successfully. My earlier attempt failed because I didn't extract the consensus peaks before counting the groups separately. It's also great to know that DiffBind will support this situation in a future release.