How does DiffBind handle single-end and paired-end data together?
@user-24527
Hongkou

Hello, I'm trying to perform a differential binding analysis of ChIP-seq data using DiffBind. I have two groups of samples, one sequenced single-end and the other paired-end, and I want to detect differential peaks between the two groups. To do that, I first created the DBA object by adding all the samples and then ran the dba.count() function. However, all FRiP values for the single-end samples were zero, because the reads were automatically detected as paired-end and counted that way (the paired-end samples came first in the sampleSheet). I then reordered the sampleSheet to list the single-end samples first. This time the single-end counts were correct, but the paired-end counts were doubled, which might affect the outcome.
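Because the problem comes from DiffBind auto-detecting the read layout, it may help to check each BAM file's layout yourself before deciding how to split the samples. This is a sketch under stated assumptions: it assumes the Rsamtools package is installed and that the sample sheet has DiffBind's usual `bamReads` column; the helper names `detect_paired` and `split_by_layout` are hypothetical and are not part of DiffBind.

```r
# Hypothetical helpers (not part of DiffBind); a sketch assuming Rsamtools
# is installed and the sample sheet has a 'bamReads' column of BAM paths
# (relative paths are resolved against the current working directory).

# Returns TRUE for each sample whose BAM file contains paired-end reads,
# using Rsamtools::testPairedEndBam() on every file in 'bamReads'.
detect_paired <- function(sheet) {
  bams <- as.character(sheet$bamReads)  # guard against factor columns
  vapply(bams, function(f) Rsamtools::testPairedEndBam(f),
         logical(1), USE.NAMES = FALSE)
}

# Converts a logical "is paired-end" vector into the two sets of sample
# indices that should be counted separately.
split_by_layout <- function(is_paired) {
  list(single = which(!is_paired), paired = which(is_paired))
}
```

For example, `split_by_layout(detect_paired(read.csv("samples.csv")))` would return two index vectors that could be passed as `mask=` to `dba()` when counting each group on its own ("samples.csv" is a placeholder file name).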

I also tried running dba.count() on the paired-end and single-end groups separately and then combining them with the dba.peakset() function, but I got an error when I ran dba.analyze(). So, can DiffBind handle single-end and paired-end data together? And if so, what should I do? Thank you very much!

Best, Xiaojie Cheng

Rory Stark
@rory-stark-5741
Cambridge, UK

Indeed, DiffBind does not currently support experiments that mix paired-end and single-end sequencing data. It's on the feature list for a future release.

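You are on the right track for a workaround, which is to count the two groups separately and then combine them. There are a number of issues to ensure everything matches up: both groups must be counted against the same consensus peaks, without re-centering on summits and without filtering, so that the intervals are identical; the filter is applied only once the counts are combined. Here is a script using the vignette data that divides the samples into two experiments, counts them, and combines them: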

library(DiffBind)

tamoxifen <- dba(sampleSheet="tamoxifen.csv")

tam <- dba.count(tamoxifen, filter=0, score=DBA_SCORE_READS) # Count ALL consensus peaks (re-centered around summits)
conspeaks <- dba.peakset(tam, bRetrieve=TRUE)  #Extract consensus peaks

tam1 <- dba(tamoxifen, mask=1:5)  #Separate first subgroup and count without summits or filter
tam1 <- dba.count(tam1, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tam2 <- dba(tamoxifen, mask=6:11) #Separate second subgroup and count without summits or filter
tam2 <- dba.count(tam2, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tamoxifen.counts <- dba.peakset(tam1, peaks=tam2) #Combine separate subgroups
tamoxifen.counts <- dba.count(tamoxifen.counts, peaks=NULL, filter=1, score=DBA_SCORE_NORMALIZED) #Apply filter

dba.analyze(tamoxifen.counts, bBlacklist=FALSE, bGreylist=FALSE) # Analyze

(NB: from the next update, DiffBind_3.0.11, you can remove all the score= parameters).
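The same steps can be wrapped in a reusable function so that any two sets of sample indices (for example, the single-end and the paired-end samples) are counted separately against one consensus peakset and then combined. This is only a sketch, not part of DiffBind: the name `count_by_layout` and its arguments are hypothetical, and it assumes DiffBind 3.0.11 or later, so the `score=` parameters are omitted as the note above allows.

```r
# Hypothetical helper (not part of DiffBind) generalizing the workaround
# script; assumes DiffBind >= 3.0.11 so that score= can be omitted.
count_by_layout <- function(dba_obj, idx1, idx2, min_reads = 1) {
  # 1. Count all samples once without filtering to build the consensus
  #    peakset (re-centered around summits), then extract it.
  all_counts <- DiffBind::dba.count(dba_obj, filter = 0)
  consensus  <- DiffBind::dba.peakset(all_counts, bRetrieve = TRUE)

  # 2. Count each subgroup separately on the SAME consensus peaks, without
  #    summit re-centering or filtering, so the intervals stay identical.
  g1 <- DiffBind::dba(dba_obj, mask = idx1)
  g1 <- DiffBind::dba.count(g1, peaks = consensus, summits = FALSE, filter = 0)
  g2 <- DiffBind::dba(dba_obj, mask = idx2)
  g2 <- DiffBind::dba.count(g2, peaks = consensus, summits = FALSE, filter = 0)

  # 3. Combine the two count sets, then apply the read filter once to the
  #    combined object (peaks = NULL re-uses the existing counts).
  combined <- DiffBind::dba.peakset(g1, peaks = g2)
  DiffBind::dba.count(combined, peaks = NULL, filter = min_reads)
}
```

With the vignette data, `tamoxifen.counts <- count_by_layout(tamoxifen, 1:5, 6:11)` should mirror the script above, after which `dba.analyze(tamoxifen.counts, bBlacklist=FALSE, bGreylist=FALSE)` runs the analysis.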

@user-24527
Hongkou

Thank you very much for your help! Following your script, I got the correct read counts for both the single-end and paired-end data and ran dba.analyze() successfully. My earlier attempt failed because I had not extracted the consensus peaks before counting the groups separately. It is great to know that DiffBind will support this situation in a future release.
