How does DiffBind deal with single-end and paired-end data together

Hello, I have recently been trying to perform differential binding analysis of ChIP-seq data using DiffBind. I have two groups of samples, one sequenced single-end and the other paired-end, and I want to detect differential peaks between the two groups. To do that, I first created the dba object by adding all the samples and then ran the dba.count() function. However, I found that the FRiP values for all single-end samples were zero, because the reads were automatically detected and counted as paired-end (the paired-end samples come first in the sampleSheet). I then reordered the sampleSheet so that the single-end samples were listed first. This time the counts were right for the single-end samples but doubled for the paired-end samples, which might affect the outcome.

I also tried running dba.count() on the paired-end and single-end groups separately and then combining them with the dba.peakset() function, but I got an error when I ran dba.analyze(). So, can DiffBind deal with single-end and paired-end data together? And if it can, what should I do? Thank you very much!

Best, Xiaojie Cheng

Rory Stark ♦ 3.5k
CRUK, Cambridge, UK

Indeed, DiffBind does not currently support experiments that mix paired-end and single-end sequencing data. It's on the feature list for a future release.

You are on the right track for a workaround, which is to count them separately and then combine them. There are a number of issues to address to ensure everything matches up. Here is a script using the vignette data that divides the samples into two experiments, counts them, and combines them:

library(DiffBind)

tamoxifen <- dba(sampleSheet="tamoxifen.csv")

tam <- dba.count(tamoxifen, filter=0, score=DBA_SCORE_READS) # Count ALL consensus peaks (re-centered around summits)
conspeaks <- dba.peakset(tam, bRetrieve=TRUE)  #Extract consensus peaks

tam1 <- dba(tamoxifen, mask=1:5)  #Separate first subgroup and count without summits or filter
tam1 <- dba.count(tam1, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tam2 <- dba(tamoxifen, mask=6:11) #Separate second subgroup and count without summits or filter
tam2 <- dba.count(tam2, peaks=conspeaks, summits=FALSE, filter=0, score=DBA_SCORE_READS)

tamoxifen.counts <- dba.peakset(tam1, peaks=tam2) #Combine separate subgroups
tamoxifen.counts <- dba.count(tamoxifen.counts, peaks=NULL, filter=1, score=DBA_SCORE_NORMALIZED) #Apply filter

tamoxifen.counts <- dba.analyze(tamoxifen.counts, bBlacklist=FALSE, bGreylist=FALSE) # Analyze and keep the result

(NB: from the next update, DiffBind_3.0.11, you can remove all the score= parameters).
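
For the mixed single-end/paired-end case specifically, one possible adaptation of the script above is to tell each subgroup explicitly how its reads should be counted, rather than relying on automatic detection. The sketch below assumes that DBA$config$singleEnd is honored when counting with bUseSummarizeOverlaps=TRUE, as described in the dba.count() help page. The sample sheet name mixed_samples.csv and the row ranges 1:5 (single-end) and 6:11 (paired-end) are hypothetical placeholders, and the sketch has not been run on real mixed data:

```r
library(DiffBind)

# Hypothetical sample sheet: rows 1-5 assumed single-end, rows 6-11 paired-end
mixed <- dba(sampleSheet = "mixed_samples.csv")

# First pass over all samples; only the consensus peak intervals are reused below
all_counts <- dba.count(mixed, filter = 0, score = DBA_SCORE_READS)
conspeaks <- dba.peakset(all_counts, bRetrieve = TRUE)

# Single-end subgroup: force single-end counting (assumption: config$singleEnd is respected)
se <- dba(mixed, mask = 1:5)
se$config$singleEnd <- TRUE
se <- dba.count(se, peaks = conspeaks, summits = FALSE, filter = 0,
                score = DBA_SCORE_READS, bUseSummarizeOverlaps = TRUE)

# Paired-end subgroup: force paired-end counting
pe <- dba(mixed, mask = 6:11)
pe$config$singleEnd <- FALSE
pe <- dba.count(pe, peaks = conspeaks, summits = FALSE, filter = 0,
                score = DBA_SCORE_READS, bUseSummarizeOverlaps = TRUE)

# Combine the two subgroups and apply the filter, as in the script above
combined <- dba.peakset(se, peaks = pe)
combined <- dba.count(combined, peaks = NULL, filter = 1, score = DBA_SCORE_NORMALIZED)
```

After counting, checking the FRiP values and read counts reported for each subgroup is a quick way to confirm that neither the zero-count nor the doubled-count symptom from the question has reappeared.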

