Question: Handling duplicates in DiffBind
8 months ago by
loretta0 wrote:

Hi Rory,

Previously, I had been neither marking nor removing duplicates in my ChIP-seq datasets, and was instead setting bRemoveDuplicates to TRUE in dba.count. First, how badly does this strategy affect the detection of differential peaks? Second, how does setting bUseSummarizeOverlaps to TRUE compare to setting bRemoveDuplicates?


Answer: Handling duplicates in DiffBind
8 months ago by
Rory Stark2.9k
CRUK, Cambridge, UK
Rory Stark2.9k wrote:

If you want to remove duplicates, you need to mark them before running dba.count(), whatever bUseSummarizeOverlaps is set to. If duplicates are not marked, no duplicates will be identified, even if you set bRemoveDuplicates=TRUE.

However, for differential analysis, we strongly recommend not removing duplicates. In a well-prepared ChIP-seq experiment, most of the duplicate reads will be "true" duplicates indicating high levels of enrichment. The degree to which this is true will depend on how the sequencing is done (single-end vs paired-end, read length, number of reads). If you remove duplicates, you are clipping the signal, so you might be unable to detect, for example, a difference between one sample group where 30% of the DNA is bound at a particular interval and one where 90% of the DNA is bound. It also helps to use blacklists and greylists, as many problematic duplicates are located in blacklisted intervals.
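To make the clipping effect concrete, here is a minimal toy sketch in Python. It is not DiffBind code and does not reflect how DiffBind counts reads. It assumes single-end reads, treats two reads as duplicates when they share a start position and strand, and uses hypothetical numbers (an interval with 100 possible start positions, and 300 vs 900 reads standing in for 30% vs 90% of the DNA bound).

```python
# Toy model (an assumption for illustration, not DiffBind code):
# single-end reads in one peak interval, where a read counts as a
# duplicate if another read shares its (start position, strand).

def place_reads(n_reads, width):
    """Spread n_reads evenly over `width` start positions on both strands."""
    reads = []
    for i in range(n_reads):
        start = i % width
        strand = "+" if (i // width) % 2 == 0 else "-"
        reads.append((start, strand))
    return reads

def count_after_dedup(reads):
    """Keep one read per (start, strand), as duplicate removal would."""
    return len(set(reads))

WIDTH = 100                      # hypothetical number of start positions
low = place_reads(300, WIDTH)    # stands in for 30% of the DNA bound
high = place_reads(900, WIDTH)   # stands in for 90% of the DNA bound

raw_low, raw_high = len(low), len(high)
dedup_low = count_after_dedup(low)
dedup_high = count_after_dedup(high)

print("raw counts: ", raw_low, raw_high)      # 300 900
print("after dedup:", dedup_low, dedup_high)  # 200 200
```

In this toy case the raw counts differ three-fold, but after duplicate removal both groups are capped at the number of distinct (position, strand) combinations, so the difference disappears. Real data will rarely saturate this cleanly, and paired-end reads raise the cap, but the direction of the effect is the same.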

If your ChIP reads have a high proportion of duplicates (say, greater than 50%), there may be issues with the ChIP, leaving more artifactual duplicates, which you may be better off removing (after marking them in the BAM).
