Question

Counting Raw Reads with DiffBind

0

Entering edit mode

jess.schust • 0

@jessschust-9100

Last seen 8.5 years ago

United States

Hello,

I am trying to use diffbind on paired end read data. I have done the following to get our proper peaks of interest ( we have technical replicates:

myDBA=dba(sampleSheet="All_samplesheet.csv", attributes=c(DBA_ID,DBA_CONDITION,DBA_REPLICATE))

myDBA=dba.peakset(myDBA, consensus=-DBA_REPLICATE, minOverlap=3)

I then used Rory's instructions to extract the raw read counts (this is what I need for our next step):

myDBA <- dba.count(myDBA, peaks=NULL, score=DBA_SCORE_READS)

counts <- dba.peakset(myDBA, bRetrieve=TRUE)

However, when I look at the bam files (using IGV) there appear to be way more reads than are reported in the count matrix. How exactly are the reads counted? I have been using your raw read matrix to move forward in my analysis and I am concerned that I am using at the wrong counts.

Any advice would be greatly appreciated.

~Jessica

diffbind • 1.8k views

ADD COMMENT • link updated 8.5 years ago by Gord Brown ▴ 650 • written 8.5 years ago by jess.schust • 0

score 4 · Accepted Answer · 2015-11-04

Hi, Jessica,

Sorry for the slow response. Rory passed along the issue (and your data) to me. As far as I can tell the answer is that DiffBind by default removes reads that have a mapping quality below 15. Particularly near centromeres and telomeres, quite a few reads tend to fall into this category. We usually remove them prior to analysis, since we aren't confident that they're correctly mapped.

If you re-count with:

myDBA <- dba.count(myDBA, score=DBA_SCORE_READS, mapQCth=0, bRemoveDuplicates=FALSE)

you should get everything. Let me know if you still don't see the numbers you're expecting. Also, you can try adding the parameter "bUseSummarizedOverlaps=TRUE" which uses a slightly different counting algorithm.

Cheers,

- Gord