Hi,
I'm trying to extract the consensus peakset that DiffBind constructs, with the raw counts in these regions. I have done this two different ways:
1) After running dba.count()
, dba.contrast()
and dba.analyze()
with default settings:
counts_1 <- dba.report(DBA, th=1, bCounts = TRUE, bNormalized = FALSE, DataType = DBA_DATA_FRAME)
2) (I had to run dba.count()
normally first, otherwise the second line of code did not run)
DBA <- dba.count(DBA)
DBA <- dba.count(DBA, peaks=NULL, score=DBA_SCORE_READS)
counts_2 <- dba.peakset(DBA, bRetrieve = TRUE, DataType = DBA_DATA_FRAME)
These two methods both yield an object with 104976 consensus peak regions. However, some of these contain 1s in every column (so there are no reads in any of my samples). My first question is, why is this? If these regions have no reads in any sample, why do they make it into the consensus peak set?
My second question is: The number of these regions without reads in different with the two methods. With method 1, I find 11476 such regions; with method 2, there are 9686, even though the total number of regions is the same. What causes this discrepancy, and what is the right way to extract raw read counts in consensus peaks?
Thank you in advance for your answer!
Hi Rory,
Thank you for your answer! I used
bUseSummarizeOverlaps=TRUE
, and that did get rid of the empty rows. I think the problem was that my reads are paired-end. Please could you explain why the default counting method undercounts overlapping regions if the reads are paired-end? And also, how can I set this inDBA$config
? (Apparently the default issingleEnd = TRUE
, but I couldn't figure out how to change this).Thank you very much for your help!
Set
before calling
Hi Rory,
Thank you for your answer. I have done some comparisons, and I have found that I get far fewer differentially accessible regions when I set
DBA$config$singleEnd = FALSE
than when I don't set this (I usebUseSummarizeOverlaps=TRUE
in both cases). Can you explain why that would be the case? And what would you recommend for paired-end data - what is the benefit of settingDBA$config$singleEnd = FALSE
?Thank you very much!
There could be a number of reasons for this. It could be that in paired-end mode the reads are overlapping multiple features and not being counted. It would be interesting to look at the difference in the read count matrices when
singleEnd
isFALSE
vs when it isTRUE
. If you were to send me DBA objects with the counts done in each way I could have a look.Hi Rory,
Thank you for getting back to me. I'm not sure what the best way to get these objects to you is (I don't think I can upload them here), so I'm emailing them to you. I hope that is okay.
Thank you for your help with this! Best wishes, Hasse
If the bam files contain paired-end reads, and you set
bUseSummarizeOverlaps=TRUE
, then you need to make sure thatsingleEnd=FALSE
. If not, each end of the read will be counted separately, so you'll get twice as many counts as you should (each fragment will be counted twice). The counts won't come out to exactly double when `singleEnd=TRUE, as in some cases each end will overlap a different peak (these would be discarded in the paired-end case as they can not be uniquely assigned to a single peak interval).