Question

Clarification about the functionality of dba.count and dba.analyze

Asked by JKUMAR12, United States

I am trying to get some clarity on dba.count and dba.analyze.

1) For dba.count, does the function look at overlapping peaks and, instead of using the score assigned by the peak caller (which I understand dba() does), use the number of reads found at each peak in a sample to calculate how well those reads correlate with the same peak in another sample? Or does it just look at reads across the entire genome, regardless of where peaks were called?

2) When you set up the contrasts and then run dba.analyze, are the peaks of all samples in each contrast group pooled together for the differential analysis?

Thank you!
Jaya

diffbind

Written by JKUMAR12 • last updated by Rory Stark

Accepted Answer · 2015-08-20

Hello Jaya-

1. dba.count() counts reads in every consensus peak for every sample. With the default overlap-based method of building a consensus peakset (minOverlap = 2), it keeps regions identified as peaks in at least two samples, but counts the reads in those regions for every sample, whether or not a peak was called there for that sample.

2. When dba.analyze() is run using either edgeR or DESeq/DESeq2, the replicate samples in each group aren't pooled. Instead, they are used to estimate how well the samples within the group agree. Groups whose replicates show lower variance yield better confidence scores (lower p-values and FDR).
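A minimal sketch of the consensus-then-count logic in point 1, assuming peaks are half-open (chrom, start, end) intervals and reads are (chrom, position) points. The helpers `merge_intervals`, `consensus_peaks` and `count_reads` are hypothetical illustrations, not DiffBind functions, and real read counting details (fragment extension, paired ends, normalization) are left out. Note that sample C has no peak in the consensus region but still receives a count:

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open
Read = Tuple[str, int]           # (chrom, position)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals on the same chromosome (hypothetical helper)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def consensus_peaks(peaks_by_sample: Dict[str, List[Interval]],
                    min_overlap: int = 2) -> List[Interval]:
    """Keep merged regions that overlap a peak in at least min_overlap samples."""
    all_peaks = [p for peaks in peaks_by_sample.values() for p in peaks]
    consensus = []
    for chrom, start, end in merge_intervals(all_peaks):
        # Count how many samples have at least one peak overlapping this region.
        support = sum(
            any(c == chrom and s < end and e > start for c, s, e in peaks)
            for peaks in peaks_by_sample.values()
        )
        if support >= min_overlap:
            consensus.append((chrom, start, end))
    return consensus


def count_reads(consensus: List[Interval],
                reads_by_sample: Dict[str, List[Read]]) -> Dict[str, List[int]]:
    """Count reads in every consensus region for EVERY sample, peak or not."""
    return {
        sample: [
            sum(1 for c, pos in reads if c == chrom and start <= pos < end)
            for chrom, start, end in consensus
        ]
        for sample, reads in reads_by_sample.items()
    }


# Toy data (made up for illustration only).
peaks_by_sample = {
    "A": [("chr1", 100, 200), ("chr1", 500, 600)],
    "B": [("chr1", 150, 250)],
    "C": [("chr1", 900, 1000)],
}
reads_by_sample = {
    "A": [("chr1", 120), ("chr1", 180), ("chr1", 550)],
    "B": [("chr1", 200)],
    "C": [("chr1", 130), ("chr1", 240), ("chr1", 950)],
}

consensus = consensus_peaks(peaks_by_sample, min_overlap=2)
counts = count_reads(consensus, reads_by_sample)
print(consensus)
print(counts)
```

Only the region supported by samples A and B survives, yet all three samples get a read count in it.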
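To illustrate point 2, the sketch below swaps in Welch's t statistic as a simple stand-in. edgeR and DESeq2 actually fit negative binomial models with their own dispersion estimates, so this shows only the principle, not their method. Both scenarios have the same group means and identical pooled totals, so pooling would make them indistinguishable, but the tightly agreeing replicates give a much larger statistic. `welch_t` and the counts are hypothetical:

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic: mean difference scaled by within-group spread."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group2) - mean(group1)) / se


# Same group means (100 vs 200) but different replicate agreement.
tight_ctrl, tight_trt = [98, 100, 102], [198, 200, 202]
noisy_ctrl, noisy_trt = [60, 100, 140], [120, 200, 280]

# Pooling replicates discards the spread: the totals are identical.
print(sum(tight_ctrl), sum(noisy_ctrl), sum(tight_trt), sum(noisy_trt))

t_tight = welch_t(tight_ctrl, tight_trt)
t_noisy = welch_t(noisy_ctrl, noisy_trt)
print(f"tight replicates: t = {t_tight:.1f}")
print(f"noisy replicates: t = {t_noisy:.1f}")
```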

Hope this helps-

Rory