Hello all,
I am looking for advice on the best way to count the number of reads in each peak of a bed file on a
sample-by-sample basis, rather than from a consensus peak set. In other words, I have a peak file and bam file
for sample 1 and I want to be able to count reads in each peak, normalize by rpkm or rpm, and then export the peak table
with counts.
I understand how to do this using dba.count, but I would like to do this on a case-by-case basis, so that I
can look at the distribution of read counts for each sample in my set (10 total samples) using only the peaks
called in that sample, prior to generating and counting from the consensus set.
Is there a way to do this using either ChIPQC or DiffBind? I really like these two packages and will definitely
use them for downstream analyses on the same samples, so thought I would start there. It seems like one option is
to create a ChIPQC sample object for each sample and retrieve the peaks. However, I am not clear what the "Counts"
vector is in this object. I think it is total reads in the peak? rather than normalized to library size? I guess
I could also retrieve the total number of reads, construct a new data frame and do the calcs in R without that much
extra work.
Any help is greatly appreciated!
Hi Rory,
Thanks for the quick reply. I'll go the Bioconductor route.