Hi Rory Stark,
I am a little confused about the following questions regarding the use of DiffBind:
How does DiffBind handle normalization? Does it use the raw counts from the BAM files? Are the total read counts used for normalization those in peak regions? Also, when we choose the DESeq2 method, does it normalize the data the same way as in an RNA-seq analysis, which requires raw read counts as input?
Is it possible to use DiffBind without spike-in normalization?
Thanks in advance.
Thanks for your quick reply!
Based on your response, DiffBind normalizes by the total number of reads in each BAM file by default, right? And the final results of DiffBind, Conc_Resistant and Conc_Responsive, are, as the vignette suggests, the log2 of the mean normalized reads of each group. So the mean normalized reads are the reads from each peak region, averaged within each group? Am I correct?
Thanks for your reply!
Yes, I think you've got it correct. The default normalization only adjusts read counts by the relative number of reads in each BAM file.
The Conc_ values take the overlapping read counts for each consensus peak for each sample in each sample group, adjust them by the normalization factors, and report the log2(mean()) values.
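To make the arithmetic concrete, here is a rough Python sketch of the calculation described above for a single consensus peak in one sample group. It is not DiffBind code. The function name `conc` and the numbers are made up for illustration, and the way the normalization factors are formed (each library size divided by the mean library size) is an assumption about how "relative number of reads" is applied, so check the vignette for the exact scheme your version uses.

```python
import math

def conc(peak_counts, library_sizes):
    """Illustrative log2(mean(normalized counts)) for one peak in one group.

    peak_counts   -- raw reads overlapping the consensus peak, one per sample
    library_sizes -- total reads in each sample's BAM file, same order
    """
    # Assumption: factor = library size relative to the mean library size.
    mean_lib = sum(library_sizes) / len(library_sizes)
    factors = [size / mean_lib for size in library_sizes]

    # Adjust each sample's raw count by its normalization factor.
    normalized = [count / factor for count, factor in zip(peak_counts, factors)]

    # Report the log2 of the group mean (undefined if the mean is zero).
    return math.log2(sum(normalized) / len(normalized))

# Hypothetical group of two samples: the second was sequenced twice as deeply
# and shows twice the raw count, so both normalize to the same value.
example = conc([100, 200], [10_000_000, 20_000_000])
```

In this made-up example both samples normalize to about 150 reads, so the reported value is approximately log2(150). The point is only that the raw counts go in, the library-size adjustment is applied per sample, and the mean is taken within the group before the log2.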