DiffBind normalization: with or without spike-in?
skyyks123 • 0
@c8a61312
Last seen 13 months ago
Japan

Hi Rory Stark,

I am a little confused about the following questions regarding the use of DiffBind:

How does DiffBind handle normalization? Does it use the raw counts from the BAM files? Are the total read counts used for normalization those in the peak regions? Also, when we choose the DESeq2 method, does it normalize the data the same way as in an RNA-seq analysis, which requires raw read counts as input?

Is it possible to use DiffBind without spike-in normalization?

Thanks in advance.

Rory Stark ★ 5.1k
@rory-stark-5741
Last seen 7 days ago
Cambridge, UK

Yes, you can do normalization without spike-in reads, using the raw counts from the BAM files.

The default normalization method in DiffBind is the minimal, "first-do-no-harm" approach: normalization factors are computed solely from the ratios of the total number of raw reads in the supplied BAM files.

You can also normalize using reads counted in large background bins by calling dba.normalize() and setting background=TRUE.

You can choose to normalize based only on the reads overlapping peak regions by setting library=DBA_LIBSIZE_PEAKREADS (or library="RiP", for Reads-in-Peaks). In general this is not advisable unless you are confident that most of the consensus peaks do not have big changes in binding affinity.

There is a chapter in the DiffBind vignette that examines the different normalization options in detail. The help page for dba.normalize() may also be useful.
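
As a rough sketch of how the options above map onto calls to dba.normalize(), the following uses the tamoxifen_counts example data that ships with DiffBind. That dataset choice and the variable names (tam_default, tam_rip, norm_default, norm_rip) are assumptions for illustration; in practice you would pass your own DBA object after running dba.count(). The background=TRUE call is left commented out because it needs access to the BAM files themselves.

```r
library(DiffBind)

# Example data bundled with DiffBind; loads a DBA object named `tamoxifen`
# that already contains read counts for the consensus peaks.
data(tamoxifen_counts)

# Default: library-size normalization based on total reads per BAM file.
tam_default <- dba.normalize(tamoxifen)

# Alternative: library size taken from reads overlapping consensus peaks.
# Only advisable if most peaks are not expected to change much.
tam_rip <- dba.normalize(tamoxifen, library = DBA_LIBSIZE_PEAKREADS)

# Alternative: normalize on reads in large background bins.
# Requires the original BAM files to be readable, so not run here.
# tam_bg <- dba.normalize(tamoxifen, background = TRUE)

# Retrieve the settings and factors that were computed.
norm_default <- dba.normalize(tam_default, bRetrieve = TRUE)
norm_rip     <- dba.normalize(tam_rip, bRetrieve = TRUE)

norm_default$lib.sizes      # library sizes used
norm_default$norm.factors   # one normalization factor per sample
norm_rip$norm.factors
```

Comparing the norm.factors from the two runs is a quick way to see how much the choice of library size changes the scaling for your samples.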


Thanks for your quick reply!

Based on your response, DiffBind by default normalizes using the total number of reads in each BAM file, right? And in the final DiffBind results, Conc_Resistant and Conc_Responsive are, as the vignette suggests, the log2 of the mean normalized reads for each group. So the mean normalized reads are computed from the reads in each peak region for each group? Am I correct?

Thanks for your reply!


Yes, I think you've got it right. The default normalization only adjusts read counts by the relative number of reads in each BAM file.

The Conc_ values take the overlapping read counts for each consensus peak for each sample in each sample group, adjust them by the normalization factors, and report the log2(mean()) values.
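
To make the arithmetic concrete, here is a small base-R illustration with made-up numbers. It is a simplified sketch, not DiffBind's internal code: the toy counts, library sizes, and the choice to scale each library to the mean library size are all assumptions, and details such as control-read subtraction are left out.

```r
# Hypothetical read counts: 3 consensus peaks (rows) x 4 samples (columns).
counts <- matrix(
  c(100, 200,  50,    # Res1
    200, 400, 100,    # Res2
     50, 100,  25,    # Resp1
    100, 200,  50),   # Resp2
  nrow = 3,
  dimnames = list(paste0("peak", 1:3), c("Res1", "Res2", "Resp1", "Resp2"))
)

# Hypothetical total reads per BAM file and the group of each sample.
lib_sizes <- c(Res1 = 10e6, Res2 = 20e6, Resp1 = 5e6, Resp2 = 10e6)
groups    <- c("Resistant", "Resistant", "Responsive", "Responsive")

# Library-size factors, here taken relative to the mean library size.
norm_factors <- lib_sizes / mean(lib_sizes)

# Adjust each sample's counts by its factor.
normalized <- sweep(counts, 2, norm_factors, "/")

# Per group: log2 of the mean normalized count for each peak.
conc <- sapply(split(seq_along(groups), groups), function(idx) {
  log2(rowMeans(normalized[, idx, drop = FALSE]))
})
colnames(conc) <- paste0("Conc_", colnames(conc))
conc
```

In this toy data the raw counts differ between samples only in proportion to sequencing depth, so after adjustment the two Conc_ columns come out identical for every peak.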
