
DiffBind: normalization problem, spike-in or no spike-in beforehand?


skyyks123 (Japan)

Hi Rory Stark,

I am a little confused about the following questions regarding the use of DiffBind:

How does DiffBind handle normalization? Does it use the raw counts from BAM files? Are the total read counts used for normalization those in peak regions? Also, when we choose the DESeq2 method, does it normalize the data the same way as for RNA-seq read counts, which require raw read counts as input?

Is it possible to use DiffBind without spike-in normalization?

Thanks in advance.


Answer · 2023-02-24

Rory Stark (Cambridge, UK)

Yes, you can do normalization without spike-in reads, using the raw counts from the BAM files.

The default normalization method in DiffBind is the minimal, "first-do-no-harm" method of computing normalization factors based solely on the ratios of the total number of raw reads in the supplied BAM files.
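To make the default "first-do-no-harm" idea concrete, here is a minimal Python sketch (not DiffBind's own code, which is written in R). It assumes each factor is a library's total reads divided by the mean library size, and that raw counts are divided by those factors; the sample names and read totals are made up.

```python
def lib_size_factors(lib_sizes):
    """Factor per sample: total reads in its BAM / mean total reads (assumed convention)."""
    mean_size = sum(lib_sizes.values()) / len(lib_sizes)
    return {name: size / mean_size for name, size in lib_sizes.items()}


def normalize(raw_counts, factors):
    """Divide each sample's per-peak raw counts by that sample's factor."""
    return {name: [c / factors[name] for c in counts] for name, counts in raw_counts.items()}


# Hypothetical totals: sample B was sequenced twice as deeply as sample A.
lib_sizes = {"A": 20_000_000, "B": 40_000_000}
raw_counts = {"A": [100, 30], "B": [200, 60]}  # raw counts at two consensus peaks

factors = lib_size_factors(lib_sizes)
normalized = normalize(raw_counts, factors)
print(factors)
print(normalized)
```

Because B's raw counts are exactly twice A's, both samples end up with the same normalized counts: only the difference in sequencing depth has been removed, and nothing about the peaks themselves was used to compute the factors.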

You can also normalize using the reads in large background bins by calling dba.normalize() and setting background=TRUE.
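Background normalization counts reads in wide genomic windows rather than only in peaks, on the premise that most of those windows do not change between conditions. The Python sketch below is only an illustration: the 15,000 bp bin width is an assumed value, and applying a DESeq2-style median-of-ratios (RLE) calculation to the bin counts is an assumption about which method gets paired with background=TRUE. Check the dba.normalize() help page for the actual defaults.

```python
import math
from collections import Counter
from statistics import median

BIN_SIZE = 15_000  # assumed bin width, for illustration only


def bin_counts(read_positions, bin_size=BIN_SIZE):
    """Count reads falling into fixed-width, non-overlapping genomic bins."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_positions)


def median_of_ratios(counts_by_sample):
    """DESeq2-style (RLE) size factors computed on bin counts (assumed method)."""
    # Only bins with at least one read in every sample have a finite geometric mean.
    shared = set.intersection(*(set(c) for c in counts_by_sample.values()))
    log_geo_mean = {
        b: sum(math.log(c[b]) for c in counts_by_sample.values()) / len(counts_by_sample)
        for b in shared
    }
    # Size factor = median over shared bins of (count / geometric mean of that bin).
    return {
        name: median(math.exp(math.log(c[b]) - log_geo_mean[b]) for b in shared)
        for name, c in counts_by_sample.items()
    }


# Hypothetical read start positions; B contains every read of A twice (double depth).
reads_A = [("chr1", 100), ("chr1", 200), ("chr1", 16_000), ("chr2", 500)]
reads_B = reads_A * 2

counts = {"A": bin_counts(reads_A), "B": bin_counts(reads_B)}
size_factors = median_of_ratios(counts)
print(size_factors)
```

Since every bin in B holds twice the reads of the same bin in A, B's size factor comes out twice A's, as expected for a pure depth difference.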

You can choose to normalize based only on the reads overlapping peak regions by setting library=DBA_LIBSIZE_PEAKREADS (or library="RiP", for Reads-in-Peaks). In general this is not advisable unless you are confident that most of the consensus peaks do not have big changes in binding affinity.
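The reason for caution shows up in a toy calculation. In the hypothetical Python sketch below, two samples have the same sequencing depth, but sample B has twice the binding at every peak and therefore twice as many reads in peaks. Normalizing on full library size keeps the two-fold difference, while normalizing on reads in peaks divides it away. The factor convention (size divided by mean size) is an assumption, and all numbers are invented.

```python
def factors_from(sizes):
    """Normalization factors: each size divided by the mean size (assumed convention)."""
    mean_size = sum(sizes.values()) / len(sizes)
    return {name: size / mean_size for name, size in sizes.items()}


def fold_change(sizes, peak_counts):
    """B/A ratio of normalized counts at one peak, using the given library sizes."""
    f = factors_from(sizes)
    normalized = {name: peak_counts[name] / f[name] for name in peak_counts}
    return normalized["B"] / normalized["A"]


# Hypothetical: equal depth, but B has a genuine global increase in binding.
full_library = {"A": 10_000_000, "B": 10_000_000}
reads_in_peaks = {"A": 1_000_000, "B": 2_000_000}
peak_counts = {"A": 500, "B": 1000}  # raw counts at one consensus peak

print("full library:  ", fold_change(full_library, peak_counts))
print("reads in peaks:", fold_change(reads_in_peaks, peak_counts))
```

The first line reports a two-fold change and the second essentially no change, which is why Reads-in-Peaks normalization is only safe when most consensus peaks are not changing.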

The DiffBind vignette has a chapter that examines the different normalization options in detail. The help page for dba.normalize() may also be useful.


Thanks for your quick reply!

Based on your response, DiffBind normalizes based on the total reads in each BAM file by default, right? And the final DiffBind results, Conc_Resistant and Conc_Responsive, are, as the vignette suggests, log2(mean normalized reads) for each group. So the mean normalized reads are the reads in each peak region for each group? Am I correct?

Thanks for your reply!

(reply by skyyks123)

Yes, I think you've got it right. The default normalization only adjusts read counts by the relative number of reads in each BAM file.

For each consensus peak, the Conc_ values take the overlapping read counts for every sample in a sample group, adjust them by the normalization factors, and report the log2(mean()) of the adjusted values.

(reply by Rory Stark)
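The Conc_ calculation described in the reply can be written out as a minimal Python sketch for a single consensus peak. It assumes the adjustment divides each raw count by the sample's normalization factor, and it ignores any pseudocount or zero-count handling DiffBind may apply; the sample names, counts and factors are hypothetical.

```python
import math


def conc(raw_counts, factors, group):
    """log2 of the mean normalized count for one peak across the samples in `group`."""
    normalized = [raw_counts[s] / factors[s] for s in group]
    return math.log2(sum(normalized) / len(normalized))


# Hypothetical raw counts at one consensus peak, and per-sample normalization factors.
raw_counts = {"Res1": 64, "Res2": 192, "Resp1": 32, "Resp2": 32}
factors = {"Res1": 0.5, "Res2": 1.5, "Resp1": 1.0, "Resp2": 1.0}

conc_resistant = conc(raw_counts, factors, ["Res1", "Res2"])
conc_responsive = conc(raw_counts, factors, ["Resp1", "Resp2"])
print(conc_resistant, conc_responsive)  # 7.0 5.0
```

Both resistant samples normalize to 128 reads (log2 = 7) and both responsive samples to 32 reads (log2 = 5), so the two Conc_ values summarize the normalized signal at that peak for each group.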