CUT&Tag differential peak analysis using DiffBind, with pre-calculated normalization factors
mwong043 • 0

Hi. I would like to perform differential peak analysis on my CUT&Tag data. I converted the bam files for my samples to bedgraph files, then normalized the bedgraph signal using a normalization factor calculated from the E. coli read count. Because my Tn5 enzyme was extracted from E. coli, the bacterial read count can serve as an internal spike-in control. I then called peaks for my samples with SEACR, the recommended peak caller for CUT&Tag experiments.

For the next step I would like to perform differential peak analysis using DiffBind. My question is how to normalize the samples using pre-calculated normalization factors. The Reference Manual mentions that users can supply normalization factors using DBA_NORM_USER; how exactly can this be done?

For example, I have 6 samples in 2 groups, with the following normalization factors:

Group 1: 0.85, 0.96, 1.2

Group 2: 1.1, 0.9, 0.8

Thanks. Matthew

DiffBind • 434 views
Rory Stark ★ 4.1k
CRUK, Cambridge, UK

You can supply your own normalization factors by calling the dba.normalize() function after running dba.count(). The normalize parameter should be a numeric vector with one value per sample, so its length equals the number of samples. Larger values should correspond to samples with greater numbers of E. coli reads.
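A minimal sketch of this first approach, using the six factors from the question. The helper name `apply_user_factors`, the sample sheet file name, and the assumption that the factors are listed in sample-sheet order (Group 1 first, then Group 2) are illustrative and not from the thread; check the order against dba.show() before relying on it.

```r
# Spike-in derived factors from the question, assumed to be in the same
# order as the samples in the sample sheet (Group 1, then Group 2).
user_factors <- c(0.85, 0.96, 1.2, 1.1, 0.9, 0.8)

# Hypothetical helper: `db` is assumed to be a DBA object that has already
# been through dba.count().
apply_user_factors <- function(db, factors) {
  # dba.show() returns one row per sample, so this guards against a
  # vector whose length does not match the number of samples.
  n_samples <- nrow(DiffBind::dba.show(db))
  stopifnot(is.numeric(factors), length(factors) == n_samples)
  DiffBind::dba.normalize(db, normalize = factors)
}

# Usage (not run here; needs DiffBind and real data):
# library(DiffBind)
# db <- dba(sampleSheet = "samples.csv")    # hypothetical file name
# db <- dba.count(db)
# db <- apply_user_factors(db, user_factors)
# dba.normalize(db, bRetrieve = TRUE)       # inspect what was stored
```

Retrieving the stored normalization with bRetrieve=TRUE before running dba.analyze() is a quick way to confirm that the factors were attached to the intended samples.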

Alternatively, in addition to the SEACR peaks and CUT&Tag bam files, you can supply the E. coli reads to DiffBind and have it re-compute the normalization factors. These can be specified as separate E. coli-aligned bam files using the Spikein column of the sample sheet; then set spikein=TRUE in dba.normalize() (instead of using the normalize parameter). If your E. coli reads are included in the same bam files as your CUT&Tag reads, instead of specifying Spikein files, set spikein to a vector of the E. coli chromosome names.
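Both spike-in variants described above can be sketched as follows. All sample names, file paths, and the E. coli chromosome name in the usage comments are hypothetical placeholders, as is the helper `normalize_with_spikein`; the choice of PeakCaller = "bed" for SEACR output is also an assumption to verify against your peak files.

```r
# Hypothetical sample sheet with a Spikein column pointing at separate
# E. coli-aligned bam files (variant 1).
ids <- sprintf("g%d_rep%d", rep(1:2, each = 3), rep(1:3, times = 2))
samples <- data.frame(
  SampleID   = ids,
  Condition  = rep(c("group1", "group2"), each = 3),
  Replicate  = rep(1:3, times = 2),
  bamReads   = file.path("bam", paste0(ids, ".bam")),
  Spikein    = file.path("bam", paste0(ids, ".ecoli.bam")),
  Peaks      = file.path("peaks", paste0(ids, ".seacr.bed")),
  PeakCaller = "bed",  # assumption: SEACR output read as generic BED
  stringsAsFactors = FALSE
)

# Hypothetical helper covering both variants:
#  - ecoli_chroms = NULL : use the Spikein bam files (spikein = TRUE)
#  - ecoli_chroms given  : E. coli reads live in the main bam files, so
#                          pass the E. coli chromosome names instead
normalize_with_spikein <- function(db, ecoli_chroms = NULL) {
  if (is.null(ecoli_chroms)) {
    DiffBind::dba.normalize(db, spikein = TRUE)
  } else {
    DiffBind::dba.normalize(db, spikein = ecoli_chroms)
  }
}

# Usage (not run here; needs DiffBind and real files):
# library(DiffBind)
# db <- dba.count(dba(sampleSheet = samples))
# db <- normalize_with_spikein(db)                     # variant 1
# db <- normalize_with_spikein(db, "ecoli_chromosome") # variant 2, name is a placeholder
```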


Thanks Rory. I tried the first method by defining a vector containing the normalisation factors for my samples and it worked.

I am just wondering how to manually define the normalisation factors for the IgG control samples as well?

Thanks for your help.



If the IgG reads are included as control tracks, they are not normalized. In this case they would usually be used to generate greylists (and/or have their reads subtracted from the primary CUT&Tag samples). In the subtraction case, the read counts to be subtracted are scaled based on the relative library sizes of the primary and control samples.

Generally, IgG controls are not explicitly included as primary samples in the model; if they are, they would probably use the same normalization method as the CUT&Tag samples.

(I saw a message go by regarding a problem providing a vector of normalization values, but that message seems to have disappeared, so let me know if you are still having an issue with that).

