Question

Downsampling ChIP-seq BAM files with spike-in normalization factor before feeding into DiffBind

bioinfouser2 ▴ 10

@bioinfouser2-15147

I am planning to use exogenous chromatin as a spike-in control with my mouse samples in a ChIP-seq experiment, for peak calling and differential binding analysis of histone modifications. This involves down-sampling the uniquely mapped read files according to the normalization factor calculated from the spike-in. This is generally helpful for peak calling and for visualization in the IGV browser. But I have not found any reference on whether it should be taken into account in differential binding analysis, and nothing is mentioned in the DiffBind vignette. Could you please explain how this strategy might affect using the DiffBind package?

I greatly appreciate your time in reading and answering my question! Thank you in advance!

diffbind chipseq chipqc • 3.2k views

updated 4.1 years ago by Rory Stark ★ 5.2k • written 6.8 years ago by bioinfouser2 ▴ 10

score 1 · Answer 1 · 2018-04-12

If you downsampled the reads for visualization and peak calling, you should use the down-sampled files for differential binding analysis as well. Normalization is even more important for differential analysis than for peak calling etc. Be aware that DiffBind will still normalize the reads to the relative number of reads in each library, so this may negate some of what you are trying to do. You can bypass DiffBind's normalization by setting the library sizes to appear all the same, as discussed in this thread: MAnorm output as DiffBind input.
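To make the "negate" point concrete, here is a small arithmetic sketch in Python. It is not DiffBind code: the `cpm` helper and all the numbers are made up for illustration, and it assumes the library size is simply the number of reads left in the down-sampled BAM file.

```python
def cpm(peak_reads, library_size):
    """Reads in a peak per million reads in the library."""
    return peak_reads * 1e6 / library_size

# Hypothetical sample: 20M reads in total, 2,000 of them in one peak.
full_library, full_peak = 20_000_000, 2_000

# Down-sample to 50% based on a spike-in factor (assumed value).
fraction = 0.5
down_library = full_library * fraction
down_peak = full_peak * fraction  # expected count after random down-sampling

# Normalizing to each library's own size gives the same value before and
# after down-sampling, so the spike-in scaling is lost.
full_cpm = cpm(full_peak, full_library)
down_cpm = cpm(down_peak, down_library)

# Treating all libraries as the same size keeps the spike-in scaling.
common_size = 20_000_000
kept_cpm = cpm(down_peak, common_size)

print(full_cpm, down_cpm, kept_cpm)
```

The first two values are identical, which is the effect described above; only the third still reflects the spike-in factor.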

I'm not sure exactly which normalization procedure you are using. We have looked at some methods where we use the control reads to calculate normalization factors rather than actually downsampling reads (hate to throw away good data!). See for example this paper:

Novel Quantitative ChIP-seq Methods Measure Absolute Fold-Change in ER Binding Upon Fulvestrant Treatment

a version of which is to be published in NAR soon.
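As a rough illustration of scaling versus down-sampling, the sketch below (Python, made-up numbers) derives one factor per sample from spike-in read counts. This is a generic "match the smallest spike-in count" scheme chosen for illustration, not the method from the paper above; the sample names and the `spikein_scale_factors` helper are hypothetical.

```python
# Hypothetical spike-in read counts per sample (made-up numbers).
spikein_reads = {
    "ctrl_1": 400_000,
    "ctrl_2": 500_000,
    "treat_1": 800_000,
    "treat_2": 1_000_000,
}

def spikein_scale_factors(spike_counts):
    """Scale each sample so its spike-in count matches the smallest one."""
    reference = min(spike_counts.values())
    return {name: reference / n for name, n in spike_counts.items()}

factors = spikein_scale_factors(spikein_reads)

# Option A (down-sampling): keep only this fraction of each sample's reads.
# Option B (scaling): keep every read and multiply the peak counts instead.
peak_counts = {"ctrl_1": 1_200, "ctrl_2": 1_500, "treat_1": 1_200, "treat_2": 1_500}
scaled = {name: peak_counts[name] * factors[name] for name in peak_counts}

for name in peak_counts:
    print(name, round(factors[name], 3), round(scaled[name], 1))
```

Both options use the same factor; option B just applies it to the counts, so no reads are discarded.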

score 0 · Answer 2 · 2020-11-06

Direct support for exogenous spike-in normalization is now available in the latest release of DiffBind. The vignette has an example using Drosophila chromatin. You can supply the spike-in alignments in the sample sheet as separate bam files, or specify a specific set of chromosomes if you used a single combined reference. The reads are counted, and then you can use these to calculate normalization factors (either by the total number of aligned reads, or using TMM or RLE). So long as the spike-ins were quantified correctly in the original experiment, these methods work fine. There is also a clean interface to supply externally calculated normalization factors, or a matrix of offsets, if you want to use one of the more complicated modelling methods for dealing with spike-in data.
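For readers unfamiliar with the two simpler options, here is a minimal Python sketch of factors from total spike-in reads and from an RLE-style median of ratios. It is not DiffBind's implementation; the function names and counts are hypothetical, and TMM is left out because it involves additional trimming and weighting steps.

```python
import math
from statistics import median

def library_size_factors(counts):
    """Factor per sample = total spike-in reads / mean total across samples."""
    totals = {s: sum(v) for s, v in counts.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {s: t / mean_total for s, t in totals.items()}

def rle_size_factors(counts):
    """Median-of-ratios factors; counts maps sample -> counts per region."""
    samples = list(counts)
    n_regions = len(counts[samples[0]])
    # Use only regions where every sample has a non-zero count.
    usable = [i for i in range(n_regions)
              if all(counts[s][i] > 0 for s in samples)]
    # Log of the geometric mean across samples for each usable region.
    log_geo = {i: sum(math.log(counts[s][i]) for s in samples) / len(samples)
               for i in usable}
    return {s: math.exp(median(math.log(counts[s][i]) - log_geo[i]
                               for i in usable))
            for s in samples}

# Made-up counts over four spike-in regions; s2 is sequenced twice as deep.
spikein_counts = {"s1": [100, 200, 50, 400], "s2": [200, 400, 100, 800]}

lib_factors = library_size_factors(spikein_counts)
size_factors = rle_size_factors(spikein_counts)
print(lib_factors, size_factors)
```

In this toy case both approaches agree that s2 needs a factor twice that of s1; they diverge when a few regions dominate the totals, which is where the median-based factor is more robust.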

In general, unless there are severe imbalances, downsampling and altering read counts, especially of ChIP or ATAC data, should be avoided (downsampling Input controls for peak calling may be ok; some peak callers will scale background controls for you). The normalization factors should deal with quite a bit of variation in library sizes and ChIP efficiencies.