Question

DiffBind with spike in

joselynn_wallace • 0

I am hoping to get a little more information about how to use DiffBind to deal with spike-in recalibration and the vignette is a little sparse on this topic. In section 7.6 there's the following code:


spikes <- dba.normalize(spikes, spikein = spikes.spikeins)

The spikes.spikeins object is loaded as part of load(system.file('extra/spikes.rda',package='DiffBind')). The vignette states:

Note that precalculated background reads are included for the example in an object named spikes.spikeins, so we do not need to recount them for the vignette; we can pass the pre-calculated ones in instead. Normally, with access to the spike-in reads, setting spikein=TRUE will result in the spike-in reads being counted.

I am wondering if we can get a little more code describing how spikes.spikeins was made -- dba.counts maybe?

updated 3.7 years ago by Rory Stark • written 3.7 years ago by joselynn_wallace

score 0 · Answer 1 · 2021-03-16

Spike-ins are part of normalization and are calculated in the dba.normalize() function. The help page for dba.normalize() has some documentation for how to use the spikein parameter.

The primary way to include spike-in reads is by including a separate set of aligned bam files in your sample sheet (using a column named Spikein). If spikein=TRUE, the total number of aligned reads in these tracks will be used to calculate normalization factors.
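As a rough sketch of that first scenario: the sample sheet below adds a Spikein column next to bamReads. Every sample name, file path, and the choice of PeakCaller here is a placeholder made up for illustration, not taken from the vignette, and the DiffBind calls only run if those files actually exist on disk.

```r
# Hypothetical sample sheet: all IDs and file paths are placeholders.
ids <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
samples <- data.frame(
  SampleID   = ids,
  Condition  = c("control", "control", "treated", "treated"),
  Replicate  = c(1, 2, 1, 2),
  bamReads   = file.path("reads",   paste0(ids, ".bam")),        # main ChIP/ATAC alignments
  Spikein    = file.path("spikein", paste0(ids, ".bam")),        # reads aligned to the spike-in genome
  Peaks      = file.path("peaks",   paste0(ids, ".narrowPeak")),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)

# Only attempt the analysis when the placeholder files are really present.
if (all(file.exists(c(samples$bamReads, samples$Spikein, samples$Peaks)))) {
  library(DiffBind)
  chip <- dba(sampleSheet = samples)
  chip <- dba.counts(chip)
  # spikein = TRUE: use the reads in the Spikein tracks for normalization
  chip <- dba.normalize(chip, spikein = TRUE)
}
```

With real files in place, the same three calls (dba, dba.counts, dba.normalize) are all that is needed; the spike-in reads do not have to be counted in a separate step.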

If your spike-in reads are included in the main (ChIP/ATAC) bam files but fall on a distinct set of chromosomes (i.e., you aligned to a hybrid reference genome), you don't need to add Spikein bam files to the sample sheet. Instead, set spikein to the names of the chromosomes that carry the spike-in reads; the total number of reads on these chromosomes in the main bam files will then be used to calculate normalization factors.
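For the hybrid-genome case, a minimal sketch might look like the following. The chromosome names are invented (they depend entirely on how the hybrid reference was built), and the wrapper function is just a hypothetical convenience; it assumes it is handed a DBA object that has already been through dba.counts().

```r
# Hypothetical names of the spike-in chromosomes in a hybrid reference;
# substitute whatever prefix/naming your own reference uses.
spike_chroms <- c("dm6_chr2L", "dm6_chr2R", "dm6_chr3L", "dm6_chr3R", "dm6_chrX")

# Hypothetical helper: normalize using reads on the spike-in chromosomes
# of the main bam files (no Spikein column needed in the sample sheet).
normalize_by_spike_chroms <- function(dba_obj, chroms) {
  DiffBind::dba.normalize(dba_obj, spikein = chroms)
}

# Usage, given a counted DBA object named `chip`:
# chip <- normalize_by_spike_chroms(chip, spike_chroms)
```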

You can also limit the spike-in counts to pre-defined intervals in either the primary or Spikein bam files by setting spikein= to a GRanges object containing known intervals.
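A sketch of the interval-based variant, assuming the GenomicRanges package is installed. The coordinates below are arbitrary placeholders rather than real spike-in regions, and the wrapper function is again hypothetical.

```r
library(GenomicRanges)  # also attaches IRanges

# Placeholder intervals where spike-in signal is expected; replace with
# your own known regions (for example, peaks of a parallel factor).
spike_regions <- GRanges(
  seqnames = c("chr2L", "chr3R"),
  ranges   = IRanges(start = c(100001, 250001), end = c(100500, 250500))
)

# Hypothetical helper: restrict spike-in counting to the given intervals.
normalize_by_spike_regions <- function(dba_obj, regions) {
  DiffBind::dba.normalize(dba_obj, spikein = regions)
}

# Usage, given a counted DBA object named `chip`:
# chip <- normalize_by_spike_regions(chip, spike_regions)
```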

If you want to see the code that generates the example spikes objects, you can access it within the package:

file.edit(system.file('extra/GenerateSpikein.R',package='DiffBind'))

This script assumes that the BrundleData package is installed in a subdirectory called holding.

If you have a more specific spike-in scenario I can suggest how to include them.