edgeR itself can incorporate alternative normalization schemes fairly easily; the real question is whether the assumptions behind the spike-in process are applicable. IIRC, there are two major assumptions:
- The spike-in antibody (usually against some Drosophila histone mark) is subject to the same technical biases as the actual antibody against your desired target.
- Your spike-in addition is sufficiently accurate so that the ratio of the concentrations of spike-in chromatin to your actual chromatin of interest is constant across samples.
To make it all work, you can align reads to a combined genome containing both your human and spike-in reference sequences. Then it's a simple matter of:
1. Identifying enriched regions in the combined genome. The safest way to do so is to pool reads from all samples together for a single round of peak calling.
2. Creating the usual DGEList, where each row corresponds to an enriched region in the combined genome.
3. Subsetting the DGEList to your regions from the spike-in genome (do not set keep.lib.sizes=FALSE!) and running calcNormFactors on the subset.
4. Transferring the normalization factors from the subset back to the full DGEList.
Steps 3 and 4 would look something like this, assuming your DGEList is named y and you have a GRanges object of region coordinates named locations:
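# Flag regions on the spike-in chromosomes; substitute whatever the spike-in chromosome names actually are.
is.spike.in <- as.logical(seqnames(locations) %in% c("I", "II", "III"))
# Subset to the spike-in regions; the default keep.lib.sizes=TRUE retains the full library sizes.
ysub <- y[is.spike.in,]
# Compute TMM normalization factors from the spike-in regions only.
ysub <- calcNormFactors(ysub)
# Transfer the factors back to the full DGEList.
y$samples$norm.factors <- ysub$samples$norm.factors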
Here, the TMM step assumes that any difference in the spike-in coverage is technical and should be removed. The transfer of the normalization factors back to y further assumes that the biases affecting the spike-in chromatin are also applicable to the actual test chromatin.
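To see what that assumption buys you, here is a minimal simulated sketch (not real data). Everything in it is made up for illustration: the region counts, the group labels, and the object names y.spike and y.all. The target regions get a global two-fold increase in group B while the spike-in regions stay flat, and the effective library sizes (library size times normalization factor) are compared between spike-in TMM and TMM on all regions.

```r
library(edgeR)

set.seed(42)
n.spike <- 200
n.target <- 1800
# Hypothetical layout: spike-in regions first, then target regions.
is.spike.in <- rep(c(TRUE, FALSE), c(n.spike, n.target))
group <- factor(c("A", "A", "B", "B"))

# Spike-in regions: same expected coverage in every sample.
spike <- matrix(rnbinom(n.spike * 4, mu = 50, size = 20), ncol = 4)
# Target regions: expected coverage doubles in group B (a global change).
mu.target <- rep(c(50, 50, 100, 100), each = n.target)
target <- matrix(rnbinom(n.target * 4, mu = mu.target, size = 20), ncol = 4)

y <- DGEList(rbind(spike, target), group = group)

# Spike-in normalization: TMM on the spike-in rows, factors moved to the full object.
y.spike <- y
y.spike$samples$norm.factors <- calcNormFactors(y[is.spike.in, ])$samples$norm.factors

# For comparison: ordinary TMM on all regions.
y.all <- calcNormFactors(y)

# Effective library sizes and their B/A ratio under each scheme.
eff.spike <- y.spike$samples$lib.size * y.spike$samples$norm.factors
eff.all <- y.all$samples$lib.size * y.all$samples$norm.factors
ratio.spike <- mean(eff.spike[3:4]) / mean(eff.spike[1:2])
ratio.all <- mean(eff.all[3:4]) / mean(eff.all[1:2])
```

In this setup ratio.spike should land near 1, so the global increase in the target regions survives normalization, whereas ratio.all should land near 2, meaning TMM on all regions treats the increase as technical and removes it. Whether the first behaviour is what you want depends entirely on the two assumptions holding.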
And that's it. After that, it's just the usual edgeR workflow. Personally, I have always felt that these assumptions are pretty sketchy, and I would prefer to use the binning approach (see Section 4.1 here for some background). But to each their own.
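For completeness, a rough sketch of what "the usual edgeR workflow" could look like once the spike-in factors are in place. The counts are simulated, the two-group design is hypothetical, and dropping the spike-in rows before testing is my own choice here rather than a required step; ytest is just an illustrative name.

```r
library(edgeR)

set.seed(3)
group <- factor(c("A", "A", "B", "B"))
# Simulated stand-in for the combined-genome counts:
# 100 spike-in rows followed by 900 target rows.
is.spike.in <- rep(c(TRUE, FALSE), c(100, 900))
counts <- matrix(rnbinom(1000 * 4, mu = 50, size = 20), ncol = 4)
y <- DGEList(counts, group = group)

# Spike-in normalization factors, as in steps 3 and 4.
y$samples$norm.factors <- calcNormFactors(y[is.spike.in, ])$samples$norm.factors

# Drop the spike-in rows before testing; the default keep.lib.sizes=TRUE
# leaves the library sizes and normalization factors untouched.
ytest <- y[!is.spike.in, ]

# Standard quasi-likelihood pipeline for a two-group comparison.
design <- model.matrix(~group)
ytest <- estimateDisp(ytest, design)
fit <- glmQLFit(ytest, design)
res <- glmQLFTest(fit, coef = 2)
top <- topTags(res, n = 5)$table
```

The only spike-in-specific part is the norm.factors line; everything after it is the same as for any other normalization scheme.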
I'll also add that just adding in yeast DNA is not really all that informative. The main appeal of spike-ins is to capture differences in immunoprecipitation efficiency across samples. If you're just throwing in yeast DNA without an antibody against it, you don't get that information; at that point, you might as well save yourself the trouble and use TMM on the bins, especially given that your TF probably isn't binding enough of the genome to compromise the accuracy of the binning approach.
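A minimal sketch of the "TMM on the bins" alternative, again on simulated counts. The matrices bin.counts and region.counts and the vector totals are hypothetical stand-ins; in practice they would come from counting reads into large genomic bins and into your regions of interest, with totals being the library sizes shared by both.

```r
library(edgeR)

set.seed(11)
# Hypothetical stand-ins: counts in large background bins and in the regions to test.
bin.counts <- matrix(rnbinom(3000 * 4, mu = 30, size = 10), ncol = 4)
region.counts <- matrix(rnbinom(500 * 4, mu = 60, size = 10), ncol = 4)
# Assumed library sizes, shared by both objects.
totals <- colSums(bin.counts)

# TMM on the bins: most bins are assumed to be unbound background.
ybin <- calcNormFactors(DGEList(bin.counts, lib.size = totals))

# Reuse the bin-derived factors for the regions being tested.
y <- DGEList(region.counts, lib.size = totals,
             norm.factors = ybin$samples$norm.factors)
```

Using the same library sizes in both objects matters, because the normalization factors are only meaningful relative to the library sizes they were computed with.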