I've successfully run csaw and identified windows with significantly different ChIP-seq enrichment across my experimental conditions. Instead of producing the genome browser-like plots outlined in the user manual, I would simply like to export the total (or ideally, mean per base) number of reads, scaled by the normalization factors, for each input sample so that I can represent the signal over differential windows as a heatmap. Perhaps unsurprisingly, simply writing out a BED file of the significant window coordinates, then running BEDtools to compute coverage over those windows, then scaling by sequencing depth isn't necessarily producing the same answer. In other words, I had thousands of windows that csaw called as having increased enrichment, but my manual scaling of the read counts by sequencing depth over those windows actually shows the opposite trend. Perhaps instead of manually scaling by sequencing depth like I've been doing, I could simply scale by the csaw scaling factors, but if there were a way to directly export the normalized counts exactly as csaw saw them, that would save a few steps.
Here's the code that I've been running:
bam.files <- c("x1.bam","x2.bam","y1.bam","y2.bam") design <- model.matrix(~factor(c('x', 'x', 'y', 'y'))) colnames(design) <- c("intercept", "cell.type") # Load in data from BAM files. param <- readParam(minq=50) data <- windowCounts(bam.files, ext=250, width=150, param=param) # Filter based on union set of peak calls from MACS2 peaks=import("unionPeaks.bed",format="bed") suppressWarnings(keep <- overlapsAny(rowRanges(data), peaks)) filtered.data <- data[keep,] # Calculate normalization factors. binned <- windowCounts(bam.files, bin=TRUE, width=10000, param=param) normfacs <- normOffsets(binned) filtered.data$norm.factors <- normfacs # Identify DB windows. y <- asDGEList(filtered.data, norm.factors=normfacs) y <- estimateDisp(y, design) fit <- glmQLFit(y, design, robust=TRUE) results <- glmQLFTest(fit) rowData(filtered.data) <- cbind(rowData(filtered.data), results$table) # Correct for multiple testing. merged <- mergeWindows(rowRanges(filtered.data), tol=1000L,max.width=2000L) tabcom <- combineTests(merged$id, results$table) # Write out results is.sig <- tabcom$FDR <= 0.05 tabbest <- getBestTest(merged$id, results$table) out.ranges <- merged$region elementMetadata(out.ranges) <- data.frame(tabcom,best.pos=mid(ranges(rowRanges(filtered.data[tabbest$best]))),best.logFC=tabbest$logFC) simplified <- out.ranges[is.sig] simplified$score <- -10*log10(simplified$FDR) export(con="csaw_results.bed", object=simplified) is.sig.pos <- (tabbest$logFC > 0)[is.sig] print(summary(is.sig.pos))