I have a question about interpreting my cross-correlation plots from native ChIP-seq data with 100bp SE reads aligned uniquely to the genome using bowtie2. The bam files have duplicates marked and removed for this step.
To estimate fragment length I used
correlateReads() from the csaw package then plotted a very different graph than the smooth one showed in the user guide... I still see a prominent peak occurring at my expected fragment size of 147 bp for histone data, but it's hard to interpret what the other peaks are. I don't have a lot of data-sets to test this on, but how critical is the smoothness of this plot? From the commentary and other readings I understand that a cross correlation plot will have a second peak at the read-length if the efficiency was poor or duplicates not removed, but there is no mention about jaggedness. Based on the browser tracks and peak-calling results I always assumed my efficiency of pull down was very good, as the data is very clean and clear.
At first I thought the shape may be due to read-trimming and processing, but raw and processed .bam files give the same result.
Any help with interpretation is appreciated.
Cross correlation plot:
library(rtracklayer) ch <- import.chain("~/ChIPdata/reference/ChainFiles/mm9ToMm10.over.chain") original <- import("~/ChIPdata/reference/ChainFiles/mm9-blacklist.bed") blacklist <- liftOver(x=original, chain=ch) blacklist <- unlist(blacklist) library(csaw) x <- correlateReads(bamfile, max.dist=500, param=readParam(minq=20, discard=blacklist, dedup=T)) plot(1:length(x)-1, x, xlab="Delay (bp)", ylab="CCF", type="l")