I want to use DESeq2 for differential peak detection for ChIP-seq data.
I know there are many methods implemented for this purpose https://github.com/crazyhottommy/ChIP-seq-analysis#differential-peak-detection
Diffbind internally uses DESeq2 and EdgR, but I want to take the other way:
Say I have untreat and treat group for my ChIP-seq data, each with three replicates.
for ChIP-seq, we usually include a input control, which is just genomic DNA without pulling down with a specific antibody.
I call peaks with MACS2 for all of them, and merge all the peaks into a super-set. https://github.com/crazyhottommy/ChIP-seq-analysis/blob/master/part1_peak_calling.md#merge-peaks
Now, I can count from the bam files for each replicate and input, how many reads in each peak. https://github.com/crazyhottommy/ChIP-seq-analysis/blob/master/part2_Preparing-ChIP-seq-count-table.md
A final count table will look like this:
lib size 10million 10million 10 million 20 million 12 million 12 million 12 million 24 million
untreat_1 untreat_2 untreat_3 untreat_input_control treat_1 treat_2 treat_3 treat_input_control
peak_1 20 25 30 3 10 12 13 5
peak_2 10 28 13 8 64 43 45 7
How should I use this raw count table with DESeq2?
I should subtract input_control raw counts from the IP counts ( but I will need to normalize to the lib size first?),
but then I will get decimals which violates the DESeq2 assumption https://www.biostars.org/p/143458/#157303