ATAC-seq: Sample size when pooling sample reads to call peaks
I am trying to perform a differential ATAC-seq analysis between two conditions with 22 and 5 samples, respectively. My plan is to call consensus peaks on the merged reads of all samples, then count the 5' ends of each sample's reads in each peak and run a DESeq2 differential binding analysis. Before merging, however, I am considering downsampling every sample to the read count of the lowest-depth sample, so that samples with more reads don't dominate the peak calling. Is this what people usually do?
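A minimal sketch of the downsample-then-pool step, assuming each sample's reads are already loaded as a Python list of read identifiers. The function names are hypothetical and it works in memory purely for illustration. On real BAM files the per-sample fraction (lowest count divided by the sample's count) is what you would pass to `samtools view -s`, which keeps an approximate rather than exact fraction of reads. Note that equalising per-sample depth does not equalise the conditions: with 22 vs 5 samples, the pooled file still contains 22/27 of its reads from the larger condition.

```python
import random
from typing import Dict, List


def subsample_fractions(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of reads to keep per sample so all match the smallest library."""
    target = min(read_counts.values())
    return {name: target / n for name, n in read_counts.items()}


def downsample_to_min(reads_by_sample: Dict[str, List[str]], seed: int = 42) -> Dict[str, List[str]]:
    """Randomly subsample every sample (without replacement) to the smallest sample's size."""
    target = min(len(reads) for reads in reads_by_sample.values())
    rng = random.Random(seed)  # fixed seed so the subsampling is reproducible
    return {
        name: rng.sample(reads, target) if len(reads) > target else list(reads)
        for name, reads in reads_by_sample.items()
    }


def pool(reads_by_sample: Dict[str, List[str]]) -> List[str]:
    """Concatenate all samples into one pooled 'library' for peak calling."""
    pooled: List[str] = []
    for name in sorted(reads_by_sample):
        pooled.extend(reads_by_sample[name])
    return pooled
```

The pooled reads are used only to define peak regions. The full, non-downsampled libraries can still be counted per sample afterwards, since DESeq2 handles library-size differences through its own normalisation.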

I have been reading the following thread:

Differential binding of ATAC-seq data with no replicates

and the paper "De novo detection of differentially bound regions for ChIP-seq data using peaks and windows: controlling error rates correctly", which states:

"This paper provides some recommendations to maintain error control for de novo counting strategies. For peak-based methods, peak calling should be performed on pooled libraries to avoid the loss of type I error control from data snooping."
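Following that recommendation, peaks come from the pooled library without regard to condition, and each sample is then counted separately against those shared regions, as planned in the question. Below is a minimal sketch of counting 5' ends per peak. It assumes reads are `(chrom, start, end, strand)` tuples and peaks are non-overlapping `(chrom, start, end)` tuples, both in 0-based half-open (BED-style) coordinates. The function names are hypothetical, and no Tn5 offset correction is applied.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, int, int, str]  # (chrom, start, end, strand), 0-based half-open
Peak = Tuple[str, int, int]       # (chrom, start, end), 0-based half-open


def five_prime_position(read: Read) -> int:
    """5' end: leftmost base for '+' reads, rightmost base for '-' reads."""
    _, start, end, strand = read
    return start if strand == "+" else end - 1


def count_cut_sites(peaks: List[Peak], reads_by_sample: Dict[str, List[Read]]) -> Dict[str, List[int]]:
    """Per-sample counts of read 5' ends falling in each peak (assumes peaks don't overlap)."""
    # Index peaks by chromosome, sorted by start, remembering their original order.
    by_chrom: Dict[str, List[Tuple[int, int, int]]] = defaultdict(list)
    for i, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, i))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {chrom: [s for s, _, _ in iv] for chrom, iv in by_chrom.items()}

    counts: Dict[str, List[int]] = {}
    for sample, reads in reads_by_sample.items():
        row = [0] * len(peaks)
        for read in reads:
            chrom = read[0]
            if chrom not in by_chrom:
                continue  # no peaks on this chromosome
            pos = five_prime_position(read)
            j = bisect_right(starts[chrom], pos) - 1  # last peak starting at or before pos
            if j >= 0:
                _, end, idx = by_chrom[chrom][j]
                if pos < end:
                    row[idx] += 1
        counts[sample] = row
    return counts
```

Transposed into a peaks × samples integer matrix, this is the kind of input DESeq2 accepts as `countData` in `DESeqDataSetFromMatrix`.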



