Hi there,
What is the recommended way to deal with missing values when using gcapc::refineSites to adjust counts for GC effects?
The method is suitable for narrow epigenomic marks, and can be applied to "any peak algorithm based on coverage computed in bins".
For narrow marks, one would compute read enrichment in bins of about 250 bp or narrower. In hg19, nearly 10% of 250 bp bins have GC content outside the default filtering range of gcapc::refineSites (the gcrange argument, between 0.3 and 0.8). The resulting GC-adjusted counts for these sites are NA by default.
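To make the point concrete, here is a minimal base-R sketch of how the fraction of bins falling outside a GC range could be checked. It is not part of gcapc and is not the code behind the figure quoted above: the function name fraction_outside_gcrange and the toy sequence are made up for illustration, and treating the range bounds as inclusive is an assumption.

```r
# Hypothetical helper (not a gcapc function): fraction of consecutive,
# non-overlapping bins whose GC content lies outside gcrange.
# Assumption: the bounds of gcrange are treated as inclusive.
fraction_outside_gcrange <- function(bases, bin_width = 250L,
                                     gcrange = c(0.3, 0.8)) {
  # Assign each base to a bin; a trailing partial bin is kept as-is.
  bin_id <- (seq_along(bases) - 1L) %/% bin_width
  # GC fraction per bin (mean of a logical vector is a proportion).
  gc_frac <- tapply(bases %in% c("G", "C"), bin_id, mean)
  # Share of bins that would be filtered out.
  mean(gc_frac < gcrange[1] | gc_frac > gcrange[2])
}

# Toy input: four 4-base bins with GC fractions 0, 0.5, 1 and 0.75.
toy <- strsplit("AAAAGCATGGGGGCGA", "")[[1]]
fraction_outside_gcrange(toy, bin_width = 4L)
# → 0.5 (the all-A and all-G bins fall outside 0.3 to 0.8)
```

On real data, the bases would come from the reference genome for the bins actually used, rather than from a toy string.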
Excluding the NA cases could miss relevant binding sites. Setting gcrange=c(0,1) instead exposes gcapc::refineSites to outlier effects in high and low GC-content regions, which can in turn affect the detection of other binding sites.