I am using Ensembl gene model for rat. I want to count reads that are mapped to CpG islands which overlaps with the regions flanked by geneStart-2kbase and geneStop + 2kbase. I want to avoid false read count by reads mapped to non-methylated areas, therefore I want count reads overlapping CpG islands only. These are main steps I am thinking about:
- Get genomic ranges for CpG islands.
- Count reads overlapping CpG islands using `htseq` or `Rsubread`
- Load gene model and extend genes ranges with 2kb up and downstream.
- Find overlapping CpG islands with the extended gene ranges
- Sum read counts of CpG island groupby geneNames.
What are packages and methods for each of these steps? Any suggestion about alternative workflow is also very welcom.