I have a question regarding the MEDIPS.CpGenrich function.
I have done a MeDIP-seq experiment and got many more reads back from sequencing than expected. As an example, after filtering in Galaxy, a total of 58,712,585 first-mate reads are imported into MEDIPS for one of the samples.
With sequencing this deep, I have reads aligned to almost the entire genome (since MeDIP only enriches for the methylated fraction of the genome). I can, however, see that the number of reads across e.g. CpG islands drops as expected.
My question is how the MEDIPS.CpGenrich function counts the Cs, Gs and CpGs. In the supplementary Methods for "Computational analysis of genome-wide DNA-methylation during the differentiation of human embryonic stem cells along the endodermal lineage", Chavez et al., Genome Research 2010, I read that:
"the CpG enrichment approach examines how strong the genomic regions underlying the obtained short reads are enriched for CpGs compared to the frequency of CpGs present in the reference genome".
... As I understand this, the CpGs in the genomic regions underlying the obtained short reads are counted, and the number of reads covering a given region is not taken into account. Is that correct? If so, and my reads cover almost the entire genome, will I get low or no enrichment? Or do regions with many reads carry more weight in the calculation?
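
To make the two interpretations I am asking about concrete, here is a toy Python sketch. It is not MEDIPS code and makes no claim about what MEDIPS.CpGenrich actually does; the function names, the toy genome and the read coordinates are all made up for illustration. It compares the relative CpG frequency (CpGs per base) under the reads with that of the whole genome, once counting every read separately and once counting each covered base only once.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def cpg_frequency(sequences):
    """CpG dinucleotides per base over a collection of sequences."""
    cpgs = sum(seq.upper().count("CG") for seq in sequences)
    bases = sum(len(seq) for seq in sequences)
    return cpgs / bases


def cpg_enrichment(genome, reads, weight_by_reads):
    """Ratio of CpG frequency under the reads to CpG frequency of the genome.

    weight_by_reads=True  : every read contributes its own sequence, so
                            regions with many reads count many times.
    weight_by_reads=False : overlapping reads are merged first, so each
                            covered base counts once regardless of depth.
    """
    regions = reads if weight_by_reads else merge_intervals(reads)
    region_seqs = [genome[start:end] for start, end in regions]
    return cpg_frequency(region_seqs) / cpg_frequency([genome])


# Hypothetical toy genome: a 10 bp CpG-rich stretch followed by 30 bp without CpGs.
toy_genome = "CG" * 5 + "AT" * 15

# Hypothetical reads covering the whole genome: 4 reads on the CpG-rich
# stretch, 1 read on the CpG-free remainder.
toy_reads = [(0, 10)] * 4 + [(10, 40)]

weighted = cpg_enrichment(toy_genome, toy_reads, weight_by_reads=True)
unweighted = cpg_enrichment(toy_genome, toy_reads, weight_by_reads=False)
print("weighted by reads:", weighted)
print("covered bases only:", unweighted)
```

In this toy case the reads cover the entire genome, so the "covered bases only" reading gives an enrichment of exactly 1.0, while the read-weighted reading gives 16/7 (about 2.29) because the CpG-rich stretch is counted four times. That difference is what my question is about.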