Hi all,
I'm currently analyzing data from the Illumina Infinium Mouse Methylation BeadChip and have encountered some inconsistencies between the annotation data (downloaded via Bioconductor or BioMart, reference strain C57BL/6J) and public genome databases (e.g., UCSC, Ensembl).
When I manually map differentially methylated CpG sites (dmCpGs) based on the chromosomal coordinates provided in the annotation file, they often do not correspond to the same genes or regions shown in UCSC or Ensembl (tested with both mm10/GRCm38 and mm39/GRCm39). In some cases, the CpG is annotated to a gene in the Illumina manifest, but the coordinate does not fall within or near that gene in the genome browsers.
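For reference, my manual check amounts to something like the Python sketch below. It is only a minimal illustration, and it makes several assumptions. The gene intervals are hypothetical, not real mm10/mm39 coordinates. Positions are treated as 1-based. The helper names (`genes_at`, `normalize_chrom`, `bed_start_to_one_based`) are my own. The sketch also guards against two mismatches I try to rule out before blaming the manifest: UCSC names chromosomes "chr1" while Ensembl uses "1", and UCSC BED-style tables use 0-based starts while Ensembl/GTF coordinates are 1-based. The probe coordinate and the gene models must of course come from the same assembly.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive (GTF/Ensembl convention)
    end: int    # 1-based, inclusive


def normalize_chrom(chrom: str) -> str:
    """Map Ensembl-style names ("1", "X") and UCSC-style names ("chr1", "chrX") to one form."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def bed_start_to_one_based(start0: int) -> int:
    """BED/UCSC table starts are 0-based; add 1 before comparing with 1-based coordinates."""
    return start0 + 1


def genes_at(chrom: str, pos: int, genes: list[Gene], flank: int = 0) -> list[str]:
    """Names of genes whose interval, extended by `flank` bp on each side, contains `pos`.

    `pos` must be 1-based and on the same assembly (mm10 or mm39) as `genes`.
    """
    chrom = normalize_chrom(chrom)
    return [
        g.name
        for g in genes
        if normalize_chrom(g.chrom) == chrom and g.start - flank <= pos <= g.end + flank
    ]


# Hypothetical gene models mixing Ensembl-style and UCSC-style chromosome names.
genes = [Gene("geneA", "1", 1000, 2000), Gene("geneB", "chr1", 5000, 6000)]
print(genes_at("chr1", 1500, genes))                          # ['geneA']
print(genes_at("1", 2500, genes, flank=1000))                 # ['geneA']
print(genes_at("chr1", bed_start_to_one_based(4999), genes))  # ['geneB']
```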
In addition, I would like to include in my analysis the exact position of each dmCpG relative to promoter regions and transcription start sites (TSS), but the annotation inconsistencies make this unreliable.
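To make the TSS/promoter part concrete, this is roughly the strand-aware calculation I have in mind, sketched in Python under several assumptions. First, the transcripts come from a gene model on the same assembly as the probe coordinates. Second, the TSS is the transcript start on the "+" strand and the transcript end on the "-" strand. Third, the promoter window (1500 bp upstream to 500 bp downstream) is an arbitrary placeholder, not an Illumina or Ensembl definition. `Transcript`, `nearest_tss` and `classify` are hypothetical names, and the coordinates are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    gene: str
    chrom: str
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Transcription starts at the left end on "+" and at the right end on "-".
        return self.start if self.strand == "+" else self.end


def signed_tss_distance(pos: int, tx: Transcript) -> int:
    """Distance in the direction of transcription: negative = upstream, positive = downstream."""
    d = pos - tx.tss
    return d if tx.strand == "+" else -d


def nearest_tss(chrom: str, pos: int, transcripts: list[Transcript]) -> Optional[tuple[Transcript, int]]:
    """Transcript whose TSS is closest to `pos` on `chrom`, plus the signed distance."""
    same_chrom = [t for t in transcripts if t.chrom == chrom]
    if not same_chrom:
        return None
    best = min(same_chrom, key=lambda t: abs(pos - t.tss))
    return best, signed_tss_distance(pos, best)


def classify(distance: int, upstream: int = 1500, downstream: int = 500) -> str:
    """Placeholder promoter window; adjust to whatever definition the analysis uses."""
    return "promoter" if -upstream <= distance <= downstream else "distal"


# Hypothetical transcripts, not real coordinates.
txs = [
    Transcript("geneA", "chr1", 10_000, 20_000, "+"),
    Transcript("geneB", "chr1", 30_000, 40_000, "-"),
]
for pos in (9_000, 12_000, 40_300):
    tx, dist = nearest_tss("chr1", pos, txs)
    print(pos, tx.gene, dist, classify(dist))
```

A real annotation would also need to handle multiple transcripts per gene and ties. This sketch simply keeps the first transcript with the closest TSS, because `min` returns the first minimal element.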
Questions:
- What genome build was used to generate the current annotation files?
- Are the CpG probe annotations based on custom gene models, or were they lifted over between assemblies?
- Is there an updated annotation file that better aligns with UCSC/Ensembl references?
- Are there recommended tools or workflows to reliably annotate CpG sites in terms of proximity to TSS or promoter regions?
Thanks in advance for your help!