EPIC array hg38 annotation?
mycetes • 0

Hi! I am currently performing a DMP and DMR analysis of EPIC array methylation data using minfi and DMRcate. Because I want to compare the methylation results with results from a matching expression dataset that was aligned to hg38, I need to map my probes to their hg38 coordinates.

After doing some research, I found that no hg38 annotation packages exist for minfi. There is one external package for the b5 annotation that includes hg38 coordinates for the probes on the array, but it still uses the hg19 coordinates as the base for the CpGs. As a consequence, all my results are mapped to hg19. For individual CpGs, remapping the coordinates is simple. For DMRs, however, the process is a lot more tedious and risky: one must first identify the CpGs overlapping each region, and then update the region's coordinates based on the new coordinates of those CpGs.
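For illustration, the region-remapping step described above could look roughly like the sketch below. It uses toy data only: the probe IDs, coordinates and object names (cpg_hg19, cpg_hg38, dmr_hg19) are hypothetical, and it assumes every CpG in a region has a single hg38 position.

```r
library(GenomicRanges)

# Toy CpG positions in both builds (hypothetical IDs and coordinates).
cpg_hg19 <- GRanges(c("chr1:100", "chr1:180", "chr1:260", "chr1:5000"))
names(cpg_hg19) <- c("cg_a", "cg_b", "cg_c", "cg_d")
cpg_hg38 <- GRanges(c("chr1:1100", "chr1:1180", "chr1:1260", "chr1:6000"))
names(cpg_hg38) <- names(cpg_hg19)

# One toy DMR called on hg19 coordinates.
dmr_hg19 <- GRanges("chr1:90-270")

# Step 1: find the CpGs that fall inside each hg19 region.
hits <- findOverlaps(dmr_hg19, cpg_hg19)
members <- split(names(cpg_hg19)[subjectHits(hits)], queryHits(hits))

# Step 2: span the hg38 positions of those CpGs.
# Caveat: if the CpGs of a region land on different chromosomes in hg38,
# range() returns more than one range for that region.
dmr_hg38 <- unlist(GRangesList(lapply(members, function(ids) range(cpg_hg38[ids]))))
dmr_hg38
```

The caveat in step 2 is the risky part: a region is only well defined in hg38 if its CpGs stay together and in order after remapping.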

As manifest files with the EPIC hg38 mappings exist, is there any way to implement a custom annotation for use with minfi, or is the package strictly locked to the "IlluminaHumanMethylationEPICanno" annotation packages? I have of course looked into DMRcate's "extractRanges" function, where the reference genome to align DMRs to can be specified. However, the documentation clearly states that no liftover of probes is performed, and I do not know what the "reference" annotation for the array is when supplying a beta matrix rather than a GenomicRatioSet.

Any help regarding this matter would be highly appreciated.

minfi DMRcate • 222 views
Tim Peters ▴ 160

Hi there,

If you supply a beta matrix to cpg.annotate(), then yes, it will automatically annotate to hg19: https://github.com/timpeters82/DMRcate-devel/blob/master/R/cpg.annotate.R#L18-L22. Like you say, no hg38 annotation package for EPICv1 exists in Bioconductor (the EPICv2 uses hg38, but that's another story).

What you can do is lift over the CpG coordinates themselves to hg38 prior to calling DMRs. Here is an (admittedly unwieldy) way of doing it. First lift over the annotation itself:


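library(GenomicRanges)  # GRanges()
library(rtracklayer)    # import.chain(), liftOver()

# Probe locations (hg19) from the EPIC b4 annotation package
EPIClocs <- IlluminaHumanMethylationEPICanno.ilm10b4.hg19::Locations
EPICGRhg19 <- GRanges(paste(EPIClocs$chr, EPIClocs$pos, sep=":"))
names(EPICGRhg19) <- rownames(EPIClocs)

genome(EPICGRhg19) <- "hg19"
seqlevelsStyle(EPICGRhg19) <- "UCSC"

#Get liftOver chain file from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz, and gunzip
ch <- import.chain("hg19ToHg38.over.chain")
# liftOver() returns a GRangesList; unlist() flattens it and drops unmapped probes
EPICGRhg38 <- unlist(liftOver(EPICGRhg19, ch))

# Number of probes lost in the liftover
length(EPICGRhg19) - length(EPICGRhg38)
# 246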
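As an optional check, it can be useful to see which probes were lost, and whether any probe landed on more than one hg38 locus. liftOver() returns a GRangesList parallel to its input, so the number of hits per probe can be counted before unlist() is applied. The sketch below uses a toy stand-in for that list (the probe IDs, coordinates and the object name lifted are hypothetical) rather than the real chain file:

```r
library(GenomicRanges)

# Toy stand-in for the GRangesList that liftOver() returns:
# one list element per input probe, empty if the probe did not map.
lifted <- GRangesList(
  cg_a = GRanges("chr1:150"),                 # maps to exactly one locus
  cg_b = GRanges(),                           # fails to lift over
  cg_c = GRanges(c("chr2:350", "chr2:900"))   # maps to more than one locus
)

n_hits <- lengths(lifted)                     # hits per probe
failed <- names(lifted)[n_hits == 0]          # probes lost in the liftover
ambiguous <- names(lifted)[n_hits > 1]        # probes with several hg38 loci

# Keep only the probes with a single, unambiguous hg38 position.
unique_hg38 <- unlist(lifted[n_hits == 1])

failed
ambiguous
names(unique_hg38)
```

With the real data, lifted would be liftOver(EPICGRhg19, ch) before flattening. Whether to drop ambiguous probes is a judgment call; the sketch simply shows how to find them.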

So 246 probes fail to lift over to hg38; we'll just have to accept this. Next, let's create a new CpGannotated object from an old one using these lifted-over loci. Say your object is called myannotation:

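# Pull the hg19 ranges (with their per-CpG statistics) out of the existing object
myannotation.hg19 <- myannotation@ranges
# Keep only the probes that lifted over successfully
retain <- names(myannotation.hg19) %in% names(EPICGRhg38)
myannotation.hg19 <- myannotation.hg19[retain]
# Match the hg38 ranges to the same probes, in the same order
myannotation.hg38 <- EPICGRhg38[names(myannotation.hg19)]
# Carry the statistics over to the hg38 ranges
values(myannotation.hg38) <- values(myannotation.hg19)

myannotation <- new("CpGannotated", ranges=myannotation.hg38)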

And this can be passed to dmrcate() to call DMRs. Hope this helps.

Cheers, Tim


Thank you for the lengthy and thorough reply Tim! As I am quite new to this type of analysis, I highly appreciate it.

