Annotation for EPICv2
1
0
Entering edit mode
nurkenber • 0

I am trying to get a more complete annotation for EPICv2. The annotation from Illumina, "Infinium MethylationEPIC v2.0" (link below), has no associated gene for 614,365 of its 937,055 probes and many "unknown" values for gene features such as TSS1500 or 5UTR, as seen in the picture below. When I check the position of a CpG on UCSC, I can find the related gene. For example, cg06979118 overlaps the SHANK2 gene but has no gene annotation in "Infinium MethylationEPIC v2.0".

[Figure: bar plot of EPICv2 probe counts by gene feature]

I followed the ChAMP tutorial for the analysis. It uses the ChAMPdata (v2.31.1) package, which loads an annotation based on "Infinium MethylationEPIC v2.0". The newer and development versions of ChAMPdata do not have an annotation for EPICv2 at all.

I have a few questions:

1) Has anyone used ChAMP and been able to load a more complete annotation for EPICv2, that is, found a ChAMPdata package with a complete EPICv2 annotation? I ask because functions in ChAMP call data("probe.features.epicv2") internally, even champ.GSEA.

2) I have a partial solution: I can annotate the DMPs after I find them, but even then I am not sure where to get a complete annotation. I could overlap the probes used for EPICv1 with those used for EPICv2. But I am not sure how to perform GSEA in that case, since champ.GSEA calls data("probe.features.epicv2") even when I provide it with DMPs and DMRs. Could someone suggest a pipeline for that?

Infinium MethylationEPIC v2.0: https://support.illumina.com/array/array_kits/infinium-methylationepic-beadchip-kit/downloads.html

Tutorial for ChAMP: https://github.com/YuanTian1991/ChAMP-DemoRun/blob/main/EPICv2/illumina_demo_data_iScan/main.md

Don't know if any of these are of use or if these are what ChAMP uses under the hood:

IlluminaHumanMethylationEPICv2anno.20a1.hg38

IlluminaHumanMethylationEPICv2manifest

EPICv2manifest

Thank you for your reply.

I welcome any help I can get.

I checked: the first two are based on the same file, Infinium MethylationEPIC v2.0.

The last one also seems to be based on the same file, but it has more columns with more information.

@james-w-macdonald-5106

The CpG you point out is in an intronic region of SHANK2, so it will not be in any of the locations you have in your barplot.

[Figure: position of cg06979118 relative to SHANK2 transcripts]

The annotations we have for this array come directly from Illumina, and we (and here by 'we' I mean Zuguang Gu who submitted the annotations) simply process their data to put into packages that are useful for minfi and ChAMP to use. If you want more sophisticated location information you could always use ChIPseeker or ChIPPeakAnno or the functions in GenomicRanges that those packages are based on.
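To illustrate the kind of overlap-based location annotation meant here, the following is a minimal, language-agnostic sketch written in plain Python. It is not the GenomicRanges, ChIPseeker, or ChIPPeakAnno API. The `Transcript` type, the `classify` and `annotate` helpers, the gene name `GENE_A`, and all coordinates are hypothetical and made up for illustration. The sketch assumes the usual convention that TSS200 means 1-200 bp upstream of the transcription start site and TSS1500 means 201-1500 bp upstream, and it simplifies "Body" to any position inside the transcript.

```python
from typing import List, NamedTuple, Optional, Tuple


class Transcript(NamedTuple):
    """Hypothetical transcript record; coordinates are 1-based and inclusive."""
    name: str
    gene: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify(chrom: str, pos: int, tx: Transcript) -> Optional[str]:
    """Return a coarse feature label for one CpG against one transcript."""
    if chrom != tx.chrom:
        return None
    # Simplification: anything inside the transcript is called "Body".
    if tx.start <= pos <= tx.end:
        return "Body"
    # The TSS is the start on the + strand and the end on the - strand.
    tss = tx.start if tx.strand == "+" else tx.end
    upstream = tss - pos if tx.strand == "+" else pos - tss
    if 0 < upstream <= 200:
        return "TSS200"
    if 200 < upstream <= 1500:
        return "TSS1500"
    return None


def annotate(chrom: str, pos: int,
             transcripts: List[Transcript]) -> List[Tuple[str, str, str]]:
    """Collect (gene, transcript, label) for every transcript that yields a label."""
    hits = []
    for tx in transcripts:
        label = classify(chrom, pos, tx)
        if label is not None:
            hits.append((tx.gene, tx.name, label))
    return hits


# Made-up example: two transcripts of the same hypothetical gene.
transcripts = [
    Transcript("tx1", "GENE_A", "chr1", 1000, 5000, "+"),
    Transcript("tx2", "GENE_A", "chr1", 3000, 5000, "+"),
]
hits = annotate("chr1", 2500, transcripts)
for gene, tx_name, label in hits:
    print(gene, tx_name, label)
```

In this toy case the same position is inside one transcript and upstream of another, so a single CpG can receive more than one label. Any real annotation has to decide how to report or prioritize such cases.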

Thank you for your detailed reply.

Could you please help me understand something? I realize this question is best addressed to Illumina themselves, but I want to take a shot here.

For EPICv1, the number of probes with annotated "gene" and "feature" entries is much higher than for EPICv2. Also, the feature "Body" is missing in EPICv2, which is exactly where cg06979118 would end up. It makes me question whether it would be better to use the EPICv1 array rather than EPICv2, since more genes are annotated in EPICv1 and GSEA should therefore yield more terms, with higher significance. Am I wrong in thinking this way?

In EPIC v1 617,287 out of 866,895 probes were annotated

In EPIC v2 322,690 out of 937,055 probes were annotated
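As a quick check of these numbers, here is a small Python sketch that uses only the counts quoted in this thread. The variable names are just for illustration.

```python
# (annotated probes, total probes) as quoted in this thread
counts = {
    "EPICv1": (617_287, 866_895),
    "EPICv2": (322_690, 937_055),
}

# Percentage of probes that carry a gene annotation, rounded to one decimal
percent_annotated = {
    array: round(100 * annotated / total, 1)
    for array, (annotated, total) in counts.items()
}

for array, pct in percent_annotated.items():
    print(f"{array}: {pct}% of probes annotated")

# Consistency check against the first post: 614,365 EPICv2 probes without a gene
unannotated_v2 = counts["EPICv2"][1] - counts["EPICv2"][0]
print("EPICv2 probes without gene annotation:", unannotated_v2)
```

So roughly 71% of EPICv1 probes versus roughly 34% of EPICv2 probes carry a gene annotation in these files.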

Could you also correct me if I am wrong here? As I understand it, champ.GSEA calls data("probe.features.epicv2"), so even if I annotate the DMPs and DMRs myself, the "gene" and "feature" annotations will be taken from the freshly loaded probe.features. So if I want to do GSEA with CpGs annotated by ChIPseeker or ChIPPeakAnno, I would need to use another method for GSEA?

I don't think I can help with your questions. Like I said, the annotation data are based on the manifest files that Illumina provides, and I know nothing other than where you can get the files. It wouldn't be that difficult to recapitulate the annotation yourself, although it can get complicated. As an example, the plot I showed in my previous post indicates that the CpG is intronic for two transcripts and upstream of two others. So is it upstream or intronic?

It appears that most of the gene set testing methods available for methylation arrays rely on the annotation from Illumina (they all appear to call minfi::getAnnotation), so it might take some work in order to use your own annotations.
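As a rough idea of what "using your own annotations" could involve, here is a minimal sketch of a plain over-representation (hypergeometric) test driven by a custom probe-to-gene map, written in plain Python. It is not champ.GSEA or any existing package's method. The function names, probe IDs, gene names, and gene set are all hypothetical. It also ignores the bias from genes having different numbers of probes, which a real analysis of methylation arrays would need to consider.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K belong to the set."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(probe_to_gene: Dict[str, str],
               dmp_probes: Iterable[str],
               gene_set: Set[str]) -> Tuple[int, float]:
    """Return (number of overlapping genes, p-value) for one gene set."""
    # Universe: every gene reachable through the custom annotation.
    universe = set(probe_to_gene.values())
    # Significant genes: genes hit by at least one DMP (probes collapse to genes).
    significant = {probe_to_gene[p] for p in dmp_probes if p in probe_to_gene}
    # Only gene set members that are in the universe can ever be drawn.
    testable = gene_set & universe
    overlap = significant & testable
    p_value = hypergeom_upper_tail(len(overlap), len(universe),
                                   len(testable), len(significant))
    return len(overlap), p_value


# Made-up custom annotation: 11 toy probes mapping to 10 toy genes.
probe_to_gene = {
    "cg_toy01": "G1", "cg_toy02": "G1", "cg_toy03": "G2", "cg_toy04": "G3",
    "cg_toy05": "G4", "cg_toy06": "G5", "cg_toy07": "G6", "cg_toy08": "G7",
    "cg_toy09": "G8", "cg_toy10": "G9", "cg_toy11": "G10",
}
dmps = ["cg_toy01", "cg_toy02", "cg_toy03", "cg_toy04"]
gene_set = {"G1", "G2", "G99"}  # G99 is not on the toy array

n_overlap, p_value = enrichment(probe_to_gene, dmps, gene_set)
print(n_overlap, p_value)
```

The point of the sketch is only that the probe-to-gene map is an explicit input, so any annotation source could be plugged in.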

Got it

Thank you
