Question

Same methylation probe ID with different locus annotations in Illumina 27k and 450k platforms

yuabrahamliu • 0

@yuabrahamliu-17670

Hello everyone,

I have a question about the probe annotations in the packages IlluminaHumanMethylation27kanno.ilmn12.hg19 (27k) and IlluminaHumanMethylation450kanno.ilmn12.hg19 (450k).

I found that, for many methylation probes, the locus annotations in the 27k package and the 450k package differ. For example, I can look up the loci for the probes cg00000292, cg00002426, and cg00003994 using the 27k package with


IlluminaHumanMethylation27kanno.ilmn12.hg19::Locations[c('cg00000292', 'cg00002426', 'cg00003994'),]

and look up the same probes using the 450k package with


IlluminaHumanMethylation450kanno.ilmn12.hg19::Locations[c('cg00000292', 'cg00002426', 'cg00003994'),]

The 27k result for these 3 probes is

DataFrame with 3 rows and 3 columns
                   chr       pos      strand
           <character> <integer> <character>
cg00000292       chr16  28797601           +
cg00002426        chr3  57718583           +
cg00003994        chr7  15692387           +

while the 450k result for the same 3 probes is

DataFrame with 3 rows and 3 columns
                   chr       pos      strand
           <character> <integer> <character>
cg00000292       chr16  28890100           +
cg00002426        chr3  57743543           +
cg00003994        chr7  15725862           -

This means the same probe IDs are annotated to different loci on the 27k and the 450k. I know the probe chemistries of the Illumina 27k and 450k differ, but does that mean even the coordinates for the same probe ID differ? If so, is it possible to combine the shared probes (those with the same probe IDs) from samples run on the 27k and the 450k? Thank you so much!
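The comparison above can be tabulated directly. What follows is only a minimal base R sketch: the two data frames are typed in by hand from the outputs above (they are not read from the annotation packages), and the names `loc27k`, `loc450k`, and `comparison` are made up for illustration.

```r
# Coordinates copied by hand from the two outputs above (assumption: typed in,
# not loaded from the annotation packages)
probes <- c("cg00000292", "cg00002426", "cg00003994")

loc27k <- data.frame(
  chr    = c("chr16", "chr3", "chr7"),
  pos    = c(28797601L, 57718583L, 15692387L),
  strand = c("+", "+", "+"),
  row.names = probes,
  stringsAsFactors = FALSE
)

loc450k <- data.frame(
  chr    = c("chr16", "chr3", "chr7"),
  pos    = c(28890100L, 57743543L, 15725862L),
  strand = c("+", "+", "-"),
  row.names = probes,
  stringsAsFactors = FALSE
)

# Restrict to probe IDs present in both annotations
shared <- intersect(rownames(loc27k), rownames(loc450k))

# Per probe: same chromosome? how far apart? same strand?
comparison <- data.frame(
  probe       = shared,
  same_chr    = loc27k[shared, "chr"] == loc450k[shared, "chr"],
  pos_shift   = loc450k[shared, "pos"] - loc27k[shared, "pos"],
  same_strand = loc27k[shared, "strand"] == loc450k[shared, "strand"]
)

print(comparison)
```

With both packages installed, the hand-typed tables could presumably be replaced by `as.data.frame()` applied to each package's `Locations` object, so that the same check runs over every shared probe ID.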

score 0 · Answer 1 · 2021-03-03

Kevin Blighe ★ 4.0k

@kevin

Republic of Ireland

Hi,

It seems that the manufacturer (Illumina) simply moved the target locus, which often occurs across array designs. In the case of cg00000292, the change is somewhat dramatic, but one can infer that, on the 27k, it may be targeting a promoter region; whereas, on the 450k, it is targeting exon 1 (of ATP2A1).

What do you mean by 'combine' these together? I would personally keep the analyses of the two arrays separate, unless you are just doing univariate analysis (analysing each probe separately). Even then, the arrays should be processed separately, in my opinion. To 'combine' them in this case, I would just assign a unique ID to each probe, perhaps by simply adding a '_27k' or '_450k' suffix.
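As a rough illustration of the unique-ID idea, here is a minimal base R sketch. The beta-value matrices are invented toy data and `tag_probes()` is a hypothetical helper, not a function from any package; the tag is appended to the probe ID as a suffix.

```r
# Hypothetical beta-value matrices (rows = probes, columns = samples);
# all numbers are made up for illustration
probes <- c("cg00000292", "cg00002426", "cg00003994")

beta27k <- matrix(
  c(0.10, 0.80, 0.55, 0.20, 0.75, 0.60),
  nrow = 3,
  dimnames = list(probes, c("s27_1", "s27_2"))
)

beta450k <- matrix(
  c(0.15, 0.70, 0.50, 0.25, 0.65, 0.45),
  nrow = 3,
  dimnames = list(probes, c("s450_1", "s450_2"))
)

# Hypothetical helper: append the platform name to every probe ID
tag_probes <- function(beta, platform) {
  rownames(beta) <- paste0(rownames(beta), "_", platform)
  beta
}

beta27k_tagged  <- tag_probes(beta27k, "27k")
beta450k_tagged <- tag_probes(beta450k, "450k")

# After tagging, no probe ID is shared between the two platforms
shared_after <- intersect(rownames(beta27k_tagged), rownames(beta450k_tagged))
print(rownames(beta27k_tagged))
print(length(shared_after))
```

Tagged this way, per-probe results from the two arrays can sit in one table without an ID from one platform being mistaken for the same ID on the other.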

There is, however, a function in the minfi Bioconductor package that claims to be able to combine arrays - please take a look: https://rdrr.io/bioc/minfi/man/combineArrays.html

Kevin

Thank you for the reply Kevin. I further checked the sequences of the probes and found something noteworthy. For the 27k package, the sequences for the 3 probes were obtained via

IlluminaHumanMethylation27kanno.ilmn12.hg19::Manifest[c('cg00000292', 'cg00002426', 'cg00003994'),c('ProbeSeqA', 'ProbeSeqB')]

while for the 450k package they were obtained via

IlluminaHumanMethylation450kanno.ilmn12.hg19::Manifest[c('cg00000292', 'cg00002426', 'cg00003994'),c('ProbeSeqA', 'ProbeSeqB')]

Then the 27k result is

DataFrame with 3 rows and 2 columns
                                                    ProbeSeqA
                                                  <character>
cg00000292 AAACATTAATTACCAACCACTCTTCCAAAAAACACTTACCATTAAAACCA
cg00002426 AATATAATAACATTACCTTACCCATCTTATAATCAAACCAAACAAAAACA
cg00003994 AATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTCA
                                                    ProbeSeqB
                                                  <character>
cg00000292 AAACATTAATTACCAACCGCTCTTCCAAAAAACACTTACCATTAAAACCG
cg00002426 AATATAATAACATTACCTTACCCGTCTTATAATCAAACCAAACGAAAACG
cg00003994 AATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTCG

While the 450k result is

DataFrame with 3 rows and 2 columns
                                                    ProbeSeqA
                                                  <character>
cg00000292 AAAACATTAATTACCAACCRCTCTTCCAAAAAACACTTACCATTAAAACC
cg00002426 CAATATAATAACATTACCTTACCCRTCTTATAATCAAACCAAACRAAAAC
cg00003994 TAATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTC
             ProbeSeqB
           <character>
cg00000292            
cg00002426            
cg00003994

ProbeSeqA of the 27k, ProbeSeqB of the 27k, and ProbeSeqA of the 450k differ only at the ends of the sequences, which is because of the probe chemistry; the remaining bases are the same. If so, their locus annotations should be the same or differ only slightly, but the results in my earlier post show a large difference. For the probe "cg00003994", even the strand has changed. I am wondering whether there is a problem with the locus annotations. Thank you so much!
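The sequence comparison can be checked in a few lines. This is a minimal base R sketch using the sequences printed above, typed in by hand. It assumes that the 450k probe is the 27k probe shifted by one base (one extra base at the start, one fewer at the end) and that R is the IUPAC code for A or G; the variable names are made up for illustration.

```r
# Probe sequences copied by hand from the outputs above
seq27A <- c(
  cg00000292 = "AAACATTAATTACCAACCACTCTTCCAAAAAACACTTACCATTAAAACCA",
  cg00002426 = "AATATAATAACATTACCTTACCCATCTTATAATCAAACCAAACAAAAACA",
  cg00003994 = "AATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTCA"
)
seq27B <- c(
  cg00000292 = "AAACATTAATTACCAACCGCTCTTCCAAAAAACACTTACCATTAAAACCG",
  cg00002426 = "AATATAATAACATTACCTTACCCGTCTTATAATCAAACCAAACGAAAACG",
  cg00003994 = "AATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTCG"
)
seq450A <- c(
  cg00000292 = "AAAACATTAATTACCAACCRCTCTTCCAAAAAACACTTACCATTAAAACC",
  cg00002426 = "CAATATAATAACATTACCTTACCCRTCTTATAATCAAACCAAACRAAAAC",
  cg00003994 = "TAATAATAATAATACCCCCTATAATACTAACTAACAAACATACCCTCTTC"
)

# Assumed one-base offset: drop the last base of each 27k probe and the
# first base of each 450k probe, leaving 49 bases to compare
core27A <- substr(seq27A, 1, 49)
core27B <- substr(seq27B, 1, 49)
core450 <- substr(seq450A, 2, 50)

# Turn the IUPAC code R (A or G) into a regular-expression character class
pattern450 <- paste0("^", gsub("R", "[AG]", core450), "$")

# Does each 27k core match the corresponding 450k pattern?
matches <- function(core27) {
  ok <- mapply(grepl, pattern450, core27, USE.NAMES = FALSE)
  setNames(ok, names(core27))
}

matchesA <- matches(core27A)
matchesB <- matches(core27B)
print(rbind(ProbeSeqA_27k = matchesA, ProbeSeqB_27k = matchesB))
```

If every entry is TRUE, the probes on the two arrays cover the same 49 bases, which would fit the expectation that the same probe ID targets the same CpG on both arrays.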

Reply by yuabrahamliu • 3.7 years ago

I am admittedly not sure. I can only assume that Illumina, through experimentation, concluded that these probes had greater affinity for the loci annotated on the 450k. Perhaps browsing the manufacturer's resources for these arrays will help. They usually have more in-depth information, and possibly even a log of changes from one array to the next (Affymetrix has such information).

Reply by Kevin Blighe • 3.7 years ago