annotation EpicV2

Hello everyone,

I am running a minfi analysis on EPICv2 data and I have a problem with the annotation. I noticed that some CpG probes appear more than once, for example: cg22367159_BC12 and cg22367159_BC13.

They have the same chromosomal position but different AddressA and B values in the annotation. They also differ in beta values, p-values, etc. I don't know what to do with these results. Has anyone else encountered this problem and found a solution or explanation for it? Should I remove all the duplicated CpGs (over 6,000)? Example beta values:

cg22367159_BC12 0.6393478, cg22367159_BC13 0.6610640

Thanks in advance for any answers.


You'll have to provide more information than that. If I process using preprocessFunnorm, I don't see any duplicates.

> eset
class: GenomicRatioSet 
dim: 865859 64 
metadata(0):
assays(2): Beta CN
rownames(865859): cg14817997 cg26928153 ... cg07587934 cg16855331
rowData names(0):
colnames(64): 206466480050_R01C01 206466480050_R02C01 ...
  206466470100_R07C01 206466470100_R08C01
colData names(12): Sample_Name Sample_Well ... yMed predictedSex
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b4.hg19
Preprocessing
  Method: NA
  minfi version: NA
  Manifest version: NA
> grep("cg22367159", rownames(eset))
[1] 397
> any(duplicated(rownames(eset)))
[1] FALSE

Hi James, Thank you very much for your answer. Below you will find my code and output.

> BasePath=idatspath
> RGset=read.metharray.exp(BasePath,targets, force = T)
> RGset@annotation <- c(array = "IlluminaHumanMethylationEPICv2", annotation = "20a1.hg38")
> RGset
class: RGChannelSet 
dim: 1105209 11 
metadata(0):
assays(2): Green Red
rownames(1105209): 1600157 1600179 ... 99810982 99810990
rowData names(0):
colnames(11): 207600980050_R01C01 207600980050_R02C01 ... 207600980049_R08C01 207678520097_R01C01
colData names(9): Basename Sample_ID ... Gender filenames
Annotation
  array: IlluminaHumanMethylationEPICv2
  annotation: 20a1.hg38

> grep("cg22367159", rownames(RGset))

integer(0)

> any(duplicated(rownames(RGset)))

[1] FALSE

> betaraw=getBeta(RGset)
> betaraw[c("cg22367159_BC12","cg22367159_BC13"),]

betaraw


If you ever find yourself using the @ operator on an S4 object, you should seriously reconsider. If you are meant to be able to change something in an S4 object, there will be an accessor function. In this case it's annotation<-, so the correct way to change the annotation is:

annotation(RGset) <- c(array = "IlluminaHumanMethylationEPICv2", annotation = "20a1.hg38")

## or you could use
annotation(RGset)["annotation"] <- "20a1.hg38"

I don't see the number of dups that you do though (after running preprocessFunnorm).

> eset2
class: GenomicRatioSet 
dim: 130767 64 
metadata(0):
assays(2): Beta CN
rownames(130767): cg06402284_TC21 cg00006223_TC21 ... cg01757887_BC21
  cg04509201_BC21
rowData names(0):
colnames(64): 206466480050_R01C01 206466480050_R02C01 ...
  206466470100_R07C01 206466470100_R08C01
colData names(12): Sample_Name Sample_Well ... yMed predictedSex
Annotation
  array: IlluminaHumanMethylationEPICv2
  annotation: 20a1.hg38
Preprocessing
  Method: NA
  minfi version: NA
  Manifest version: NA
> shortids <- sapply(strsplit(rownames(eset2), "_"), "[", 1)
> table(table(shortids))

     1      2      3 
130494    129      5

I don't know why Illumina put these dups on the array, so it's hard to know what one should do with them. There are three obvious choices: ignore them (this is what I normally do), remove the duplicates, or average them.
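The "remove" choice can mean two different things: keep one replicate per probe, or drop every replicated probe altogether. A minimal base R sketch of both on a toy matrix follows. The matrix betas and the object names keep_first and drop_all are made up for illustration, and only the two cg22367159 values in the first column come from the question. On a real GenomicRatioSet the same logical vectors could be used to subset rows.

```r
# Toy beta matrix; rownames follow the EPICv2 "probe_suffix" pattern.
# Only the two cg22367159 values in sample1 are from the thread;
# every other number is invented for illustration.
betas <- matrix(
  c(0.6393478, 0.6610640, 0.10, 0.85,
    0.6400000, 0.6600000, 0.12, 0.80),
  nrow = 4,
  dimnames = list(
    c("cg22367159_BC12", "cg22367159_BC13",
      "cg06402284_TC21", "cg00006223_TC21"),
    c("sample1", "sample2")
  )
)

# Strip everything from the underscore onwards to get the bare probe name.
shortids <- sub("_.*$", "", rownames(betas))

# Variant A: keep only the first replicate of each probe.
keep_first <- betas[!duplicated(shortids), , drop = FALSE]

# Variant B: drop every probe that occurs more than once.
replicated <- shortids %in% shortids[duplicated(shortids)]
drop_all <- betas[!replicated, , drop = FALSE]
```

Variant A leaves one row per probe name, while variant B discards cg22367159 entirely; which is preferable depends on how much the replicates are trusted.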

To average, I would use avereps from limma.

> library(limma)
> oldassays <- assays(eset2)
> neweset <- eset2[!duplicated(shortids),]
> newassays <- lapply(oldassays, function(x) {tmp <- avereps(x, shortids); rownames(tmp) <- rownames(neweset); return(tmp)})
> assays(neweset) <- newassays
> neweset
class: GenomicRatioSet 
dim: 130628 64 
metadata(0):
assays(2): Beta CN
rownames(130628): cg06402284_TC21 cg00006223_TC21 ... cg01757887_BC21
  cg04509201_BC21
rowData names(0):
colnames(64): 206466480050_R01C01 206466480050_R02C01 ...
  206466470100_R07C01 206466470100_R08C01
colData names(12): Sample_Name Sample_Well ... yMed predictedSex
Annotation
  array: IlluminaHumanMethylationEPICv2
  annotation: 20a1.hg38
Preprocessing
  Method: NA
  minfi version: NA
  Manifest version: NA
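
To see what the averaging step amounts to, here is a rough base R equivalent on a toy matrix. It is only a sketch of the idea (the mean of the replicate rows for each probe name), not limma's implementation. The matrix betas and all values in it are made up, except the two cg22367159 values in the first column, which are the ones quoted in the question.

```r
# Toy beta matrix with one replicated probe (cg22367159).
betas <- matrix(
  c(0.6393478, 0.6610640, 0.10, 0.85,
    0.6400000, 0.6600000, 0.12, 0.80),
  nrow = 4,
  dimnames = list(
    c("cg22367159_BC12", "cg22367159_BC13",
      "cg06402284_TC21", "cg00006223_TC21"),
    c("sample1", "sample2")
  )
)

# Bare probe names, as in the strsplit() call above.
shortids <- sapply(strsplit(rownames(betas), "_"), "[", 1)

# Sum the rows that share a probe name, keeping first-appearance order.
sums <- rowsum(betas, shortids, reorder = FALSE)

# Number of replicates per probe, in the same order as the rows of sums.
counts <- tabulate(match(shortids, unique(shortids)))

# Row-wise mean: each row of sums is divided by its own replicate count.
averaged <- sums / counts
```

Rows that are not replicated pass through unchanged, and the rownames of averaged are the bare probe names.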

Hi James,

Thank you very much for your time and detailed answer.

Marco

Hi James, do you know of a way to analyze EPICv1 and EPICv2 samples in the same analysis? I tried the combineArrays function, but it doesn't work.


Are you using the IlluminaHumanMethylationEPICv2anno.20a1.hg38 package (https://github.com/jokergoo/IlluminaHumanMethylationEPICv2anno.20a1.hg38)? I used the "IlmnID" instead of the "Probe ID" as the primary ID in this package because, as you have seen, the probe IDs may be duplicated. According to the documentation from Illumina, the following two are basically replicates of the same probe (replicates 2 and 3):

cg22367159_BC12
cg22367159_BC13

I guess you can simply take the mean or any one of them.
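
To find the replicate sets, the IlmnID can be split into its probe name and suffix. The sketch below assumes the suffix is two letters followed by one digit and then the replicate number, which matches the IDs shown in this thread (BC12 and BC13 as replicates 2 and 3). The column names code, digit and replicate are made up here, and what the letters and the first digit mean should be checked against Illumina's manifest documentation.

```r
# IlmnIDs taken from this thread.
ilmn_ids <- c("cg22367159_BC12", "cg22367159_BC13", "cg06402284_TC21")

# Assumed layout: <probe>_<two letters><one digit><replicate number>.
pattern <- "^(.+)_([A-Z]{2})([0-9])([0-9]+)$"

parts <- data.frame(
  IlmnID    = ilmn_ids,
  probe     = sub(pattern, "\\1", ilmn_ids),
  code      = sub(pattern, "\\2", ilmn_ids),
  digit     = sub(pattern, "\\3", ilmn_ids),
  replicate = as.integer(sub(pattern, "\\4", ilmn_ids)),
  stringsAsFactors = FALSE
)

# Probe names that occur more than once, i.e. the replicate sets.
replicated_probes <- unique(parts$probe[duplicated(parts$probe)])
```

The parts$probe column can then serve as the grouping variable when averaging, or as the key when keeping a single replicate.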
