Methylation EPIC array: does ChAMP's champ.load() add extra probes?
Ankit
Italy

Hi everyone,

I am analyzing Methylation EPIC array data with the ChAMP package. I noticed that after applying champ.load(), some extra probes are added to my actual probe list. These probes were not present in the data imported by champ.import(), and they are also not added when I run champ.filter().

Since I want to use "SWAN" normalization in champ.norm(), I had to use champ.load(). After champ.load(), fewer probes than expected match the manifest (EPIC.manifest.hg19). Ideally the manifest should contain all the probes and coordinates for the array. In my case, however, NOT all the probes in myLoad$beta (the object returned by champ.load()) or in myNorm match the manifest, and a few hundred probes remain unassigned. My suspicion is that champ.load() adds some extra probes that are not present in my data. Here is my command:

```r
library(ChAMP)

myLoad_2 <- champ.load(directory = getwd(),
                       method = "minfi",
                       methValue = "B",
                       autoimpute = TRUE,
                       filterDetP = TRUE,
                       ProbeCutoff = 0,
                       SampleCutoff = 0.1,
                       detPcut = 0.01,
                       filterBeads = TRUE,
                       beadCutoff = 0.05,
                       filterNoCG = TRUE,
                       filterSNPs = TRUE,
                       population = NULL,
                       filterMultiHit = TRUE,
                       filterXY = TRUE,
                       force = FALSE,
                       arraytype = "EPIC")
```

- Number of probes remaining after filtering: 705948
- Number of filtered probes matching the manifest: 705762
- Extra probes (checked manually) not found in the manifest: 186

I also tried to trace their origin, and some of these 186 probes are found in the human.450K.manifest.csv file. How is it possible that, when I specify EPIC, probes that are present neither in the EPIC manifest nor in my original data get incorporated by champ.load()?
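The manual check described above amounts to a set difference between the row names of the loaded beta matrix and the manifest's probe IDs. A minimal base-R sketch, assuming toy stand-ins for rownames(myLoad_2$beta) and the manifest's probe-ID column; the helper name find_extra_probes is hypothetical:

```r
# Hypothetical helper: probe IDs present in a beta matrix but absent from a
# manifest/annotation ID vector.
find_extra_probes <- function(beta, manifest_ids) {
  setdiff(rownames(beta), manifest_ids)
}

# Toy stand-ins (assumed data, not taken from the post)
beta <- matrix(runif(10), nrow = 5,
               dimnames = list(c("cg01", "cg02", "cg03", "cg04", "ch99"),
                               c("S1", "S2")))
manifest_ids <- c("cg01", "cg02", "cg03", "cg04")

extra <- find_extra_probes(beta, manifest_ids)
extra           # [1] "ch99"
length(extra)   # on the real data: 705948 - 705762 = 186
```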

I looked for a solution but could not find anything.

Could you please help me with this?

Thank you

Yuan Tian
United Kingdom

Hi Ankit:

"EPIC.manifest.hg19" is not used for loading data; it is data from the Zhou et al. SNP-filtering paper, so it is only used for SNP filtering, not for loading data. The annotation used for loading is "AnnoEPIC". You may try matching your beta matrix against that instead.

I suspect the reason is that "EPIC.manifest.hg19" is based on an old version of the EPIC annotation (there are 4 versions in total), which contains slightly fewer probes than the latest (4th) version (I remember the difference is 196). That's why, when you match the result generated with "AnnoEPIC" against that file, you may find fewer probes. If that is the reason, after my vacation I will check whether the SNP paper provides a filter list for a newer EPIC version and incorporate it.

In summary, I think "EPIC.manifest.hg19" is only used for SNP filtering and does not influence data loading, analysis, etc. I suspect it's an issue with the different EPIC manifest versions. If you want to check, it is more appropriate to compare your result with the "AnnoEPIC" data.

Best Tian

Hi Yuan,

Thank you for the reply.

I did as you suggested. Again, the overlap is incomplete:

```r
data(AnnoEPIC)
length(intersect(Anno$Annotation$CpG, rownames(myLoad_2$beta)))
# 705762
```

This is the same as the previous result, and those 186 probes still do not overlap with "AnnoEPIC".

I cannot figure out where the problem is.

The only observation I have is that when I intersected them with the 450K Anno file, 87 of the 186 overlapped.

Checking the possible source of these probes:

```r
data(Anno450K)
# myoutersect: the 186 probes in myLoad_2$beta that did not match AnnoEPIC
length(intersect(Anno$Annotation$CpG, myoutersect))
# 87
```
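The same idea can be extended to label every extra probe in one step, according to whether it appears in the EPIC annotation, only in the 450K annotation, or in neither. A sketch under stated assumptions: the vectors are toy placeholders for myoutersect and the CpG columns of the two ChAMP annotation objects (not loaded here), and classify_probes is a hypothetical helper:

```r
# Hypothetical helper: label each probe ID by which annotation contains it.
classify_probes <- function(ids, epic_ids, k450_ids) {
  label <- ifelse(ids %in% epic_ids, "EPIC",
                  ifelse(ids %in% k450_ids, "450K only", "neither"))
  factor(label, levels = c("EPIC", "450K only", "neither"))
}

extra_ids <- c("cg10", "cg11", "cg12", "ch13")   # toy extra probes
epic_ids  <- c("cg01", "cg02")                   # toy EPIC annotation IDs
k450_ids  <- c("cg10", "cg11", "cg01")           # toy 450K annotation IDs

counts <- table(classify_probes(extra_ids, epic_ids, k450_ids))
counts   # EPIC: 0, 450K only: 2, neither: 2
```

Using a factor with fixed levels keeps zero-count categories (here "EPIC") in the table, so the three counts always add up to the number of extra probes.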

I don't understand why these 450K probes, plus other probes, are incorporated into my EPIC data.

As I mentioned previously, 705948 probes remain after filtering.

Also, the champ.import() + champ.filter() pipeline adds no extra probes. Why does this happen only with champ.load()?

Please help.

Thanks

I would like to add another observation:

```r
myLoad <- champ.load(directory = getwd(), arraytype = "EPIC")
# 705948 probes retained

myLoad <- champ.load(directory = getwd(), method = "minfi", arraytype = "EPIC")
# 705762 probes retained
```

These two commands retain different numbers of filtered probes. I think it has to do with the "minfi" method.

What is the issue, and what should I do?

Please suggest.

Thanks

Hi,

I also tried older versions, but the results are the same.

If you have any information, please let me know.

Thank you

Hi Ankit:

I did some tests with the latest version of ChAMP, using GSE137541 downloaded for testing. Since the data were published recently, I assume they use the latest EPIC version.

1: All CpGs in myLoad or myImport match the AnnoEPIC annotation if you use the default "ChAMP" method. AnnoEPIC is used in champ.import(), so this is what I expected. If you use the minfi method, AnnoEPIC is not used during loading.

2: champ.load() gives exactly the same result as champ.import() + champ.filter(), which is as it should be: if you check the champ.load() code, it simply combines champ.import() and champ.filter() without any modification. One thing worth noting: if you load data with champ.import() and then run champ.filter(), the beadcount parameter of champ.filter() should be set to myImport$beadcount.

3: I compared the "minfi" and "ChAMP" methods in champ.load(). On 450K data they give exactly the same result. On EPIC data, the "minfi" method indeed includes 100+ more CpGs, but if I compare the common CpGs between the two loading results, they are exactly the same. A quick check showed that the difference arises when the data are read by minfi's read.metharray() function. So I think there may be a small EPIC annotation version difference between the two methods, but I need time to find out exactly what it is. In any case, the difference only affects 100+ CpGs; all the rest are exactly the same.

If, as you said, the old version matches, I would assume the "minfi" method was updated at some point, because ChAMP's loading method and annotation have not been modified in the past 2 years; perhaps minfi slightly updated its annotation and included 100+ more CpGs. However, I did not see any update on the Illumina website; the latest version is the one published in 2017. So it's kind of weird...

So I now think your "extra" probes are imported by the newer "minfi" method, right? (ChAMP simply calls minfi's function, so if minfi is upgraded, the "minfi" method is upgraded as well.) Second, check whether adding the "beadcount" parameter to champ.filter() gives the same result. Third, you may select the common CpGs between the two loading methods and check whether they are the same.

I am currently on vacation (with extremely poor internet), so I will check it next week when I am back at work.
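The third check (are the common CpGs the same?) can be made concrete by restricting both results to their shared probes, aligning rows and columns, and comparing the values. A sketch, assuming two toy matrices standing in for myLoad$beta (ChAMP method) and myLoad_2$beta (minfi method); compare_common and its tolerance are hypothetical choices, not part of ChAMP:

```r
# Hypothetical helper: compare two beta matrices on their shared probes.
compare_common <- function(a, b, tol = 1e-8) {
  common <- intersect(rownames(a), rownames(b))
  # Align rows and columns first; probe order may differ between methods.
  a_sub <- a[common, , drop = FALSE]
  b_sub <- b[common, colnames(a), drop = FALSE]
  list(n_common = length(common),
       identical_values = isTRUE(all.equal(a_sub, b_sub, tolerance = tol)))
}

set.seed(1)
champ_beta <- matrix(runif(6), nrow = 3,
                     dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2")))
# Toy minfi-style result: same values, different row order, one extra probe.
minfi_beta <- rbind(champ_beta[c("cg03", "cg01", "cg02"), ],
                    ch99 = c(0.5, 0.5))

compare_common(champ_beta, minfi_beta)   # n_common = 3, identical_values = TRUE
```

Aligning by name rather than position matters here, because the two loading paths are not guaranteed to return probes in the same order.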

Best Tian

Hi Yuan,

Thank you for the reply.

I checked the ChAMP pipeline with one 450K dataset and two different EPIC datasets we have. Regarding your comments, my observations are as follows:

1: All CpGs in myLoad or myImport match the AnnoEPIC annotation if you use the default "ChAMP" method.

-> Yes, I agree. With the "ChAMP" method, all CpG probes matched the AnnoEPIC annotation in BOTH datasets.

2: champ.load() gives exactly the same result as champ.import() + champ.filter().

-> Yes, I checked. The filtered probes from champ.load() and from champ.import() + champ.filter() are exactly the same in both of our EPIC datasets.

Scripts used with EPIC dataset 1:

```r
## "minfi" method (same champ.load() call as in my first post)
dim(myLoad_2$beta)
# 705948 16

## "ChAMP" method (default)
myLoad <- champ.load(directory = getwd(), arraytype = "EPIC")
dim(myLoad$beta)
# 705762 16

myImport <- champ.import(directory = getwd(), arraytype = "EPIC")
dim(myImport$beta)
# 865918 16

myfilter <- champ.filter(beta = myImport$beta, pd = myImport$pd,
                         detP = myImport$detP, beadcount = myImport$beadcount,
                         arraytype = "EPIC")
dim(myfilter$beta)
# 705762 16
```

Scripts used with EPIC dataset 2:

```r
## "minfi" method (same champ.load() call as in my first post)
dim(myLoad_2$beta)
# 741073 10

## "ChAMP" method (default)
myLoad <- champ.load(directory = getwd(), arraytype = "EPIC")
dim(myLoad$beta)
# 740861 10

myImport <- champ.import(directory = getwd(), arraytype = "EPIC")
dim(myImport$beta)
# 865918 10

myfilter <- champ.filter(beta = myImport$beta, pd = myImport$pd,
                         detP = myImport$detP, beadcount = myImport$beadcount,
                         arraytype = "EPIC")
dim(myfilter$beta)
# 740861 10
```

3: I compared the "minfi" and "ChAMP" methods in champ.load(). On 450K data they give exactly the same result.

-> Yes, I tested it with one of our 450K datasets. The filtered probes remain the same with both the "minfi" and "ChAMP" methods.

```r
## "minfi" load method + SWAN normalization
dim(DATAmyNorm2)    # probes after champ.load()
# 409155 9
dim(MERGEmyNorm3)   # overlapped with MANIFEST
# 409155 65

## "ChAMP" method + BMIQ normalization
dim(DATAmyNorm2)
# 409155 9
dim(MERGEmyNorm3)
# 409155 65
```

On EPIC data, the "minfi" method indeed includes 100+ more CpGs, but if I compare the common CpGs between the two loading results, they are exactly the same.

-> With one of the EPIC datasets there is a complete overlap (apart from the extra CpGs), but with the other dataset one probe ID did not match, in addition to the extra CpGs (Venn diagram attached).

a) EPIC dataset 1 (full match):

```r
dim(myLoad$beta)
# 705762 16
dim(myLoad_2$beta)
# 705948 16

## Overlap
length(intersect(rownames(myLoad$beta), rownames(myLoad_2$beta)))
# 705762
```

b) EPIC dataset 2 (one probe not matched):

```r
dim(myLoad$beta)
# 740861 10
dim(myLoad_2$beta)
# 741073 10

## Overlap
length(intersect(rownames(myLoad$beta), rownames(myLoad_2$beta)))
# 740860
```
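The three regions of the Venn diagram can be computed directly from the row names, which also yields the IDs on each side. For dataset 2 the numbers reported above imply 740861 - 740860 = 1 ChAMP-only probe and 741073 - 740860 = 213 minfi-only probes. The sketch below uses toy IDs; venn_counts is a hypothetical helper, and on real data it would be called as venn_counts(rownames(myLoad$beta), rownames(myLoad_2$beta)):

```r
# Hypothetical helper: size of the shared region plus the IDs unique to each
# side (useful for finding the single mismatching probe).
venn_counts <- function(ids_a, ids_b) {
  list(both   = length(intersect(ids_a, ids_b)),
       only_a = setdiff(ids_a, ids_b),
       only_b = setdiff(ids_b, ids_a))
}

champ_ids <- c("cg01", "cg02", "cg03", "cg04")          # toy ChAMP-method IDs
minfi_ids <- c("cg01", "cg02", "cg03", "cg05", "ch99")  # toy minfi-method IDs

v <- venn_counts(champ_ids, minfi_ids)
v$both     # 3
v$only_a   # "cg04"
v$only_b   # "cg05" "ch99"
```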

However, the difference only affects 100+ CpGs; all the rest are exactly the same.

-> Will it affect normalization in champ.norm(), for example SWAN normalization?

If, as you said, the old version matches.

-> For the current analysis I used the new versions of ChAMP and minfi:

ChAMP 2.16.1, minfi 1.32.0

By old version I meant ChAMP 2.8.9. I also downgraded minfi to 1.22.1, following the ChAMP 2.8.9 version page, and retested the pipeline. The results were similar to those obtained with the newer versions.

I am not sure whether something else needs to be downgraded to remove the discrepancy in the number of CpGs between the two methods.

And see if you add the "beadcount" parameter in champ.filter(), the result would be the same.

-> Yes, the results were the same.

And third, you may select the common CpGs between the two loading methods and see if they are the same.

-> As I mentioned, in one dataset there is a full match (apart from the extra CpGs), but not in the other (one probe differs, plus the extra CpGs).

Please help.

Thank you

Kind regards

Ankit

Hi,

Any update on this?

Thanks

Hi, did you resolve this issue?

Sorry, I am busy with something else; I will check it this weekend. On 10 February 2020 at 5:17 PM +0000, Ankit [bioc] noreply@bioconductor.org wrote:

Hi:

A short update on my investigation. I tried to find the difference this weekend but have not managed yet. I can confirm that if you use minfi to load EPIC data, it yields 232 more CpGs than the ChAMP method. The rest is exactly the same. I compared them right after loading (champ.import() and read.metharray.exp()), without any filtering.

Oddly, the 232 extra CpGs cannot be found in the B4 version annotation package, but they can be found in the B2 version annotation package. So I still suspect that minfi added the B2-only CpGs to its default annotation (maybe to avoid discarding data...), while ChAMP uses the official B4 annotation.

But as I said, apart from those extra CpGs, the two methods return exactly the same result for all other CpGs, so both should be reliable to use. Since minfi's method is very well structured and sealed, I need more time to figure out the reason.
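If every probe count in a minfi-loaded analysis has to be explainable against the B4-based annotation that ChAMP uses, one option (not proposed in this thread, and not tested with ChAMP) is to drop rows whose IDs are absent from that annotation and record what was dropped. This sketch only restricts a final beta matrix; it does not change what SWAN sees inside champ.norm(). The toy data and the helper restrict_to_annotation are hypothetical:

```r
# Hypothetical helper: keep only rows whose probe IDs appear in a reference
# annotation, and report the dropped IDs so the counts stay explainable.
restrict_to_annotation <- function(beta, anno_ids) {
  keep <- rownames(beta) %in% anno_ids
  list(beta = beta[keep, , drop = FALSE],
       dropped = rownames(beta)[!keep])
}

beta <- matrix(runif(8), nrow = 4,
               dimnames = list(c("cg01", "cg02", "cg03", "ch99"), c("S1", "S2")))
anno_ids <- c("cg01", "cg02", "cg03", "cg04")   # toy annotation IDs

res <- restrict_to_annotation(beta, anno_ids)
dim(res$beta)   # 3 2
res$dropped     # "ch99"
```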

Best Tian

Hi Tian, thanks for your investigation. So can I continue using minfi loading with the present settings and ignore the extra CpGs? I urgently need to process the data, and without knowing the exact reason for the extra CpGs it would be difficult to explain the number of CpGs filtered out during the analysis.

Please let me know if you figure out the reason. Maybe you could update the ChAMP package so that minfi loading uses the new B4 annotation.

Thanks

Hi, any updates? Did you resolve the issue?

Did you update the version? Please look into the matter, as I will have to use minfi loading with SWAN normalization only; I have some restrictions on using ChAMP's default loading. I want every number in the output of minfi loading to be explainable. Please fix any bug in the current version and release an update as soon as possible. Thanks

I will try to identify the reason this week. I suspect it is because minfi does not use the B4 version, but I cannot modify minfi's function, as it belongs to another package; I just use its function for loading.

Best Tian

Hi, please let me know if you have any updates.

Hi:

I can't modify minfi's loading function and force other people to use my annotation. However, I suspect the only difference is that minfi uses another annotation, which contains about 200+ extra CpGs; the rest should be exactly the same.

Could you email me to discuss this? tianyuan1991hit@gmail.com

Best Tian
