IlluminaHumanMethylation450k.db Reference Versions
Dario Strbenac ★ 1.5k
In the package IlluminaHumanMethylation450k.db, there are three data objects relating probes to chromosomes: IlluminaHumanMethylation450kCHR, IlluminaHumanMethylation450kCHR36, and IlluminaHumanMethylation450kCHR37. What is the reason for having IlluminaHumanMethylation450kCHR, and which reference build was used for it? Neither is explained in the help page of IlluminaHumanMethylation450kCHR. Is it redundant?

Also, the mapping to locations, IlluminaHumanMethylation450kCHRLOC, is only available for hg19. There should also be one for hg18; otherwise IlluminaHumanMethylation450kCHR36 should not be supported.

I am referring to version 1.4.6 of the IlluminaHumanMethylation450k.db package.

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
Tim Triche ★ 4.2k
CHR does what is expected of the mapping: it returns the chromosome of the probe. It is constructed by overwriting the bimap for CHR with that for CHR37 on export. Without this kludge, tens of thousands of probes return NA as their chromosome, which is clearly incorrect.

As it happens, due to a long-standing tradition of excluding 'promiscuous' probes, the default behavior of ALIAS2PROBE (for example) is also wrong. I'm about to upload 2.0.6 with that patched.

The problem with gene-centric annotations of the sort used in Bioconductor .db packages is that they're gene-centric: the mapping from probes to genes, locations, chromosomes, GO annotations, KEGG pathways, and the like is done through EntrezGene IDs. There has been some discussion as to whether completely reannotating the chip might be a better idea in this respect, i.e. mapping the probes to the nearest TSS. As I have gained more experience with the GRanges architecture, I have realized that GRanges are the more sensible approach to annotating the probes on the 450k.

Nonetheless, the 450k.db package is out there, so it ought to do what it's expected to, unless or until everything transitions to the manifest package that Kasper and Martin Aryee put together.

On Sun, Nov 6, 2011 at 11:00 PM, Dario Strbenac <d.strbenac@garvan.org.au> wrote:

> In the package IlluminaHumanMethylation450k.db, there are three data
> objects relating probes to chromosomes. They are
> IlluminaHumanMethylation450kCHR, IlluminaHumanMethylation450kCHR36, and
> IlluminaHumanMethylation450kCHR37. I wonder what the reason of having
> IlluminaHumanMethylation450kCHR is, and what reference was used, since that
> is not explained in the help page of IlluminaHumanMethylation450kCHR ? Is
> it redundant ?
>
> Also, the mapping to locations, IlluminaHumanMethylation450kCHRLOC, is
> only available for hg19. There should also be one for hg18, or otherwise
> the IlluminaHumanMethylation450kCHR36 should not be supported.
>
> I am referring to version 1.4.6 of the IlluminaHumanMethylation450k.db
> package.
By default, all of the mappings (except those that Tim has previously "adjusted", such as CHR) will "hide" the probes that match multiple Entrez Gene IDs. This happens because 450k.db was made as a chip package, and chip packages were specifically designed to hide such probes by default. The reason is that chip packages were originally designed primarily for mRNA microarray platforms. So the default behavior is not really broken, or even really inappropriate. It's just that this is an atypical use case.

But the data is all in there, and you absolutely CAN get to it with very little trouble. You just have to use the toggleProbes method to expose it. You can use it like this:

    library(IlluminaHumanMethylation450k.db)

    ## step 1: create a mapping that exposes ALL the probes, regardless of
    ## how many genes they match:
    fullAliasMapping <- toggleProbes(IlluminaHumanMethylation450kALIAS2PROBE, "all")

    ## step 2: use that mapping instead of IlluminaHumanMethylation450kALIAS2PROBE
    head(toTable(fullAliasMapping))

    ## You can compare the two mappings to see how they behave differently:
    dim(toTable(IlluminaHumanMethylation450kALIAS2PROBE))
    dim(toTable(fullAliasMapping))

I understand that Tim is planning to modify this package so that its default behavior is more in line with what users of this platform expect, which is a terrific thing for him to do. But in the meantime, the package is perfectly serviceable; you just have to know how to use the toggleProbes method.

Marc

On 11/07/2011 07:09 AM, Tim Triche, Jr. wrote:

> CHR does what is expected of the mapping in that it returns the chromosome
> of the probe. It is constructed by overwriting the bimap for CHR with
> that for CHR37 on export. Without this kludge, tens of thousands of probes
> return NA as their chromosome, which is clearly incorrect.
>
> As it happens, due to a long-standing tradition of excluding 'promiscuous'
> probes, the default behavior of ALIAS2PROBE (for example) is also wrong.
> I'm about to upload 2.0.6 with that patched.
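To make the "hidden probes" idea above concrete, here is a toy sketch in base R. It does not use AnnotationDbi and is not how toggleProbes is implemented; the probe IDs and Entrez Gene IDs are invented, and the variable names (probe_map, default_view, full_view) are hypothetical. It only illustrates why a default view that drops multi-gene probes has fewer rows than the full view.

```r
# Toy probe-to-gene table; probe IDs and Entrez Gene IDs are made up.
probe_map <- data.frame(
  probe_id  = c("cg_A", "cg_B", "cg_B", "cg_C", "cg_D", "cg_D", "cg_D"),
  entrez_id = c("101",  "202",  "203",  "304",  "405",  "406",  "407"),
  stringsAsFactors = FALSE
)

# Count how many distinct genes each probe maps to.
genes_per_probe <- tapply(probe_map$entrez_id, probe_map$probe_id,
                          function(ids) length(unique(ids)))

# "Default" view: keep only probes that map to exactly one gene.
single_probes <- names(genes_per_probe)[genes_per_probe == 1]
default_view  <- probe_map[probe_map$probe_id %in% single_probes, ]

# "All" view: keep every row, including the multi-gene probes.
full_view <- probe_map

# Compare the two views, analogous to the dim(toTable(...)) comparison above.
dim(default_view)
dim(full_view)
```

On a methylation array many probes sit near more than one gene, so the gap between the two views can be large; on an expression array it is usually small, which is why hiding was chosen as the chip-package default.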
Dario Strbenac ★ 1.5k
Thanks for the explanations. Do you know of a persistent source of mappings of probes to genomic locations, such as:

|-----------------------------------
| probe_id | chr | pos | strand |
|-----------------------------------

This might be useful if, for example, I wanted to write a BED file out to use with a genome browser. I suppose it's possible to map the probes ourselves, but I was thinking it would be more reproducible and citable if some Bioconductor package had already published this.
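For the BED use case, a minimal base R sketch follows, assuming a table in the probe_id / chr / pos / strand layout is already in hand. The probe IDs, coordinates, and output file name are invented for illustration and do not come from any package. The one detail worth care is that BED intervals are 0-based and half-open, so a single base at 1-based position p is written as start p - 1, end p.

```r
# Hypothetical probe table in the layout sketched above; the probe IDs and
# coordinates are invented for illustration only.
probes <- data.frame(
  probe_id = c("cg_example1", "cg_example2", "cg_example3"),
  chr      = c("chr1", "chr1", "chrX"),
  pos      = c(15865L, 534242L, 2715058L),  # 1-based positions
  strand   = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# BED uses 0-based, half-open intervals: a single base at 1-based position p
# becomes chromStart = p - 1, chromEnd = p.
bed <- data.frame(
  chrom      = probes$chr,
  chromStart = probes$pos - 1L,
  chromEnd   = probes$pos,
  name       = probes$probe_id,
  score      = 0L,
  strand     = probes$strand,
  stringsAsFactors = FALSE
)

# Sort by chromosome name, then start coordinate.
bed <- bed[order(bed$chrom, bed$chromStart), ]

# Write a tab-separated, header-less file (written to a temp dir here).
bed_file <- file.path(tempdir(), "probes_example.bed")
write.table(bed, file = bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Recording which genome build the positions refer to alongside the file is advisable, since the same probe has different coordinates in hg18 and hg19.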
The manifest has probe locations, but they're keyed against whichever build the probes were designed to. So, for example, about half the CpH probes are built against hg18/NCBI36 instead of hg19/GRCh37.

I put together a probe package that attempts to make some sense of what is meant by 'strand' for Illumina's probes, the corresponding genomic forward sequence, and where the CpG goes. When using that approach, the interrogated CpG or CpH locus always appears at 'site' and extends 1 (for CpG) or 2 (for CpH) bases upstream of the site. So if I want to look for CpGs that overlap a SNP, using SNPlocs and GenomicRanges, it is enough to call subsetByOverlaps() and tabulate them. The same goes for HapMap SNPs, H3K4me2-enriched sites in a particular tissue type, DNase hypersensitive sites, or what have you. The GRanges approach is just far more flexible than the gene-centric approach.

So, I added a couple of "freeze-dried" GRanges objects to IlluminaHumanMethylation450k.db version 2.0+, which are returned by getCpGR() and getCpHR() with these data included. The probe package mostly just adds sequences into the mix, so that all of this can be verified against one's genomic build of choice.

On Mon, Nov 7, 2011 at 3:00 PM, Dario Strbenac <d.strbenac@garvan.org.au> wrote:

> Thanks for the explanations. Do you know of a persistent source of
> mappings of probes to genomic locations, such as :
>
> |-----------------------------------
> |probe_id | chr | pos | strand |
> |-----------------------------------
>
> This might be useful if, for example, I wanted to write a BED file out to
> use with a genome browser. I suppose it's possible to map the probes
> ourselves, but I was thinking of it being more reproducible and citable if
> some Bioconductor package had already published this.
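The overlap-and-tabulate pattern described in this reply can be sketched with toy data, assuming the Bioconductor package GenomicRanges is installed. The ranges below are invented; in practice the CpG ranges would come from something like the GRanges objects mentioned above and the SNP ranges from a SNPlocs package, neither of which is loaded here. The 2 bp width used for each CpG is also just an illustrative assumption.

```r
library(GenomicRanges)  # Bioconductor; assumed to be installed

# Toy CpG loci, 2 bp each; probe IDs and coordinates are made up.
cpgs <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500, 300), width = 2),
  probe_id = c("cg_toy1", "cg_toy2", "cg_toy3")
)

# Toy single-base SNP positions, also invented.
snps <- GRanges(
  seqnames = c("chr1", "chr2", "chr2"),
  ranges   = IRanges(start = c(101, 300, 900), width = 1)
)

# Keep only the CpG loci that overlap at least one SNP.
cpgs_with_snp <- subsetByOverlaps(cpgs, snps)

# Tabulate the overlapping loci per chromosome.
table(as.character(seqnames(cpgs_with_snp)))
```

The same two lines at the end work unchanged if snps is swapped for any other GRanges of features, which is the flexibility the reply is pointing at.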