Question: How to link methylation code from UCSC (hg19-cg07790169) to ENSEMBL genome code (goseq procedure)
20 months ago by
Switzerland
dusan.petrovic0 wrote:

Hello everyone,

I am new to this forum, so I apologize in advance if I am not posting in the most formal way.

I am using the goseq package from Bioconductor to perform enrichment analysis for markers that are differentially methylated (limma output). These markers were measured on the Illumina 450K CpG array (e.g. cg07790169), and I am looking them up on the UCSC website. I am new to goseq, so I do not know it in detail, but my understanding is that it uses Ensembl gene IDs (e.g. ENSG00....) and not gene IDs from UCSC.

Therefore, I would like to know whether there is a package that links a methylation marker (e.g. cg07790169) to an Ensembl gene ID (e.g. ENSG00...) in a straightforward way, so that I can run goseq properly.

I could do it by copy and paste, but that would be extremely time-consuming (I have several hundred methylation markers).

Thank you for your kind help and understanding.

modified 19 months ago • written 20 months ago by dusan.petrovic0
Answer: How to link methylation code from UCSC (hg19-cg07790169) to ENSEMBL genome code
20 months ago by
United States
James W. MacDonald51k wrote:

The main use case for goseq is to perform GO hypergeometric testing with bias adjustments for gene length, when using RNA-Seq data. You aren't doing that, so why are you using goseq? There is no length bias inherent in the measurements from the Illumina 450K platform, nor does that platform give any measurements that are readily converted to gene expression.

I suppose you could naively attribute differentially methylated CpG islands to the nearest gene, and infer that the given gene is thus differentially expressed, but at that point you should be using something like GOstats or topGO, because your measurements don't have any length bias.
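For readers unfamiliar with the test being discussed, here is a minimal sketch of a plain hypergeometric over-representation test with no bias adjustment. It is written in Python purely for illustration and is not how GOstats or topGO are implemented; the function name and parameter names are made up for this example.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background universe
    K: background genes annotated to the GO term
    n: genes in the list of interest
    k: genes in the list of interest that are annotated to the GO term
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper annotated genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy numbers only: universe of 10 genes, 3 in the term, 2 selected, both in the term.
p_value = hypergeom_upper_tail(2, 3, 2, 10)
```

This test assumes every gene is equally likely to end up in the list of interest, which is the assumption the rest of this thread is arguing about.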

You don't say how you analyzed your Illumina data, but do note that the FDb.InfiniumMethylation.hg19 package is intended to provide genomic annotation (e.g., chromosomal positions) for all of the probes on that array, and you could use that in concert with the TxDb.Hsapiens.UCSC.hg19.knownGene package to find the nearest gene for each CpG.
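The nearest-gene idea can be sketched in a language-agnostic way as follows. This is only an illustration in Python with made-up coordinates, probe IDs, and gene IDs. It does not use the Bioconductor packages named above, and it assumes that each chromosome's gene list is already sorted by start position.

```python
from bisect import bisect_left

# Hypothetical gene start positions per chromosome (toy values, not real annotation).
# Each list must be sorted by start position.
GENES = {
    "chr1": [(1000, "ENSG_TOY_A"), (5000, "ENSG_TOY_B"), (9000, "ENSG_TOY_C")],
    "chr2": [(2000, "ENSG_TOY_D")],
}

def nearest_gene(chrom, pos, genes=GENES):
    """Return (gene_id, distance) for the gene start closest to pos, or None."""
    entries = genes.get(chrom)
    if not entries:
        return None
    starts = [start for start, _ in entries]
    i = bisect_left(starts, pos)
    # Only the genes immediately before and after pos can be the nearest.
    candidates = entries[max(i - 1, 0): i + 1]
    start, gene = min(candidates, key=lambda e: abs(e[0] - pos))
    return gene, abs(start - pos)

# Hypothetical probe positions (toy values).
PROBES = {
    "cg_toy_1": ("chr1", 1200),
    "cg_toy_2": ("chr1", 7500),
    "cg_toy_3": ("chr2", 50),
    "cg_toy_4": ("chr3", 10),  # chromosome with no annotated genes
}

probe_to_gene = {probe: nearest_gene(c, p) for probe, (c, p) in PROBES.items()}
```

A real analysis would take probe and gene coordinates from the annotation packages rather than from hand-written dictionaries, and would probably also want a distance cutoff so that probes far from any gene are not assigned one.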

Anyway, you are trying to do some fairly non-standard stuff, which by definition means you will be pretty much on your own. This support site is really intended to help people with questions that are readily answered, whereas you appear to have pitched off into the deep end of the pool. If you are willing to do quite a bit of reading, you should be able to figure out what you need to do. Otherwise I would highly recommend finding somebody local with relevant experience.

And re: required reading, I would recommend the help pages for the FDb.InfiniumMethylation.hg19 package, particularly the getNearest function, which should be relevant.

James, there is some literature on using goseq for DNA methylation data, and there is in fact a Bioconductor package called 'missMethyl' with a function called 'gometh', which was specifically developed to apply goseq methods to the Illumina 450K platform. The idea is that genes with differing numbers of CpGs are a priori more or less likely to appear in the DMR gene set, so gometh helps correct for that bias.

References:

Gene-set analysis is severely biased when applied to genome-wide methylation data

missMethyl

https://bioconductor.org/packages/release/bioc/html/missMethyl.html
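To make the bias described above concrete, here is a small back-of-the-envelope sketch in Python. It assumes the significant probes are drawn uniformly at random from the array, which is a simplification, and it is not how gometh itself computes anything. The function name and the toy numbers are invented for this example.

```python
from math import comb

def prob_gene_hit(n_probes_for_gene, n_selected, n_total):
    """Chance that at least one of a gene's probes lands in the selected set.

    Assumes n_selected probes are drawn uniformly at random, without
    replacement, from n_total probes on the array.
    """
    # Probability that none of the gene's probes is selected.
    miss = comb(n_total - n_probes_for_gene, n_selected) / comb(n_total, n_selected)
    return 1.0 - miss

# Toy array of 1000 probes with 50 called significant.
p_one_probe = prob_gene_hit(1, 50, 1000)
p_twenty_probes = prob_gene_hit(20, 50, 1000)
```

Under pure chance, the gene covered by 20 probes is far more likely to be flagged than the gene covered by one, so a test that treats all genes as equally likely to be selected will favour GO terms rich in heavily probed genes.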

Sure. But that corrects for the fact that there may be more or fewer CpGs on an Illumina array that are sufficiently close to a given gene, which may or may not have anything to do with the length of the gene, which is what goseq is concerned with. Or am I missing something?

Answer: How to link methylation code from UCSC (hg19-cg07790169) to ENSEMBL genome code
19 months ago by
Switzerland
dusan.petrovic0 wrote:

Hi James,

Best

Dusan

Dusan, see my comment above on James' post. There are R packages that you will likely find useful for analyzing 450K data. See 'missMethyl'.
