Question: How to link a methylation probe ID from UCSC (hg19, cg07790169) to an Ensembl gene ID (goseq procedure)
dusan.petrovic wrote, 20 months ago:

Hello everyone,

I am new to this forum, so I apologize in advance if I am not posting in the most appropriate way.

I am using the goseq package from Bioconductor to perform an enrichment analysis of markers that are differentially methylated (limma output). These markers were measured on the Illumina 450K CpG array (e.g., cg07790169), and I am looking them up on the UCSC website. I am new to the goseq library, so I do not know it in detail, but my understanding is that it uses Ensembl gene IDs (e.g., ENSG00...) rather than UCSC gene identifiers.

Therefore, I would like to know whether there is a package that links a methylation marker (e.g., cg07790169) to an Ensembl gene ID (e.g., ENSG00...) in a straightforward way, so that I can run goseq properly.

I could do it by copy and paste, but that would be extremely time-consuming (I have several hundred methylation markers).


Thank you for your kind help and understanding.

(Question modified 19 months ago.)
Answer: How to link methylation code from UCSC (hg19-cg07790169) to ENSEMBL genome code
James W. MacDonald (United States) wrote, 20 months ago:

The main use case for goseq is to perform GO hypergeometric testing with an adjustment for gene-length bias in RNA-Seq data. You aren't doing that, so why are you using goseq? There is no length bias inherent in the measurements from the Illumina 450K platform, nor does that platform give any measurements that are readily converted to gene expression.

I suppose you could naively attribute differentially methylated CpGs to the nearest gene and infer that the given gene is therefore differentially expressed, but at that point you should be using something like GOstats or topGO, because your measurements don't have any length bias.
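For reference, the unweighted test that GOstats (and topGO's classic Fisher test) applies per GO term is a one-sided hypergeometric test. Below is a minimal Python sketch of that tail probability, assuming N genes in the universe, K of them annotated to the GO term, n genes selected, and k selected genes in the term. The function name and the counts in the usage line are illustrative only; this is not either package's API.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N universe genes, K in the GO term, n selected)."""
    upper = min(K, n)  # the overlap can never exceed the term size or the selection size
    total = comb(N, n)
    # Sum the probability mass for every overlap size from k up to the maximum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Illustrative counts: 20 genes, 5 in the term, 4 selected, 3 of those in the term.
p_value = hypergeom_upper_tail(N=20, K=5, n=4, k=3)
print(p_value)
```

Every gene in the universe is treated as equally likely to be selected here, which is exactly the assumption that goseq (for length) or a CpG-count weighting relaxes.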

You don't say how you analyzed your Illumina data, but note that the FDb.InfiniumMethylation.hg19 package is intended to provide genomic annotation (e.g., chromosomal positions) for all of the probes on that array, and you could use it in concert with the TxDb.Hsapiens.UCSC.hg19.knownGene package to find the nearest gene for each CpG.
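The nearest-gene idea itself is simple: sort the transcription start sites (TSSs) on each chromosome and binary-search each probe position. The Python sketch below assumes you already have probe coordinates (e.g., exported from the probe annotation) and a table of gene TSSs; the gene and probe names and all coordinates in the usage are hypothetical placeholders, and the Bioconductor functions mentioned in this thread are the intended way to do this in R.

```python
from bisect import bisect_left


def build_tss_index(genes):
    """genes: iterable of (gene_id, chrom, tss). Returns {chrom: (sorted TSS positions, gene IDs)}."""
    by_chrom = {}
    for gene_id, chrom, tss in genes:
        by_chrom.setdefault(chrom, []).append((tss, gene_id))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([pos for pos, _ in entries], [gid for _, gid in entries])
    return index


def nearest_gene(index, chrom, pos):
    """Return (gene_id, distance) for the TSS closest to pos on chrom, or None if no genes there."""
    if chrom not in index:
        return None
    positions, ids = index[chrom]
    i = bisect_left(positions, pos)
    # The closest TSS is either the first one at/after pos or the one just before it.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - pos))
    return ids[best], abs(positions[best] - pos)


# Hypothetical annotation and probe coordinates, for illustration only.
index = build_tss_index([("GENE_A", "chr1", 1000), ("GENE_B", "chr1", 5000), ("GENE_C", "chr2", 300)])
probes = {"probe_1": ("chr1", 1200), "probe_2": ("chr1", 4000), "probe_3": ("chr2", 10)}
assignments = {name: nearest_gene(index, chrom, pos) for name, (chrom, pos) in probes.items()}
print(assignments)
```

Ties go to the upstream TSS because `min` keeps the first candidate; whether to also cap the allowed distance is a judgment call the sketch leaves out.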

Anyway, you are trying to do some fairly non-standard stuff, which by definition means you will be pretty much on your own. This support site is really intended to help people with questions that are readily answered, whereas you appear to have pitched off into the deep end of the pool. If you are willing to do (quite a bit of) reading, you should be able to figure out what you need to do. Otherwise, I would highly recommend finding somebody local with relevant experience.


And re: required reading, I would recommend the help pages for the FDb.InfiniumMethylation.hg19 package, particularly the getNearest function, which should be relevant.

(Comment by James W. MacDonald, 20 months ago)

James, there is some literature on using goseq for DNA methylation data. In fact, the Bioconductor package missMethyl has a function, gometh, that was developed specifically to apply goseq methods to the Illumina 450K platform. The idea is that genes with more or fewer CpGs are a priori more or less likely to appear in the differentially methylated gene set, and gometh corrects for that bias.


Gene-set analysis is severely biased when applied to genome-wide methylation data
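A quick way to see the bias described above: under a simplified null in which each probe is independently called significant with the same probability, a gene with more probes is more likely to have at least one significant probe. The Python sketch below assumes that independence model purely for illustration; it is not how gometh estimates its weights.

```python
def prob_gene_selected(n_probes: int, p_probe: float) -> float:
    """Null probability that a gene with n_probes CpG probes has at least one significant probe,
    assuming each probe is independently significant with probability p_probe (a simplification)."""
    return 1.0 - (1.0 - p_probe) ** n_probes


# Genes with more probes are selected more often even when nothing is truly differentially methylated.
for n in (1, 5, 20, 100):
    print(n, round(prob_gene_selected(n, 0.01), 3))
```

Because this probability grows with the number of probes, a plain hypergeometric test tends to favor GO terms full of probe-rich genes, which is the motivation for a goseq-style weighting.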


(Comment by bmreilly, 17 months ago)

Sure. But that corrects for the fact that there may be more or fewer CpGs on an Illumina array that are sufficiently close to a given gene, which may or may not have anything to do with the length of the gene, and gene length is what goseq is concerned with. Or am I missing something?

(Comment by James W. MacDonald, 17 months ago)
Answer: How to link methylation code from UCSC (hg19-cg07790169) to ENSEMBL genome code
dusan.petrovic wrote, 19 months ago:

Hi James,

Thank you for your reply, I switched to topGO following your advice.

Dusan, see my comment above on James's post. There are R packages that you will likely find useful for analyzing 450K data; see missMethyl.

(Comment by bmreilly, 17 months ago)