I have a list of genes for which I need to pull out related DNA methylation data from my EPIC array dataset. I would like to visualise this information in a genome browser if possible. I want to pick CpGs from different genomic regions in and around the specific genes, e.g. gene body, 5'UTR, 3'UTR, TSS, promoter, etc. Can anyone help me with this? Is there any online tool I can use to create a track of my data in a genome browser?
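A minimal base-R sketch of both parts of the question, assuming your probe annotation has the layout of the Illumina manifest as returned by minfi's `getAnnotation()`: columns `Name`, `chr`, `pos`, `UCSC_RefGene_Name` and `UCSC_RefGene_Group`, where the last two hold semicolon-separated lists assumed to be paired by position, with region labels such as TSS1500, TSS200, 5'UTR, 1stExon, Body and 3'UTR. The toy table, gene names and probe IDs below are made up. It selects probes by gene and region, then writes one sample's values as a bedGraph file, a plain-text format the UCSC Genome Browser accepts as a custom track. The helpers `select_probes()` and `write_bedgraph()` are hypothetical names, not part of any package.

```r
## Hypothetical toy annotation mimicking the Illumina manifest layout:
## one row per probe; gene names and gene-region labels are
## semicolon-separated and (assumed) paired by position.
anno <- data.frame(
  Name = c("cg0001", "cg0002", "cg0003", "cg0004"),
  chr  = c("chr1", "chr1", "chr2", "chr3"),
  pos  = c(1000L, 1500L, 2000L, 3000L),
  UCSC_RefGene_Name  = c("GENEA;GENEA", "GENEA", "GENEB", ""),
  UCSC_RefGene_Group = c("TSS200;5'UTR", "Body", "3'UTR", ""),
  stringsAsFactors = FALSE
)

## Probes annotated to any of `genes` within any of the region labels in `groups`.
select_probes <- function(anno, genes, groups) {
  gene_list  <- strsplit(anno$UCSC_RefGene_Name, ";", fixed = TRUE)
  group_list <- strsplit(anno$UCSC_RefGene_Group, ";", fixed = TRUE)
  keep <- vapply(seq_along(gene_list), function(i)
    any(gene_list[[i]] %in% genes & group_list[[i]] %in% groups), logical(1))
  anno$Name[keep]
}

## Write one value per CpG as a UCSC bedGraph custom track. bedGraph
## starts are 0-based and ends exclusive, so a CpG at 1-based position
## `pos` becomes the interval (pos - 1, pos).
write_bedgraph <- function(chr, pos, value, path, name = "methylation") {
  con <- file(path, "w")
  on.exit(close(con))
  writeLines(sprintf("track type=bedGraph name=%s", name), con)
  write.table(data.frame(chr, pos - 1L, pos, value), con, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

## Usage: TSS probes of GENEA, written out with a made-up beta value.
tss_probes <- select_probes(anno, "GENEA", c("TSS200", "TSS1500"))
sel <- anno$Name %in% tss_probes
out <- tempfile(fileext = ".bedGraph")
write_bedgraph(anno$chr[sel], anno$pos[sel], 0.8, out)
readLines(out)
```

The resulting file can be uploaded as a custom track; with real data you would write one file (or one track) per sample or per group mean.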
Hi James! I have a similar question to the original post. After I get the mapped probes, how should I calculate the mean methylation value of a specific gene from the probes mapped to that gene's location? There seems to be no information on which probes map to which genes.
Here's an example. Your homework is to decipher what I did.
> library(minfi)
> library(minfiData)
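> z <- preprocessFunnorm(RGsetEx)  ## normalise the minfiData example data into a GenomicRatioSet
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> pp <- promoters(TxDb.Hsapiens.UCSC.hg19.knownGene)  ## one promoter range per transcript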
> pp
GRanges object with 82960 ranges and 2 metadata columns:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
uc001aaa.3 chr1 9874-12073 + | 1 uc001aaa.3
uc010nxq.1 chr1 9874-12073 + | 2 uc010nxq.1
uc010nxr.1 chr1 9874-12073 + | 3 uc010nxr.1
uc001aal.1 chr1 67091-69290 + | 4 uc001aal.1
uc001aaq.2 chr1 319084-321283 + | 5 uc001aaq.2
## As I already said, you need to use subsetByOverlaps; here I just took all the promoter regions
## you might want other genomic regions, but this is just an example, so if you want something different you will have to work out how to adapt it
## Now get the mean M-value, per sample, per region, for the first 20 promoters
> fakeO <- do.call(rbind, lapply(1:20, function(x) colMeans(getM(subsetByOverlaps(z, pp[x])))))
> fakeO
5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02
[1,] NaN NaN NaN NaN
[2,] NaN NaN NaN NaN
[3,] NaN NaN NaN NaN
[4,] 0.4886808 0.9086010 -0.2618946 -0.1448504
[5,] NaN NaN NaN NaN
[6,] NaN NaN NaN NaN
[7,] NaN NaN NaN NaN
[8,] NaN NaN NaN NaN
[9,] NaN NaN NaN NaN
[10,] NaN NaN NaN NaN
[11,] NaN NaN NaN NaN
[12,] NaN NaN NaN NaN
[13,] -2.0552704 -1.9338784 -2.4166071 -2.2670802
[14,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[15,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[16,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[17,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[18,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[19,] -5.0269002 -4.8170834 -4.9346891 -4.5480297
[20,] -0.2155665 -0.2172631 -0.8050240 -0.2081479
5723646053_R05C02 5723646053_R06C02
[1,] NaN NaN
[2,] NaN NaN
[3,] NaN NaN
[4,] 0.42406292 -0.5063752
[5,] NaN NaN
[6,] NaN NaN
[7,] NaN NaN
[8,] NaN NaN
[9,] NaN NaN
[10,] NaN NaN
[11,] NaN NaN
[12,] NaN NaN
[13,] -1.99018087 -2.5155456
[14,] -4.72432917 -4.9795912
[15,] -4.72432917 -4.9795912
[16,] -4.72432917 -4.9795912
[17,] -4.72432917 -4.9795912
[18,] -4.72432917 -4.9795912
[19,] -4.72432917 -4.9795912
[20,] -0.03463982 -1.0646269
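For readers who want to see the logic without Bioconductor objects, here is a minimal base-R sketch of the same idea on made-up toy data (`probes`, `M` and `regions` are hypothetical, and the probe rows are assumed to line up with the rows of `M`). Giving the regions names answers the "which probes map to which gene" question directly. A region with no probes comes out as NaN, as in rows 1-3 above, because `colMeans()` over zero rows returns NaN. The identical rows 14-19 above presumably come from several transcripts sharing the same promoter interval; that is an inference, not something the output states. `promoter_window()` assumes the `promoters()` defaults of 2000 bp upstream and 200 bp downstream, which is consistent with the 2200-bp ranges printed for `pp`.

```r
## Hypothetical toy data: probe positions (rows aligned with M), an
## M-value matrix (probes x samples) and a table of named regions.
probes <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                     pos = c(100L, 150L, 900L, 120L),
                     stringsAsFactors = FALSE)
M <- matrix(c(1, 3, -2, 0.5,
              2, 4, -1, 1.5), ncol = 2,
            dimnames = list(paste0("cg", 1:4), c("s1", "s2")))
regions <- data.frame(name  = c("promA", "promB", "promC"),
                      chr   = c("chr1", "chr1", "chr3"),
                      start = c(50L, 800L, 1L),
                      end   = c(200L, 1000L, 500L),
                      stringsAsFactors = FALSE)

## Promoter window around a TSS (assumed defaults: 2000 bp upstream, 200 bp downstream).
promoter_window <- function(tss, strand, upstream = 2000L, downstream = 200L) {
  if (strand == "+") {
    c(start = tss - upstream, end = tss + downstream - 1L)
  } else {
    c(start = tss - downstream + 1L, end = tss + upstream)
  }
}

## Logical index of the probes inside region i (inclusive bounds,
## each CpG treated as a single base).
probes_in <- function(probes, regions, i) {
  probes$chr == regions$chr[i] &
    probes$pos >= regions$start[i] & probes$pos <= regions$end[i]
}

## Which probes map to which region: a named list of probe IDs.
region_probes <- function(M, probes, regions) {
  setNames(lapply(seq_len(nrow(regions)), function(i)
    rownames(M)[probes_in(probes, regions, i)]), regions$name)
}

## Mean M-value per region (rows) and sample (columns); empty regions give NaN.
region_means <- function(M, probes, regions) {
  out <- do.call(rbind, lapply(seq_len(nrow(regions)), function(i)
    colMeans(M[probes_in(probes, regions, i), , drop = FALSE])))
  rownames(out) <- regions$name
  out
}

promoter_window(11874L, "+")   # start 9874, end 12073, matching the first `pp` range
region_probes(M, probes, regions)
res <- region_means(M, probes, regions)
res
```

In the Bioconductor version, `findOverlaps()` on the GenomicRatioSet and the promoter GRanges plays the role of `probes_in()`, and the `tx_name` column of `pp` can serve as the region names.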
Having shown how to do what you specifically asked about, I will now caution you against doing that sort of thing. It might be a legitimate thing to do, but it's non-standard. Usually you would look for regions that look like peaks of differential methylation and then test whether they differ between your groups. The general consensus is that these regions should look like peaks, and there are different ways of detecting them: minfi has the bumphunter method, DMRcate uses Gaussian kernel smoothing, etc. Those are more sophisticated than a simple mean over a region.
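To make the "peaks" idea concrete, here is a toy base-R sketch: smooth a per-CpG statistic along the genome with a Gaussian kernel, threshold the smoothed values, and merge neighbouring flagged CpGs into candidate regions. It is not the bumphunter or DMRcate implementation. The positions, statistics, bandwidth, cutoff and gap are arbitrary assumptions, and both packages also attach statistical significance to the regions they find, which this sketch does not.

```r
## Toy illustration only: Gaussian kernel smoothing of a per-CpG statistic.
kernel_smooth <- function(pos, stat, bandwidth = 500) {
  vapply(seq_along(pos), function(i) {
    w <- dnorm(pos - pos[i], sd = bandwidth)   # weights fall off with distance
    sum(w * stat) / sum(w)
  }, numeric(1))
}

## Merge flagged CpGs into regions, splitting where the gap exceeds max_gap bp.
call_regions <- function(pos, flag, max_gap = 1000) {
  idx <- which(flag)
  if (length(idx) == 0) return(data.frame(start = numeric(0), end = numeric(0)))
  ## start a new region whenever the gap to the previous flagged CpG is too large
  grp <- cumsum(c(TRUE, diff(pos[idx]) > max_gap))
  data.frame(start = unname(vapply(split(pos[idx], grp), min, numeric(1))),
             end   = unname(vapply(split(pos[idx], grp), max, numeric(1))))
}

## Hypothetical sorted positions on one chromosome and per-CpG statistics,
## e.g. a group difference in mean M-value.
pos  <- c(100, 200, 300, 5000, 5100)
stat <- c(2, 2.5, 1.8, 0, 0.1)
sm   <- kernel_smooth(pos, stat)
flag <- abs(sm) > 1        # hypothetical cutoff
reg  <- call_regions(pos, flag)
reg
```

With real data you would use the packages themselves; the sketch only shows why a peak-finding approach adapts to where the signal is, rather than averaging over a fixed, predefined window.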
Thank you James. That is very helpful. I will give it a try.
Hi James! Thank you very much for your detailed example and kind suggestions. I'll look into the bumphunter method.