DMRcate results cg ID of significant DMR
3
0
Entering edit mode
@giovanni-calice-6415
Last seen 6 months ago
Italy

Hi all,

in DMRcate results I got no.probes associated to each significant DMR.

How do I get the ID of each of these probes clustered together into each DMR?

Giovanni

dmrcate • 2.3k views
1
Entering edit mode
Tim Peters ▴ 120
@tim-peters-7579
Last seen 4 months ago
Australia

Hi Giovanni,

I think these are good general post-filtering options. However, your options depend on the type of hypothesis you are testing. Do you only want long blocks of differential methylation or do you believe that very localised methylation may also affect your phenotype? This will inform your cutoff for no.probes. Are you specifically looking at promoter methylation? If yes then taking only those with a gene symbol association makes sense.

Thanks for using DMRcate as intended and good luck!

Best,

Tim

0
Entering edit mode

Hi Tim,

I agree with you but one clarification:

- I'm looking also at Body methylation and also to this region obviously there's a considerable Gene Symbol association; what I was wondering for some time if is it possible to know other informations, in addition to the hg19 coord., about probes without G.S. association because you know that there are some of these both in 450k array and in EPIC array.

Maybe I should post a separate question on this aspect.

Thanks for everything, Best

Giovanni

0
Entering edit mode

Hi Giovanni,

The annotations available through DMRcate are the Gencode transcript coordinates and the per-probe characteristics found in the attached annotation packages. The boilerplate I posted earlier in the thread should help you attach these to your DMRs. Anything else you'll need to import yourself.

Best,

Tim

0
Entering edit mode

Hi Tim,

I know the attached annotation packages available through DMRcate.

My comment is on the probes of these annotations that don't have more informations like Gene Symbol, Genomic Regions, etc.

Thanks for everything posted in this thread and good luck,

Giovanni

0
Entering edit mode

Hello Tim,

I have used package called DMRcate to analyse 450k data. I want to find the gene which associates the DMRs.And I met some questions.The outputs include gene_assoc, group, hg19coord, no.probes, minpval, meanpval and maxbetafc,when I used the old version to analyse 450k data. But when I update the package,the outputs became coord, no.cpgs, minfdr, Stouffer, maxbetafc,meanbetafc.There is no result of "gene_assoc".I want to find the gene names associating "coord",can you help me ? Can you please tel me how to associate the gene by using the newest DMRcate packages.

There follow the output results of the newest DMRcate. coord no.cpgs minfdr Stouffer maxbetafc meanbetafc 63999 chr6:33156164-33181870 265 0 0 -0.5008031 -0.02648790 63997 chr6:33128825-33149777 150 0 0 0.4176126 0.08611966 63917 chr6:32144195-32161004 128 0 0 -0.2574513 -0.03184096 63914 chr6:32114490-32123701 124 0 0 -0.4377015 -0.06195576 63889 chr6:31935801-31940855 101 0 0 -0.1555205 -0.02401999 12564 chr11:31817810-31841980 100 0 0 -0.4611059 -0.17113506

0
Entering edit mode
Tim Peters ▴ 120
@tim-peters-7579
Last seen 4 months ago
Australia

Hi Giovanni,

The following should get you the IDs. Say you're using EPIC array data and your DMRs are in results.ranges:

locs <- IlluminaHumanMethylationEPICanno.ilm10b2.hg19@data$Locations locs.ranges <- GRanges(locs$chr, IRanges(locs$pos, locs$pos))
names(locs.ranges) <- rownames(locs)
​DMR.IDs <- lapply(results.ranges, function (x) names(locs.ranges[locs.ranges %over% x]))

Best,

Tim

0
Entering edit mode

Hi Tim,

I did several tests with other methods on Ranges such as intersect(...) etc.

and now also through what you suggested everything works fine.

Many Thanks, Best

Giovanni

0
Entering edit mode

Hi Tim,

I'm doing some differential analyses on my processed data and in one of these I got a too big number of DMRs.

I would pre-filter probes but following your package's threads I found this will bias the method.

So I think it would be better post-filter the results:

- on no.probes > 1  (because no.probes = 1, would indicate DMPs instead of DMRs)

- on DMRs that effectively have a Gene Symbol associated

- on absolute value of maxbetafc greater than of 0.2 as minimum threshold of differential methylation level

What do you think and suggest about?

Giovanni

1
Entering edit mode

Hi Giovanni,

Any post-filtering decision is up to you - I've made DMRcate that way so user can have some control over what thresholding they believe is best. Many users I've encountered have made decisions in line with what you describe in your bullet points.

Best,

Tim

0
Entering edit mode

Hi Tim,

Thanks for response.

Best,

Giovanni

0
Entering edit mode

Hi Tim,

I have been using this code for a while now and it always worked for me and returned the CpGs constituting the DMRs. However, when I tried to use it today, locs turns out empty and the same with loc.ranges. I switched to using another function (makeGRanges) in package humarray. I used the annotation file object I had for 450k probes (fdata.all). Following is the command I used to generate loc.ranges which looks like what I used to obtain using the above commands specified by you. But using this in the last command line does not return CpG IDs. I am not sure where I am making a mistake. I am not able to find the function(x) on github to be able to evaluate what it does. Please take a look at the codes and results below and let me know if you are able to spot any error. It will also be very useful to know any alternate methods to extract the CpGs from DMRcate results.

>loc.ranges<-makeGRanges(fdata.all$CHR, pos = fdata.all$MAPINFO, row.names = rownames(fdata.all), build = "hg19")

GRanges object with 6 ranges and 0 metadata columns:
seqnames                 ranges strand
<Rle>              <IRanges>  <Rle>

>​DMR.IDs <- lapply(results.ranges, function (x) names(locs.ranges[locs.ranges %over% x]))

> DMR.IDs[1]
\$844

cg00000029       16 [ 53468112,  53468112]      *
cg00000108        3 [ 37459206,  37459206]      *
cg00000109        3 [171916037, 171916037]      *
cg00000165        1 [ 91194674,  91194674]      *
cg00000236        8 [ 42263294,  42263294]      *
cg00000289       14 [ 69341139,  69341139]      *
-------
seqinfo: 24 sequences from hg19 genome; no seqlengths

Thank you,

Poojitha Rajasekar

0
Entering edit mode

Hi Poojitha,

Looks like the S4 structure of IlluminaHumanMethylationEPICanno.ilm10b2.hg19 has changed.

Try

locs <- IlluminaHumanMethylationEPICanno.ilm10b2.hg19::Locations

Cheers,

Tim

0
Entering edit mode

Its not a surprise that it is such a small thing that gives error. Thank you Tim. The codes are working now.

Cheers!

Poojitha

0
Entering edit mode

Hi Tim,

I have used these set of commands before and it worked perfectly fine. However, I ran into the following error today,

DMR.IDs <- lapply(DMR1, function (x) names(locs.ranges[locs.ranges %over% x])) Error during wrapup: GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

Is there a replacement for lappy or any new set of commands I can use to retrieve corresponding CpGs for regions in results.ranges?

Thank you.

Poojitha

0
Entering edit mode

Hi Tim,

I have used these set of commands before and it worked perfectly fine. However, I ran into the following error today,

DMR.IDs <- lapply(DMR1, function (x) names(locs.ranges[locs.ranges %over% x])) Error during wrapup: GRanges objects don’t support [[, as.list(), lapply(), or unlist() at the moment

Is there a replacement for lappy or any new set of commands I can use to retrieve corresponding CpGs for regions in results.ranges?

Thank you.

Poojitha

0
Entering edit mode

Hi Poojitha,

A bit of digging reveals this is a temporary situation: https://github.com/Bioconductor/GenomicRanges/blob/master/R/GenomicRanges-class.R#L597-L599 https://support.bioconductor.org/p/109137/

So I guess for the time being you'll need to use a for loop:

constituent.cpgs <- list()
for (i in 1:length(results.ranges)){
constituent.cpgs <- c(constituent.cpgs, locs.ranges[locs.ranges %over% results.ranges[i]])
}

Best, Tim

0
Entering edit mode
574233829 • 0
@574233829-16391
Last seen 2.2 years ago

Hello Tim,

I have used package called DMRcate to analyse 450k data. I want to find the gene which associates the DMRs.And I met some questions.The outputs include gene_assoc, group, hg19coord, no.probes, minpval, meanpval and maxbetafc,when I used the old version to analyse 450k data. But when I update the package,the outputs became coord, no.cpgs, minfdr, Stouffer, maxbetafc,meanbetafc.There is no result of "gene_assoc".I want to find the gene names associating "coord",can you help me ? Can you please tel me how to associate the gene by using the newest DMRcate packages.

There follow the output results of the newest DMRcate. coord no.cpgs minfdr Stouffer maxbetafc meanbetafc 63999 chr6:33156164-33181870 265 0 0 -0.5008031 -0.02648790 63997 chr6:33128825-33149777 150 0 0 0.4176126 0.08611966 63917 chr6:32144195-32161004 128 0 0 -0.2574513 -0.03184096 63914 chr6:32114490-32123701 124 0 0 -0.4377015 -0.06195576 63889 chr6:31935801-31940855 101 0 0 -0.1555205 -0.02401999 12564 chr11:31817810-31841980 100 0 0 -0.4611059 -0.17113506

0
Entering edit mode

Hi,

Just like with any epigenetic mark, not all DMRs have an unambiguous gene assignment. This is why I've restricted the gene information to overlapping promoters. If you're interested in finding the nearest genes, I would use a method such as this: https://bioconductor.org/packages/release/bioc/vignettes/ChIPpeakAnno/inst/doc/quickStart.html#use-case-3-annotate-the-peaks-in-both-sides-with-nearest-transcription-start-sites-within-5k-bps.

Best, Tim