extract CpGs from DMRcate results
Entering edit mode
landenshanie ▴ 40
Last seen 12 months ago


I have used the excellent package DMRcate and I am trying to extract the CpG names included in each DMR from my results.ranges. I have found this question many times over online forums but I think they are outdated as when I do this code that the creators of the package recommended:

locs <- IlluminaHumanMethylationEPICanno.ilm10b2.hg19::Locations

locs.ranges <- GRanges(locs$chr, IRanges(locs$pos, locs$pos))

names(locs.ranges) <- rownames(locs)

DMR.IDs <- lapply(results.ranges, function (x) names(locs.ranges[locs.ranges %over% x]))

I get this error on the last line of code: Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

Found on other forums that a "for loop" can be used on granges objects instead of lapply. But I don't know how to convert this lapply function to a for loop in this context for it to work.

Can anyone help with converting this last line of code to a for loop, or anything else that would extract the CpG names from a DMR?

Thank you, Shanie

dmrcate cpgs results.ranges DMRcate • 1.5k views
Entering edit mode
Tim Peters ▴ 150
Last seen 16 hours ago

Hi Shanie,

James' answer is worth reading: https://support.bioconductor.org/p/125350/#125410

From the vignette example, you can get your CpG IDs from 1 DMR in one line:

 > rownames(subsetByOverlaps(tcell, results.ranges[1]))
 [1] "cg11423206" "cg24345747" "cg13803976" "cg26057751" "cg07152196" "cg12653796" "cg00916536" "cg09046939" "cg13946520" "cg17108819"
[11] "cg27247697" "cg21578555" "cg02170525" "cg27502457" "cg25355010" "cg09318840" "cg08506127" "cg18830527" "cg21648425" "cg15195030"
[21] "cg09849552" "cg18174654" "cg12344862" "cg13681325" "cg25939861" "cg03196485"

And then for all DMRs:

allDMRcpgs <- sapply(1:length(results.ranges), function (x) rownames(subsetByOverlaps(tcell, results.ranges[x])))

Cheers, Tim

Entering edit mode

Thanks so much Tim for the quick reply. However, I am getting "NULL" when I try that code.

I don't know whether this is the issue, but I am getting NULL for results.ranges@ranges@NAMES. This seems wrong and I cant figure out why cpg names aren't appearing there as they do in the annotation@ranges@NAMES.

Thanks again, Shanie

Entering edit mode

Hi Shanie,

Did you get NULL from the vignette example, or your own example? If the latter could you please post your workflow with sessionInfo()? If the former just post the sessionInfo() and I'll figure out what is going on.

DMR ranges won't have CpG names because, well, they're DMRs, not CpGs. So you won't find the CpG IDs there.

Cheers, Tim

Entering edit mode

Hi Tim,

I get NULL for my own data. I have 92 DMRs and in my results.ranges I can see the chrom, start, no. cpgs etc..

Here is my work flow:

annotated <- GRanges(as.character(annotationoverlap$V1), IRanges(start=c(annotationoverlap$V2),end=c(annotationoverlap$V3)), stat = resultssexfull2$t, diff = resultssexfull2$logFC, ind.fdr = resultssexfull2$adj.P.Val, is.sig = resultssexfull2$adj.P.Val < 0.05) names(annotated) <- rownames(resultssexfull2)

annotated <- sort(annotated)

annotated <- new("CpGannotated", ranges = annotated)

DMR <- dmrcate(annotated, C=2, min.cpgs = 2) results.ranges <- extractRanges(DMR, genome = "hg19")

locs <- IlluminaHumanMethylationEPICanno.ilm10b2.hg19::Locations

locs.ranges <- GRanges(locs$chr, IRanges(locs$pos, locs$pos))

names(locs.ranges) <- rownames(locs)

rownames(subsetByOverlaps(locs.ranges, results.ranges[43]))```

Entering edit mode

And session info:

R version 4.0.0 (2020-04-24) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale: [1] LCCOLLATE=EnglishUnited States.1252 LCCTYPE=EnglishUnited States.1252 LCMONETARY=EnglishUnited States.1252 [4] LCNUMERIC=C LCTIME=English_United States.1252

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] DMRcatedata2.6.0 ExperimentHub1.14.0 AnnotationHub2.20.0 BiocFileCache1.12.0
[5] dbplyr1.4.3 GenoGAM2.6.0 data.table1.12.8 Matrix1.2-18
[9] HDF5Array1.16.0 rhdf52.32.0 DMRcate2.2.0 minfi1.34.0
[13] bumphunter1.30.0 locfit1.5-9.4 iterators1.0.12 foreach1.5.0
[17] Biostrings2.56.0 XVector0.28.0 SummarizedExperiment1.18.1 DelayedArray0.14.0
[21] matrixStats0.56.0 Biobase2.48.0 GenomicRanges1.40.0 GenomeInfoDb1.24.0
[25] IRanges2.22.1 S4Vectors0.26.1 BiocGenerics_0.34.0

Entering edit mode

loaded via a namespace (and not attached): [1] R.utils2.9.2 tidyselect1.1.0
[3] IlluminaHumanMethylationEPICanno.ilm10b2.hg190.6.0 RSQLite2.2.0
[5] AnnotationDbi1.50.0 htmlwidgets1.5.1
[7] grid4.0.0 BiocParallel1.22.0
[9] munsell0.5.0 codetools0.2-16
[11] preprocessCore1.50.0 statmod1.4.34
[13] colorspace1.4-1 knitr1.28
[15] rstudioapi0.11 GenomeInfoDbData1.2.3
[17] bit640.9-7 vctrs0.3.0
[19] lambda.r1.2.4 xfun0.14
[21] biovizBase1.36.0 R62.4.1
[23] illuminaio0.30.0 AnnotationFilter1.12.0
[25] bitops1.0-6 reshape0.8.8
[27] assertthat0.2.1 IlluminaHumanMethylation450kanno.ilmn12.hg190.6.0 [29] promises1.1.0 scales1.1.1
[31] bsseq1.24.0 nnet7.3-14
[33] gtable0.3.0 ensembldb2.12.1
[35] rlang0.4.6 genefilter1.70.0
[37] splines4.0.0 rtracklayer1.48.0
[39] lazyeval0.2.2 acepack1.4.1
[41] DSS2.36.0 GEOquery2.56.0
[43] dichromat2.0-0 checkmate2.0.0
[45] BiocManager1.30.10 yaml2.2.1
[47] GenomicFeatures1.40.0 backports1.1.6
[49] httpuv1.5.2 Hmisc4.4-0
[51] tools4.0.0 nor1mix1.3-0
[53] ggplot23.3.0 ellipsis0.3.1
[55] RColorBrewer1.1-2 siggenes1.62.0
[57] Rcpp1.0.4.6 plyr1.8.6
[59] base64enc0.1-3 progress1.2.2
[61] zlibbioc1.34.0 purrr0.3.4
[63] RCurl1.98-1.2 prettyunits1.1.1
[65] rpart4.1-15 openssl1.4.1
[67] cluster2.1.0 magrittr1.5
[69] futile.options1.0.1 IlluminaHumanMethylationEPICanno.ilm10b4.hg190.6.0 [71] ProtGenerics1.20.0 missMethyl1.22.0
[73] hms0.5.3 mime0.9
[75] xtable1.8-4 XML3.99-0.3
[77] jpeg0.1-8.1 readxl1.3.1
[79] mclust5.4.6 gridExtra2.3

I have until row 147 but it wont let me post anymore. Really appreciate your help :)

Entering edit mode

Hi Shanie,

I can see 2 problems here:

1) Why are you constructing your own S4 CpGannotated object? The cpg.annotate() function specifically exists for that. Do you have a GenomicRatioSet or M-value matrix that houses your methylation data?

2) You are also constructing your own custom CpG ranges object locs.ranges to pass to subsetByOverlaps which is a GRanges object, not a GenomicRatioSet object. The latter has rownames but the former does not; evidenced by your line names(locs.ranges) <- rownames(locs) . So that's why you are getting NULL. You might get the result if you need if you call names(subsetByOverlaps(... instead of rownames(subsetByOverlaps(..., but first I would go right back to your normalised data and run cpg.annotate() first.

Cheers, Tim

Entering edit mode

Thanks so much! I got it :) Some happy dances and squeels were had.

  1. We use the annotation from Zhou et al, and we have added genehancer and chromatin states information to it.

  2. That was helpful very helpful. I had to use the annotated Granges object before I turned it into a CpGannotated object. Then indeed it worked to use names(... instead of rownames(...

Thanks so much for the tips in the right direction! All the best, Shanie


Login before adding your answer.

Traffic: 417 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6