The GDC legacy archive has been retired, so I downloaded the TCGA methylation data (450K) in hg38 using TCGAbiolinks. I'd like to use DMRcate to find DMRs, but the problem is that the annotation it uses is hg19 (IlluminaHumanMethylation450kanno.ilmn12.hg19). I am trying to modify the "cpg.annotate" function, but got stuck on the "makeGenomicRatioSetFromMatrix" function when trying to use my homemade hg38 annotation for the 450K array (based on the files at http://zwdzwd.github.io/InfiniumAnnotation). Specifically, I don't know how to modify the part of makeGenomicRatioSetFromMatrix shown below so that it uses my annotation rather than the supplied value of "ilmn12.hg19". I also tried to build a GenomicRatioSet directly from my homemade annotation, but that failed.
out <- GenomicRatioSet(gr = gr[ind2, ], Beta = NULL,
M = mat[ind1, , drop = FALSE], CN = NULL, colData = pData,
annotation = c(array = array, annotation = annotation),
preprocessMethod = preprocessing)
So, my question is: how should I apply "cpg.annotate" to TCGA 450K methylation data in hg38?
Another point of confusion: I see that "DMR.plot" has a "genome" option that can be set to "hg38" (https://www.bioconductor.org/packages/devel/bioc/manuals/DMRcate/man/DMRcate.pdf). Is "hg38" only meant for EPICv2 data?
See a related question here https://www.biostars.org/p/9587144/ .
Thanks a lot!
Hi Xiaofei,
Thanks for this. If your data are from 450K, you'll have to call your DMRs in hg19 and then lift the DMR ranges over to hg38 post hoc. DMRcate is one-to-one with regard to platform -> reference, since it follows the Illumina-provided annotation:
450K: IlluminaHumanMethylation450kanno.ilmn12.hg19
EPICv1: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
EPICv2: IlluminaHumanMethylationEPICv2anno.20a1.hg38
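For the 450K case, the "call in hg19, then lift over" route could look roughly like the sketch below. It is only an outline under stated assumptions, not a tested pipeline: `beta` (a probes x samples matrix of beta values with cg IDs as rownames), `group` (a two-level factor with one entry per sample) and `chain_path` (a locally downloaded, uncompressed UCSC hg19ToHg38.over.chain file) are hypothetical inputs, and the helper names `beta_to_m` and `call_dmrs_hg38` are made up for illustration. The probe IDs are the same regardless of which build the TCGA coordinates were reported in, so the hg19 annotation is matched by probe ID.

```r
# Sketch only: `beta`, `group` and `chain_path` are hypothetical inputs.

# Convert beta values to M-values, clamping to avoid infinite logits.
beta_to_m <- function(beta, eps = 1e-6) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log2(beta / (1 - beta))
}

call_dmrs_hg38 <- function(beta, group, chain_path) {
  # Drop probes with missing values, then move to the M-value scale.
  beta <- beta[stats::complete.cases(beta), , drop = FALSE]
  M <- beta_to_m(beta)

  # Assumes a simple two-group design, so coef = 2 is the group effect.
  design <- stats::model.matrix(~ group)

  # Annotate and call DMRs against the 450K (hg19) annotation.
  annot <- DMRcate::cpg.annotate("array", M, what = "M", arraytype = "450K",
                                 analysis.type = "differential",
                                 design = design, coef = 2)
  dmrs <- DMRcate::dmrcate(annot, lambda = 1000, C = 2)
  dmr_hg19 <- DMRcate::extractRanges(dmrs, genome = "hg19")

  # Lift the DMR ranges to hg38; the result is a GRangesList with one
  # element per input DMR (possibly empty or split into several pieces).
  chain <- rtracklayer::import.chain(chain_path)
  lifted <- rtracklayer::liftOver(dmr_hg19, chain)

  # Keep only DMRs that map to exactly one contiguous hg38 range.
  one_to_one <- S4Vectors::elementNROWS(lifted) == 1
  BiocGenerics::unlist(lifted[one_to_one])
}
```

DMRs that fail to lift, or that split into several pieces, are simply dropped here; whether to merge or inspect those by hand is a judgment call that depends on your analysis.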
cpg.annotate() isn't built for customised/homemade annotations, but you're more than welcome to fork the repository (https://github.com/timpeters82/DMRcate-devel/) and adapt it for your own needs.
Cheers, Tim
Also, re DMR.plot(): yes, the understanding is that all DMRs from EPICv2 should be plotted in hg38. I've left that implied for the user, since the same function also handles sequencing data, but if it gets too confusing I'll force the annotations for array data in a future commit.