Question

How to pick Illumina EPIC Array probe IDs within a specific genomic region?

I have a list of genes for which I need to pull out the related DNA methylation data from my EPIC array dataset, and I would like to visualise this data in a genome browser if possible. I want to pick CpGs from different genomic regions in and around those genes, e.g. gene body, 5'UTR, 3'UTR, TSS, promoter. Can anyone help me with this? Is there an online tool I can use to create a track of my data in a genome browser?

IlluminaHumanMethylationEPICmanifest

written 3.1 years ago by poojitha.stemcell

score 1 · Answer 1 · 2021-03-18

You are asking a pretty general question, which is difficult to answer. Here is a specific answer, which may be helpful in orienting you.

# example data (450k array data from minfiData; the same calls work for EPIC data)
> library(minfiData)
> data(MsetEx)
## need to map to the genome; if you use minfi, this usually happens as part of the preprocessing
> MsetEx <- mapToGenome(MsetEx)
> MsetEx
class: GenomicMethylSet 
dim: 485512 6 
metadata(0):
assays(2): Meth Unmeth
rownames(485512): cg13869341 cg14008030 ... cg08265308 cg14273923
rowData names(0):
colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02
  5723646053_R06C02
colData names(13): Sample_Name Sample_Well ... Basename filenames
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.21.2
  Manifest version: 0.4.0

## now we can extract data by position, using a GRanges object to say which positions we care about. I'll just fake something up

> fakeGR <- GRanges(c("chr1", "chr2"), IRanges(c(15000, 15000), c(30000, 30000)))

> ms <- subsetByOverlaps(MsetEx, fakeGR)
> ms
class: GenomicMethylSet 
dim: 5 6 
metadata(0):
assays(2): Meth Unmeth
rownames(5): cg13869341 cg14008030 cg12045430 cg20826792 cg00381604
rowData names(0):
colnames(6): 5723646052_R02C02 5723646052_R04C01 ... 5723646053_R05C02
  5723646053_R06C02
colData names(13): Sample_Name Sample_Well ... Basename filenames
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.21.2
  Manifest version: 0.4.0

## See? It's smaller now

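## and now we could convert the data into a GRanges (in the output below, the first six metadata columns are the Meth signals and the last six are the Unmeth signals)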
> z <- rowRanges(ms)

> mcols(z) <- cbind(getMeth(ms), getUnmeth(ms))
> z
GRanges object with 5 ranges and 12 metadata columns:
             seqnames    ranges strand | 5723646052_R02C02 5723646052_R04C01
                <Rle> <IRanges>  <Rle> |         <numeric>         <numeric>
  cg13869341     chr1     15865      * |             37782             43291
  cg14008030     chr1     18827      * |             13119             21718
  cg12045430     chr1     29407      * |              1221              1759
  cg20826792     chr1     29425      * |              2708              2967
  cg00381604     chr1     29435      * |               587               555
             5723646052_R05C02 5723646053_R04C02 5723646053_R05C02
                     <numeric>         <numeric>         <numeric>
  cg13869341             46988             36686             40965
  cg14008030             21113             19502             23834
  cg12045430              1162              3836              1543
  cg20826792              2180              3906              3035
  cg00381604               506               873               686
             5723646053_R06C02 5723646052_R02C02 5723646052_R04C01
                     <numeric>         <numeric>         <numeric>
  cg13869341             44036              7270              2778
  cg14008030             22968              7122             13020
  cg12045430              1381             16802             20690
  cg20826792              2807             17744             18621
  cg00381604               945             16059             17723
             5723646052_R05C02 5723646053_R04C02 5723646053_R05C02
                     <numeric>         <numeric>         <numeric>
  cg13869341              4909              7187              8529
  cg14008030              8597              8185             11111
  cg12045430             17308             17867             19840
  cg20826792             17640             17201             19277
  cg00381604             14534             15602             17430
             5723646053_R06C02
                     <numeric>
  cg13869341              1899
  cg14008030              9067
  cg12045430             19557
  cg20826792             18413
  cg00381604             15559
  -------
  seqinfo: 24 sequences from hg19 genome; no seqlengths
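
For real genes rather than the fake ranges above, one option is the Illumina annotation itself, which minfi exposes through getAnnotation(). It includes the columns UCSC_RefGene_Name and UCSC_RefGene_Group, each a semicolon-separated string with one entry per overlapping transcript (groups such as TSS1500, TSS200, 5'UTR, 1stExon, Body and 3'UTR). Below is a minimal sketch in base R of filtering on those two columns. The small data frame is made up for illustration (toy probe IDs and gene names), and pick_probes is a hypothetical helper, not a minfi function. It assumes the two columns have the same number of entries per probe.

```r
# Toy stand-in for the annotation table (made-up probe IDs and genes).
# With real data: ann <- as.data.frame(getAnnotation(MsetEx))
ann <- data.frame(
  Name = c("cgA", "cgB", "cgC", "cgD"),
  UCSC_RefGene_Name  = c("GENE1;GENE1", "GENE1;GENE2", "GENE2", ""),
  UCSC_RefGene_Group = c("TSS200;TSS1500", "Body;TSS200", "3'UTR", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the IDs of probes annotated to any of `genes`
# within any of the region classes in `groups`.
pick_probes <- function(ann, genes, groups) {
  gene_list  <- strsplit(ann$UCSC_RefGene_Name,  ";", fixed = TRUE)
  group_list <- strsplit(ann$UCSC_RefGene_Group, ";", fixed = TRUE)
  # Entries are paired per transcript, so test gene and group together
  keep <- mapply(function(g, r) any(g %in% genes & r %in% groups),
                 gene_list, group_list)
  ann$Name[keep]
}

pick_probes(ann, genes = "GENE1", groups = c("TSS200", "TSS1500"))  # "cgA"
pick_probes(ann, genes = "GENE2", groups = c("TSS200", "TSS1500"))  # "cgB"
```

Note that cgB is not returned for GENE1, because its TSS200 entry belongs to GENE2. The resulting IDs can then be used to subset the data, e.g. MsetEx[rownames(MsetEx) %in% ids, ].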

And you could use, e.g., Gviz to make plots of z (the GRanges object built above).
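
On the genome browser part of the question: one route that needs no extra packages is to write a per-sample bedGraph file, which the UCSC Genome Browser accepts as a custom track. Here is a minimal sketch in base R, using the positions and the Meth/Unmeth values for sample 5723646052_R02C02 copied from the output above. The beta value is computed as Meth / (Meth + Unmeth); the file name and track name are arbitrary choices.

```r
# Positions and signals for sample 5723646052_R02C02, copied from `z` above
probes <- data.frame(
  chr    = "chr1",
  pos    = c(15865, 18827, 29407, 29425, 29435),
  meth   = c(37782, 13119, 1221, 2708, 587),
  unmeth = c(7270, 7122, 16802, 17744, 16059)
)

# Beta value: fraction of the total signal that is methylated at each CpG
probes$beta <- probes$meth / (probes$meth + probes$unmeth)

# bedGraph intervals are 0-based and half-open: 1-based position p becomes [p - 1, p)
bg <- data.frame(chrom = probes$chr,
                 start = as.integer(probes$pos - 1),
                 end   = as.integer(probes$pos),
                 value = round(probes$beta, 4))

out <- "sample_5723646052_R02C02.bedGraph"   # arbitrary file name
writeLines('track type=bedGraph name="5723646052_R02C02 beta" viewLimits=0:1', out)
write.table(bg, out, append = TRUE, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The file can then be uploaded as a custom track with the assembly set to hg19, to match the annotation above. If rtracklayer is installed, its export() function can write the same format from a GRanges that has a score column.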