How to find nearest CpG islands around miRNA genes using granges?
2
0
Entering edit mode
Adam_M • 0
@adam_m-13707
Last seen 5.3 years ago

Hello,

I'm trying to find the nearest CpG islands around each miRNA of interest(gene) and also return distance between them.
For example for hsa-mir-137 the nearest CpG islands are 73 and 91 where CpG 73 overlaps with this miRNA gene and CpG 91 is ~7500 bp away. What is more is it possible to filter only miRNAs where CpG overlaps it or where distance between them is less than 1000 bp?

Here is what I got:

    hub <- AnnotationHub()
    query(hub, c("cpg","hg19"))
    cpgs <- hub[["AH5086"]] ## the granges of all CpG's (hg19)

    gffRangedData<-import.gff("allmirna.gff3")
    mirna_granges<-as(gffRangedData, "GRanges") ## I downloaded all annotated miRNAs from miRBase and I converted them to granges object

    sample_mirnas <- read.table("sample_data.txt", header = TRUE, sep ='\t') #here are my 3 miRNAs of interest (hsa-mir-106b,hsa-mir-150,hsa-mir-107)
    final_mirnas <- mirna_granges[mirna_granges$Name %in% sample_mirnas$mirna] # now I wanted to have granges only for my miRNA so I subset data from miRBase

So as a result I have:

1) CpG's granges

    > cpgs
    GRanges object with 28691 ranges and 1 metadata column:
                           seqnames           ranges strand |        name
                              <Rle>        <IRanges>  <Rle> | <character>
          [1]                  chr1 [ 28736,  29810]      * |    CpG:_116
          [2]                  chr1 [135125, 135563]      * |     CpG:_30
          [3]                  chr1 [327791, 328229]      * |     CpG:_29
          [4]                  chr1 [437152, 438164]      * |     CpG:_84
          [5]                  chr1 [449274, 450544]      * |     CpG:_99
          ...                   ...              ...    ... .         ...
      [28687]  chr9_gl000201_random [ 15651,  15909]      * |     CpG:_30
      [28688]  chr9_gl000201_random [ 26397,  26873]      * |     CpG:_43
      [28689] chr11_gl000202_random [ 16284,  16540]      * |     CpG:_23
      [28690] chr17_gl000204_random [ 54686,  57368]      * |    CpG:_228
      [28691] chr17_gl000205_random [117501, 117801]      * |     CpG:_23
      -------
      seqinfo: 93 sequences (1 circular) from hg19 genome

2) Granges of miRNA's of interest

    > final_mirnas
    GRanges object with 3 ranges and 8 metadata columns:
          seqnames                 ranges strand |   source                     type     score
             <Rle>              <IRanges>  <Rle> | <factor>                 <factor> <numeric>
      [1]    chr10 [ 89592747,  89592827]      - |     <NA> miRNA_primary_transcript      <NA>
      [2]    chr19 [ 49500785,  49500868]      - |     <NA> miRNA_primary_transcript      <NA>
      [3]     chr7 [100093993, 100094074]      - |     <NA> miRNA_primary_transcript      <NA>
              phase          ID           Alias         Name Derives_from
          <integer> <character> <CharacterList>  <character>  <character>
      [1]      <NA>   MI0000114       MI0000114  hsa-mir-107         <NA>
      [2]      <NA>   MI0000479       MI0000479  hsa-mir-150         <NA>
      [3]      <NA>   MI0000734       MI0000734 hsa-mir-106b         <NA>
      -------
      seqinfo: 24 sequences from an unspecified genome; no seqlengths

Now, how to combine this data togesher to obtain something like (results are fake, made them up for question):

    miRNA            CpG_upstream    CpG_downstream       CpG_up_distance          CpG_down_distance
    hsa-mir-106b       57                    23                            852                               15
    hsa-mir-150        NA                   77                              NA                               987
    hsa-mir-107        12                    NA                            -12                               NA

I put `NA`If distance between miRNA and CpG is greater than 1000bp but it can be actual value, what is more i put negative number when CpG and miRNA gene overlap.

I tried to use distance() and follow() functions but there are only bare numbers, can you help me how to figure it out?

granges bioconductor R • 1.4k views
ADD COMMENT
2
Entering edit mode
Julie Zhu ★ 4.3k
@julie-zhu-3596
Last seen 4 months ago
United States
Adam, Please try annotatePeakInBatch function in ChIPPeakAnno package. Best regards, Julie On Aug 10, 2017, at 2:13 AM, Adam_M [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Adam_M<https: support.bioconductor.org="" u="" 13707=""/> wrote Question: How to find nearest CpG islands around miRNA genes using granges?<https: support.bioconductor.org="" p="" 99050=""/>: Hello, I'm trying to find the nearest CpG islands around each miRNA of interest(gene) and also return distance between them. For example for hsa-mir-137 the nearest CpG islands are 73 and 91 where CpG 73 overlaps with this miRNA gene and CpG 91 is ~7500 bp away. What is more is it possible to filter only miRNAs where CpG overlaps it or where distance between them is less than 1000 bp? Here is what I got: hub <- AnnotationHub() query(hub, c("cpg","hg19")) cpgs <- hub[["AH5086"]] ## the granges of all CpG's (hg19) gffRangedData<-import.gff("allmirna.gff3") mirna_granges<-as(gffRangedData, "GRanges") ## I downloaded all annotated miRNAs from miRBase and I converted them to granges object sample_mirnas <- read.table("sample_data.txt", header = TRUE, sep ='\t') #here are my 3 miRNAs of interest (hsa-mir-106b,hsa-mir-150,hsa-mir-107) final_mirnas <- mirna_granges[mirna_granges$Name %in% sample_mirnas$mirna] # now I wanted to have granges only for my miRNA so I subset data from miRBase **So as a result I have:** **1) CpG's granges** > cpgs GRanges object with 28691 ranges and 1 metadata column: seqnames ranges strand | name <rle> <iranges> <rle> | <character> [1] chr1 [ 28736, 29810] * | CpG:_116 [2] chr1 [135125, 135563] * | CpG:_30 [3] chr1 [327791, 328229] * | CpG:_29 [4] chr1 [437152, 438164] * | CpG:_84 [5] chr1 [449274, 450544] * | CpG:_99 ... ... ... ... . ... [28687] chr9_gl000201_random [ 15651, 15909] * | CpG:_30 [28688] chr9_gl000201_random [ 26397, 26873] * | CpG:_43 [28689] chr11_gl000202_random [ 16284, 16540] * | CpG:_23 [28690] chr17_gl000204_random [ 54686, 57368] * | CpG:_228 [28691] chr17_gl000205_random [117501, 117801] * | CpG:_23 ------- seqinfo: 93 sequences (1 circular) from hg19 genome **2) Granges of miRNA's of interest** > final_mirnas GRanges object with 3 ranges and 8 metadata columns: seqnames ranges strand | source type score <rle> <iranges> <rle> | <factor> <factor> <numeric> [1] chr10 [ 89592747, 89592827] - | <na> miRNA_primary_transcript <na> [2] chr19 [ 49500785, 49500868] - | <na> miRNA_primary_transcript <na> [3] chr7 [100093993, 100094074] - | <na> miRNA_primary_transcript <na> phase ID Alias Name Derives_from <integer> <character> <characterlist> <character> <character> [1] <na> MI0000114 MI0000114 hsa-mir-107 <na> [2] <na> MI0000479 MI0000479 hsa-mir-150 <na> [3] <na> MI0000734 MI0000734 hsa-mir-106b <na> ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths **Now, how to combine this data togesher to obtain something like (results are fake, made them up for question):** miRNA CpG_upstream CpG_downstream CpG_up_distance CpG_down_distance hsa-mir-106b 57 23 852 15 hsa-mir-150 NA 77 NA 987 hsa-mir-107 12 NA -12 NA **I put `NA`If distance between miRNA and CpG is greater than 1000bp but it can be actual value, what is more i put negative number when CpG and miRNA gene overlap.** **I tried to use distance() and follow() functions but there are only bare numbers, can you help me how to figure it out?** ________________________________ Post tags: granges, bioconductor, R You may reply via email or visit How to find nearest CpG islands around miRNA genes using granges?
ADD COMMENT
0
Entering edit mode
Adam_M • 0
@adam_m-13707
Last seen 5.3 years ago

Julie,

I read the documentation and this function returned what I needed.

Thank you so much, great package!

ADD COMMENT
0
Entering edit mode
Adam, Great! Thanks for letting me know and the kind words! Best regards, Julie On Aug 10, 2017, at 8:43 AM, Adam_M [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""> User Adam_M<https: support.bioconductor.org="" u="" 13707=""/> wrote Answer: How to find nearest CpG islands around miRNA genes using granges?<https: support.bioconductor.org="" p="" 99050="" #99070="">: Julie, I read the documentation and this function returned what I needed. Thank you so much, great package! ________________________________ Post tags: granges, bioconductor, R You may reply via email or visit A: How to find nearest CpG islands around miRNA genes using granges?
ADD REPLY

Login before adding your answer.

Traffic: 768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6