Question: Create a matrix to show overlaps among multiple GRanges
1
gravatar for Vinicius Henrique da Silva
6 months ago by
Brazil

I am trying to find a way to efficiently extract a matrix showing '0' or '1' when comparing different GRange objects. In my example:

gr.1 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
gr.2 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
gr.3 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)

I tried findOverlaps to evaluate the overlaps among these regions but apparently it can't deal with more than two GRanges:

> GenomicRanges::findOverlaps(gr.1, gr.2, gr.3)
> Error in IRanges:::NCList_find_overlaps_in_groups(ranges(query),
> q_space,  :    'maxgap' must be a single integer

Moreover, my required output would be something like this example data-frame:

out <- "gr.1 gr.2 gr.3
chr1-1 1  0  1
chr1-2 1  1  0
chr10-3 0 1  0
chr10-4 1 1  0"

out <- read.table(text=out, header=TRUE)

Any idea to wisely export it using mainly functions from bioconductor packages?

genomicranges granges R • 210 views
ADD COMMENTlink modified 6 months ago by James W. MacDonald51k • written 6 months ago by Vinicius Henrique da Silva40
Answer: Create a matrix to show overlaps among multiple GRanges
1
gravatar for James W. MacDonald
6 months ago by
United States
James W. MacDonald51k wrote:

Do note that you cannot really have a matrix showing the overlap between the three GRanges objects, because there will be different regions defined in the three GRanges. Instead, you could define a common set of genomic ranges and then make an incidence matrix showing if a given range in a particular GRanges object overlaps.

> gr.1 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
> gr.2 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)
> gr.3 <- regioneR::createRandomRegions(nregions=1000, length.mean=500000, length.sd=30000)

> base <- reduce(c(gr.1, gr.2, gr.3))
> base
GRanges object with 2096 ranges and 0 metadata columns:
                seqnames          ranges strand
                   <Rle>       <IRanges>  <Rle>
     [1]            chr1  729836-1633109      *
     [2]            chr1 1946672-2419469      *
     [3]            chr1 2952076-3463275      *
     [4]            chr1 3836898-4273778      *
     [5]            chr1 4931353-6065223      *
     ...             ...             ...    ...
  [2092]   chr6_qbl_hap6 3481669-4494167      *
  [2093]  chr6_ssto_hap7  717078-1631256      *
  [2094]  chr6_ssto_hap7 3139239-3635135      *
  [2095] chr17_ctg5_hap1   154779-591672      *
  [2096] chr17_ctg5_hap1  978830-1633449      *
  -------
  seqinfo: 93 sequences from an unspecified genome; no seqlengths

> z <- matrix(0,length(base),3)
> z[subjectHits(findOverlaps(gr.1, base)),1] <- 1
> z[subjectHits(findOverlaps(gr.2, base)),2] <- 1
> z[subjectHits(findOverlaps(gr.3, base)),3] <- 1
> colnames(z) <- paste0("gr.", 1:3)
> mcols(base) <- z
> base
GRanges object with 2096 ranges and 3 metadata columns:
                seqnames          ranges strand |      gr.1      gr.2      gr.3
                   <Rle>       <IRanges>  <Rle> | <numeric> <numeric> <numeric>
     [1]            chr1  729836-1633109      * |         0         1         1
     [2]            chr1 1946672-2419469      * |         0         1         0
     [3]            chr1 2952076-3463275      * |         0         1         0
     [4]            chr1 3836898-4273778      * |         0         1         0
     [5]            chr1 4931353-6065223      * |         0         1         1
     ...             ...             ...    ... .       ...       ...       ...
  [2092]   chr6_qbl_hap6 3481669-4494167      * |         1         1         0
  [2093]  chr6_ssto_hap7  717078-1631256      * |         1         1         0
  [2094]  chr6_ssto_hap7 3139239-3635135      * |         0         0         1
  [2095] chr17_ctg5_hap1   154779-591672      * |         0         0         1
  [2096] chr17_ctg5_hap1  978830-1633449      * |         1         1         0
  -------
  seqinfo: 93 sequences from an unspecified genome; no seqlengths

Is, I believe, pretty close to what you want?

ADD COMMENTlink written 6 months ago by James W. MacDonald51k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 228 users visited in the last hour