Question: Extract metadata from GRanges object for every genomic coordinate
0
gravatar for Robert
6 weeks ago by
Robert10
Robert10 wrote:

It is easy enough to extract genomic ranges and associated meta data from a GRanges object.

library(GenomicRanges)

gr1 <- GRanges(
  seqnames = Rle("chr1", 3),
  ranges = IRanges(c(1, 2, 8), end = c(2, 5, 9)),
  score = 1:3)

show(gr1)
## GRanges object with 3 ranges and 1 metadata column:
##       seqnames    ranges strand |     score
##          <Rle> <IRanges>  <Rle> | <integer>
##   [1]     chr1       1-2      * |         1
##   [2]     chr1       2-5      * |         2
##   [3]     chr1       8-9      * |         3
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

ranges(gr1)
## IRanges object with 3 ranges and 0 metadata columns:
##           start       end     width
##       <integer> <integer> <integer>
##   [1]         1         2         2
##   [2]         2         5         4
##   [3]         8         9         2

mcols(gr1)
## DataFrame with 3 rows and 1 column
##       score
##   <integer>
## 1         1
## 2         2
## 3         3

However, I would like to extract the metadata from a GRanges object for each genomic coordinate in the ranges and insert a 0 for uncovered coordinates. One can achieve this by converting the GRanges object to a data.frame and filling in the metadata values.

df1 <- as.data.frame(gr1)

df1.expanded <- data.frame(chr = "chr1", coord = min(df1$start):(max(df1$end)-1), score = 0)

for(n in 1:nrow(df1.expanded)){
  try(df1.expanded$score[n] <- df1$score[df1$start <= df1.expanded$coord[n] & df1$end > df1.expanded$coord[n]], silent = TRUE)
}

df1.expanded
##    chr coord score
## 1 chr1     1     1
## 2 chr1     2     2
## 3 chr1     3     2
## 4 chr1     4     2
## 5 chr1     5     0
## 6 chr1     6     0
## 7 chr1     7     0
## 8 chr1     8     3

Is there an easier and more efficient way to do this? What I want to do in the end is to correlate the metadata of two GRanges objects in specific genome regions. For one of the GRanges objects uncovered genomic ranges should get a metadata score 0. Are there better ways to achieve this than transforming GRanges objects to data.frames and "expanding" them as shown above?

genomicranges granges • 165 views
ADD COMMENTlink modified 29 days ago • written 6 weeks ago by Robert10
Answer: Extract metadata from GRanges object for every genomic coordinate
2
gravatar for Michael Lawrence
6 weeks ago by
United States
Michael Lawrence11k wrote:

It is not necessary to use data.frame, even taking your general approach. Here is a rough sketch:

gr.expanded <- GRanges("chr1", IRanges(min(start(gr1)):(max(end(gr1))-1), width=1L), score=0L)

for(n in 1:length(gr.expanded)){
  try(gr.expanded$score[n] <- score(gr1)[start(gr1) <= start(gr.expanded)[n] & end(gr1) > start(gr.expanded)[n]], silent = TRUE)
}

I wouldn't recommend doing that, of course, but it's important to realize that GRanges objects behave much like data.frames.

Martin's answer looks good (probably not necessary to call range() though).

Another option is to just use the coverage vector infrastructure:

gp <- GRanges(coverage(gr1, weight="score"))

You might also look into the GPos container in case the correlation will be computed at the base level.

ADD COMMENTlink written 6 weeks ago by Michael Lawrence11k

Btw, the plyranges package skips the scary RleList intermediate:

gr <- compute_coverage(gr1, weight="score")
ADD REPLYlink written 6 weeks ago by Michael Lawrence11k

Thanks, Michael. That's very helpful to fill the uncovered regions. However, I'm still not sure how to get a GRanges object with all ranges having width 1. I would like to achieve this, as I have to plot and correlate several scores from different files at the single-nucleotide level.

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Robert10

Thanks, Michael. The GPos containers seem very interesting. However, it seems like with GPos(<my GRanges object>) my metadata columns are lost. Do you know a workaround?

ADD REPLYlink written 6 weeks ago by Robert10

I sent an email to Herve about this 2 years ago but he never replied. It would be nice to do this:

gp <- GPos(coverage(gr1, weight="score"))

Instead, probably need this:

gr <- GRanges(coverage(gr1, weight="score"))
gp <- GPos(gr, score=rep(score, width(gr)))
ADD REPLYlink written 6 weeks ago by Michael Lawrence11k

Just to note that, in this comment https://support.bioconductor.org/p/122571/#122587 Robert means the intervals to be half-open, so that coverage and other GRanges functionality will not work; really these need to be closed intervals.

ADD REPLYlink written 6 weeks ago by Martin Morgan ♦♦ 23k
Answer: Extract metadata from GRanges object for every genomic coordinate
1
gravatar for Martin Morgan
6 weeks ago by
Martin Morgan ♦♦ 23k
United States
Martin Morgan ♦♦ 23k wrote:

Maybe one way would be to find the unscored ranges

unscored = setdiff(range(gr1), gr1)

and then update and append the ranges

unscored$score = 0
sort(append(gr1, unscored))
## GRanges object with 4 ranges and 1 metadata column:
##       seqnames    ranges strand |     score
##          <Rle> <IRanges>  <Rle> | <numeric>
##   [1]     chr1       1-2      * |         1
##   [2]     chr1       2-5      * |         2
##   [3]     chr1       6-7      * |         0
##   [4]     chr1       8-9      * |         3
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Note that in your original data position 2 has scores 1 and 2, whereas in your data.frame it has only score 2.

ADD COMMENTlink written 6 weeks ago by Martin Morgan ♦♦ 23k

Thanks, Martin. I'm working with .bedGraph files (zero-based half-open). That's were the overlapping range for position 2 came from.

ADD REPLYlink written 6 weeks ago by Robert10

using rtracklayer::import() correctly transforms BED coordinates to the standard 1 based closed intervals of Bioconductor, avoiding confusion that arises using 0-based half-open intervals in the GRanges infrastructure.

ADD REPLYlink written 6 weeks ago by Martin Morgan ♦♦ 23k
Answer: Extract metadata from GRanges object for every genomic coordinate
0
gravatar for Robert
29 days ago by
Robert10
Robert10 wrote:

Based on Michael's answer I can achieve my goal with this code:

library(GenomicRanges)

gr1 <- GRanges(
  seqnames = Rle("chr1", 3),
  ranges = IRanges(start = c(1, 2, 8) + 1, end = c(2, 5, 9)),
  score = 1:3)

gr1
# GRanges object with 3 ranges and 1 metadata column:
#     seqnames    ranges strand |     score
#        <Rle> <IRanges>  <Rle> | <integer>
# [1]     chr1         2      * |         1
# [2]     chr1       3-5      * |         2
# [3]     chr1         9      * |         3
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths

gr1.exp <- GRanges(coverage(gr1, weight="score"))

gp1 <- GPos(gr1.exp)

gp1$score <- rep(gr1.exp$score, width(gr1.exp))

gp1
# GPos object with 9 positions and 1 metadata column:
#     seqnames       pos strand |     score
#        <Rle> <integer>  <Rle> | <integer>
# [1]     chr1         1      * |         0
# [2]     chr1         2      * |         1
# [3]     chr1         3      * |         2
# [4]     chr1         4      * |         2
# [5]     chr1         5      * |         2
# [6]     chr1         6      * |         0
# [7]     chr1         7      * |         0
# [8]     chr1         8      * |         0
# [9]     chr1         9      * |         3
# -------
# seqinfo: 1 sequence from an unspecified genome
ADD COMMENTlink written 29 days ago by Robert10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 211 users visited in the last hour