Question: Aggregating scores of multiple metadata columns from one GRanges object on another
0
5 months ago by
gueux0
gueux0 wrote:

I have one GRanges object with thousands of score columns (tomap), and another with regions of interest and no metadata (roi). I am trying to map the max score from each column in tomap to the corresponding interval in roi.

I also want to retain the names of the score columns (in my real data these are meaningful names and not genralizable like score1, score2 etc...). I can do it for specific columns but am struggling to generalize it to every column.

Here is what I've got so far:

library(GenomicRanges)
tomap <- GRanges(
seqnames = Rle(c("chr1"), c(10)),
ranges = IRanges(1:10*10, end = 1:10*10+5),
score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

roi <- GRanges(
seqnames = Rle(c("chr1"), c(5)),
ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

ans<-roi
mcols(ans) <- aggregate(tomap, hits, score1=max(score1), score2= max(score2))

ans
#GRanges object with 5 ranges and 3 metadata columns:
#      seqnames    ranges strand |             grouping            score1            score2
#         <Rle> <IRanges>  <Rle> | <ManyToManyGrouping>         <numeric>         <numeric>
#  [1]     chr1     22-31      * |                  2,3 0.326366489753127 0.925836584065109
#  [2]     chr1     42-51      * |                  4,5  0.92806151532568 0.897841389290988
#  [3]     chr1     62-71      * |                  6,7 0.980487102875486 0.940743445185944
#  [4]     chr1     83-92      * |                  8,9 0.798293181695044 0.381754550151527
#  [5]     chr1   101-110      * |                   10 0.872806148370728 0.953412540955469


As you can see, this works when I specify each score column individually, but how do I do this for thousands of columns?

genomicranges • 178 views
modified 5 months ago by Michael Lawrence11k • written 5 months ago by gueux0
Answer: Aggregating scores of multiple metadata columns from one GRanges object on anoth
0
5 months ago by
gueux0
gueux0 wrote:

Based on a suggestion received here, the following works:

scoreagg<-paste0("mcols(ans)<-aggregate(tomap,hits,",paste0(colnames(tomap@elementMetadata)[1:5],"=","max(",colnames(tomap@elementMetadata)[1:5],")",collapse=","),")")

eval(parse(text=scoreagg))

Answer: Aggregating scores of multiple metadata columns from one GRanges object on anoth
0
5 months ago by
United States
Michael Lawrence11k wrote:

If you have thousands of score columns, you might find it more convenient to bind them into a matrix, stored as a single column on the GRanges. Then you could use the matrixStats package to compute the maxima across all columns. I think the code would look something like:

aggregate(tomap, hits, max_scores=bindROWS(lapply(scores, colMaxs)))


Thanks Michael, but how do you store a matrix as a single column in the GRanges? If I do:

scores<-cbind(score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

and then:

tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5), score=scores)

I still get 5 metadata columns, and an error when running your code:Error in FUN(X[[i]], ...) : Argument 'x' must be a matrix or a vector.

Add it as a column after construction.

Doing this still ends up with tomap having 5 metadata columns: tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5)) mcols(tomap)<-scores

Then running: aggregate(tomap, hits, max_scores=lapply(scores, colMaxs)) gives this error: Error in FUN(X[[i]], ...) : Argument 'dim' must be an integer vector of length two.

1

mcols(x)\$scores <- mat

thanks, that does work to associate the matrix but then if I have more score columns than rows in roi then I get the error Error in DataFrame(by, stats) : different row counts implied by arguments. It seems, the result is transposed with the result for interval 1 placed in column 1 vertically instead of horizontally. And it fails completely if there are not the same number of intervals and score columns.

I edited by answer to include the call to bindROWS() but I'm not sure if it will work. Please provide a fully reproducible example.