Aggregating scores of multiple metadata columns from one GRanges object on another
gueux • 0
@gueux-20370
Last seen 5.0 years ago

I have one GRanges object with thousands of score columns (tomap), and another with regions of interest and no metadata (roi). I am trying to map the max score from each column in tomap to the corresponding interval in roi.

I also want to retain the names of the score columns (in my real data these are meaningful names, not generic ones like score1, score2, etc.). I can do it for specific columns but am struggling to generalize it to every column.

Here is what I've got so far:

library(GenomicRanges)
tomap <- GRanges(
    seqnames = Rle(c("chr1"), c(10)),
    ranges = IRanges(1:10*10, end = 1:10*10+5),
    score1 = runif(10), score2 = runif(10), score3 = runif(10),
    score4 = runif(10), score5 = runif(10))

roi <- GRanges(
    seqnames = Rle(c("chr1"), c(5)),
    ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

ans <- roi
mcols(ans) <- aggregate(tomap, hits, score1 = max(score1), score2 = max(score2))

ans
#GRanges object with 5 ranges and 3 metadata columns:
#      seqnames    ranges strand |             grouping            score1            score2
#         <Rle> <IRanges>  <Rle> | <ManyToManyGrouping>         <numeric>         <numeric>
#  [1]     chr1     22-31      * |                  2,3 0.326366489753127 0.925836584065109
#  [2]     chr1     42-51      * |                  4,5  0.92806151532568 0.897841389290988
#  [3]     chr1     62-71      * |                  6,7 0.980487102875486 0.940743445185944
#  [4]     chr1     83-92      * |                  8,9 0.798293181695044 0.381754550151527
#  [5]     chr1   101-110      * |                   10 0.872806148370728 0.953412540955469

As you can see, this works when I specify each score column individually, but how do I do this for thousands of columns?

GenomicRanges • 1.4k views
gueux • 0
@gueux-20370
Last seen 5.0 years ago

Based on a suggestion received here, the following works:

# Build the aggregate() call as text, with one "name = max(name)" term per score column
cols <- colnames(mcols(tomap))
scoreagg <- paste0(
    "mcols(ans) <- aggregate(tomap, hits, ",
    paste0(cols, " = max(", cols, ")", collapse = ", "),
    ")")

eval(parse(text=scoreagg))
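
For comparison, here is a minimal sketch of one way to get the same per-column maxima without building code as a string. It is not the approach suggested in this thread: it assumes the score columns are all numeric, converts them to a plain data.frame, and groups the overlapping rows by ROI index with base R's stats::aggregate(). The fixed seed, the three-column toy data, and the names scores, agg and max_scores are assumptions made for illustration. ROIs with no overlap are left as NA.

```r
library(GenomicRanges)

set.seed(1)  # assumption: fixed seed so the toy data are reproducible
tomap <- GRanges(
    seqnames = Rle("chr1", 10),
    ranges = IRanges(1:10 * 10, end = 1:10 * 10 + 5),
    score1 = runif(10), score2 = runif(10), score3 = runif(10))
roi <- GRanges(
    seqnames = Rle("chr1", 5),
    ranges = IRanges(1:5 * 20 + 2, width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

# Plain data.frame of all score columns; optional = TRUE keeps names as they are
scores <- as.data.frame(mcols(tomap), optional = TRUE)

# One row per ROI that has at least one hit, one column per score column
agg <- stats::aggregate(
    scores[subjectHits(hits), , drop = FALSE],
    by = list(roi_index = queryHits(hits)),
    FUN = max)

# Place the results in a matrix with one row per ROI (NA where there was no hit)
max_scores <- matrix(NA_real_, nrow = length(roi), ncol = ncol(scores),
                     dimnames = list(NULL, colnames(scores)))
max_scores[agg$roi_index, ] <- as.matrix(agg[, -1, drop = FALSE])

ans <- roi
mcols(ans) <- DataFrame(as.data.frame(max_scores), check.names = FALSE)
ans
```

This keeps the original column names and does not depend on them being valid R identifiers, which the string-pasting version does. With thousands of columns it may be slow, since stats::aggregate() processes each column separately.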
@michael-lawrence-3846
Last seen 2.3 years ago
United States

If you have thousands of score columns, you might find it more convenient to bind them into a matrix, stored as a single column on the GRanges. Then you could use the matrixStats package to compute the maxima across all columns. I think the code would look something like:

aggregate(tomap, hits, max_scores=bindROWS(lapply(scores, colMaxs)))

Thanks Michael, but how do you store a matrix as a single column in the GRanges? If I do:

scores<-cbind(score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

and then:

tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5), score=scores)

I still get 5 metadata columns, and an error when running your code: Error in FUN(X[[i]], ...) : Argument 'x' must be a matrix or a vector.


Add it as a column after construction.


Doing this still ends up with tomap having 5 metadata columns:

tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5))
mcols(tomap)<-scores

Then running:

aggregate(tomap, hits, max_scores=lapply(scores, colMaxs))

gives this error: Error in FUN(X[[i]], ...) : Argument 'dim' must be an integer vector of length two.


mcols(x)$scores <- mat
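
To make the one-liner above concrete, here is a small sketch on toy data. The seed, the three-column matrix and the name mat are assumptions for illustration. The point is that assigning with $ stores the matrix as a single metadata column, whereas passing it to the GRanges() constructor or assigning it to the whole mcols() splits it into one column per matrix column.

```r
library(GenomicRanges)

set.seed(1)  # assumption: fixed seed so the toy data are reproducible
mat <- cbind(score1 = runif(10), score2 = runif(10), score3 = runif(10))

tomap <- GRanges(
    seqnames = Rle("chr1", 10),
    ranges = IRanges(1:10 * 10, end = 1:10 * 10 + 5))

# Assign the whole matrix to one named metadata column
mcols(tomap)$scores <- mat

ncol(mcols(tomap))        # number of metadata columns on tomap
dim(mcols(tomap)$scores)  # dimensions of the stored matrix
colnames(mcols(tomap)$scores)
```

The matrix keeps its own column names, so the meaningful score names remain available through colnames(mcols(tomap)$scores).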


Thanks, that does work to attach the matrix, but if I have more score columns than rows in roi I get the error: Error in DataFrame(by, stats) : different row counts implied by arguments. It seems the result is transposed, with the result for interval 1 placed vertically in column 1 instead of horizontally in row 1. It fails completely whenever the number of intervals differs from the number of score columns.


I edited my answer to include the call to bindROWS(), but I'm not sure it will work. Please provide a fully reproducible example.
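
As a starting point for such an example, here is a self-contained sketch of the matrix idea that sidesteps aggregate() entirely, so it does not settle whether the bindROWS() call works. It splits the hits by ROI and calls matrixStats::colMaxs() on the matching rows, then stacks the results with rbind() so that each ROI gets one row regardless of how many score columns there are. The seed, the toy data, the sixth ROI with no overlap, and the names mat, groups, per_roi and max_scores are all assumptions for illustration.

```r
library(GenomicRanges)
library(matrixStats)

set.seed(1)  # assumption: fixed seed so the toy data are reproducible
mat <- cbind(score1 = runif(10), score2 = runif(10), score3 = runif(10))

tomap <- GRanges(
    seqnames = Rle("chr1", 10),
    ranges = IRanges(1:10 * 10, end = 1:10 * 10 + 5))
mcols(tomap)$scores <- mat

# Six ROIs; the last one overlaps nothing in tomap
roi <- GRanges(
    seqnames = Rle("chr1", 6),
    ranges = IRanges(c(1:5 * 20 + 2, 500), width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

# Row indices of tomap, grouped by the ROI they overlap
groups <- split(subjectHits(hits), queryHits(hits))

# Column maxima over each group's rows, stacked so that one row = one ROI
per_roi <- do.call(rbind, lapply(groups, function(idx) {
    colMaxs(mcols(tomap)$scores, rows = idx)
}))

# One row per ROI; ROIs without hits stay NA
max_scores <- matrix(NA_real_, nrow = length(roi), ncol = ncol(mat),
                     dimnames = list(NULL, colnames(mat)))
max_scores[as.integer(names(groups)), ] <- per_roi

ans <- roi
mcols(ans)$max_scores <- max_scores
ans
```

Because the result is filled in by ROI index, the number of score columns and the number of intervals no longer need to match, and the orientation is fixed explicitly rather than left to how the per-group results are combined.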
