Question: Aggregating scores of multiple metadata columns from one GRanges object on another
0
gravatar for gueux
5 months ago by
gueux0
gueux0 wrote:

I have one GRanges object with thousands of score columns (tomap), and another with regions of interest and no metadata (roi). I am trying to map the max score from each column in tomap to the corresponding interval in roi.

I also want to retain the names of the score columns (in my real data these are meaningful names and not genralizable like score1, score2 etc...). I can do it for specific columns but am struggling to generalize it to every column.

Here is what I've got so far:

library(GenomicRanges)
tomap <- GRanges(
    seqnames = Rle(c("chr1"), c(10)),
    ranges = IRanges(1:10*10, end = 1:10*10+5),
    score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

roi <- GRanges(
    seqnames = Rle(c("chr1"), c(5)),
    ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

ans<-roi
mcols(ans) <- aggregate(tomap, hits, score1=max(score1), score2= max(score2))

ans
#GRanges object with 5 ranges and 3 metadata columns:
#      seqnames    ranges strand |             grouping            score1            score2
#         <Rle> <IRanges>  <Rle> | <ManyToManyGrouping>         <numeric>         <numeric>
#  [1]     chr1     22-31      * |                  2,3 0.326366489753127 0.925836584065109
#  [2]     chr1     42-51      * |                  4,5  0.92806151532568 0.897841389290988
#  [3]     chr1     62-71      * |                  6,7 0.980487102875486 0.940743445185944
#  [4]     chr1     83-92      * |                  8,9 0.798293181695044 0.381754550151527
#  [5]     chr1   101-110      * |                   10 0.872806148370728 0.953412540955469

As you can see, this works when I specify each score column individually, but how do I do this for thousands of columns?

genomicranges • 178 views
ADD COMMENTlink modified 5 months ago by Michael Lawrence11k • written 5 months ago by gueux0
Answer: Aggregating scores of multiple metadata columns from one GRanges object on anoth
0
gravatar for gueux
5 months ago by
gueux0
gueux0 wrote:

Based on a suggestion received here, the following works:

scoreagg<-paste0("mcols(ans)<-aggregate(tomap,hits,",paste0(colnames(tomap@elementMetadata)[1:5],"=","max(",colnames(tomap@elementMetadata)[1:5],")",collapse=","),")")

eval(parse(text=scoreagg))
ADD COMMENTlink written 5 months ago by gueux0
Answer: Aggregating scores of multiple metadata columns from one GRanges object on anoth
0
gravatar for Michael Lawrence
5 months ago by
United States
Michael Lawrence11k wrote:

If you have thousands of score columns, you might find it more convenient to bind them into a matrix, stored as a single column on the GRanges. Then you could use the matrixStats package to compute the maxima across all columns. I think the code would look something like:

aggregate(tomap, hits, max_scores=bindROWS(lapply(scores, colMaxs)))
ADD COMMENTlink modified 5 months ago • written 5 months ago by Michael Lawrence11k

Thanks Michael, but how do you store a matrix as a single column in the GRanges? If I do:

scores<-cbind(score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

and then:

tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5), score=scores)

I still get 5 metadata columns, and an error when running your code:Error in FUN(X[[i]], ...) : Argument 'x' must be a matrix or a vector.

ADD REPLYlink written 5 months ago by gueux0

Add it as a column after construction.

ADD REPLYlink written 5 months ago by Michael Lawrence11k

Doing this still ends up with tomap having 5 metadata columns: tomap <- GRanges( seqnames = Rle(c("chr1"), c(10)), ranges = IRanges(1:10*10, end = 1:10*10+5)) mcols(tomap)<-scores

Then running: aggregate(tomap, hits, max_scores=lapply(scores, colMaxs)) gives this error: Error in FUN(X[[i]], ...) : Argument 'dim' must be an integer vector of length two.

ADD REPLYlink modified 5 months ago • written 5 months ago by gueux0
1

mcols(x)$scores <- mat

ADD REPLYlink written 5 months ago by Michael Lawrence11k

thanks, that does work to associate the matrix but then if I have more score columns than rows in roi then I get the error Error in DataFrame(by, stats) : different row counts implied by arguments. It seems, the result is transposed with the result for interval 1 placed in column 1 vertically instead of horizontally. And it fails completely if there are not the same number of intervals and score columns.

ADD REPLYlink written 5 months ago by gueux0

I edited by answer to include the call to bindROWS() but I'm not sure if it will work. Please provide a fully reproducible example.

ADD REPLYlink written 5 months ago by Michael Lawrence11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 231 users visited in the last hour