Search
Question: matrix as a column of mcols of GRanges would not be extracted correctly by lapply
0
gravatar for wcstcyx
4 months ago by
wcstcyx30
China/Beijing/AMSS,CAS
wcstcyx30 wrote:

Hi,

I used matrix as a column of mcols of GRanges. This kind of structure would be of great value in some case. However, I found the matrix can not be extracted correctly with lapply. I don't know the reason. I can use 'for loop' to get around of it, but this way is not very convenient. 

For example,

gr <- GRanges(c("A:1-2", "B:4-5"))

gr$mat <- matrix(1:4, 2)

(grl <- GRangesList(gr, gr + 1))

## GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths


lapply(grl, function(gr) gr$mat) ## it only takes the first column of mat
## [[1]]
## [1] 1 2

## [[2]]
## [1] 1 2

listMat <- list()
for (i in 1:2) { ## use for loop to avoid that problem
  listMat[[i]] <- grl[[i]]$mat
}
print(listMat)
## [[1]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

## [[2]]
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

Even attaching the class "AsIs" to the mat does not work.

gr$mat <- I(matrix(1:4, 2))
class(gr$mat) <- "AsIs"
class(gr$mat) ## "AsIs" is attached
## [1] "AsIs"
(grl_2 <- GRangesList(gr, gr + 1))
GRangesList object of length 2:
## [[1]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]] 
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
class(grl_2[[1]]$mat) ## "AsIs" is automatically removed!
## [1] "matrix"
identical(grl, grl_2)
## [1] TRUE

I don't know if it is a proper behavior and the rational behind it.

Thanks for your help in advance.

Bests,

Can

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2  IRanges_2.10.2      
[4] S4Vectors_0.14.3     BiocGenerics_0.22.0  BiocInstaller_1.26.0
[7] cowsay_0.5.0        

loaded via a namespace (and not attached):
[1] zlibbioc_1.22.0         compiler_3.4.1          rmsfact_0.0.3          
[4] XVector_0.16.0          GenomeInfoDbData_0.99.0 RCurl_1.95-4.8         
[7] bitops_1.0-6            fortunes_1.5-4         
ADD COMMENTlink modified 4 months ago by Michael Lawrence9.8k • written 4 months ago by wcstcyx30
3
gravatar for Michael Lawrence
4 months ago by
United States
Michael Lawrence9.8k wrote:

This is because

> extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))
[1] 1 2

which dispatches to extractROWS,vector_or_factor,RangeNSBS() and fails to take into account the dimensions. For now, I've checked in a fallback to S4Vectors:::.extractROWSWithBracket() for arrays, but there would be some inefficiency in converting the ranges to an integer vector. Thoughts, Herve?

ADD COMMENTlink written 4 months ago by Michael Lawrence9.8k

Hi Michael,

Thanks! I tried to understand those functions you had mentioned. And found

extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat))

eventually calls

x <- as(x, "vector_OR_factor", strict = FALSE)

which would convert a matrix or an array to a numeric vector. And consequently the following

extract_ranges_from_vector_OR_factor(x, start, width)

would only extract elements corresponding to first column of the original matrix or array.

S4Vectors:::.extractROWSWithBracket(gr$mat, NSBS(IRanges(1,2), gr$mat))

can succeed.

S4Vectors:::.extractROWSWithBracket

 is the same with

selectMethod(extractROWS, signature(x="any", i="any"))

So 

x="array", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="array", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

x="matrix", i="RangeNSBS"
    (inherited from: x="vector_OR_factor", i="RangeNSBS")
x="matrix", i="RangesNSBS"
    (inherited from: x="vector_OR_factor", i="RangesNSBS")

can be changed to inherited from: x="ANY", i="ANY" to avoid the problem.

I think the potential inefficiency you meant may come from

i <- normalizeSingleBracketSubscript(i, x, allow.NAs = TRUE)

in S4Vectors:::.extractROWSWithBracket

It was a long way for me to understand the mechanism you had interpreted. Anyway, thank you very much :)

Can

ADD REPLYlink modified 3 months ago • written 3 months ago by wcstcyx30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 188 users visited in the last hour