Search
Question: matrix as a column of mcols of GRanges would not be extracted correctly by lapply
0
17 months ago by
wcstcyx30
China/Beijing/AMSS,CAS
wcstcyx30 wrote:

Hi,

I used matrix as a column of mcols of GRanges. This kind of structure would be of great value in some case. However, I found the matrix can not be extracted correctly with lapply. I don't know the reason. I can use 'for loop' to get around of it, but this way is not very convenient.

For example,

gr <- GRanges(c("A:1-2", "B:4-5"))

gr$mat <- matrix(1:4, 2) (grl <- GRangesList(gr, gr + 1)) ## GRangesList object of length 2: ## [[1]] ## GRanges object with 2 ranges and 1 metadata column: ## seqnames ranges strand | mat ## <Rle> <IRanges> <Rle> | <matrix> ## [1] A [1, 2] * | 1:3 ## [2] B [4, 5] * | 2:4 ## [[2]] ## GRanges object with 2 ranges and 1 metadata column: ## seqnames ranges strand | mat ## [1] A [0, 3] * | 1:3 ## [2] B [3, 6] * | 2:4 ## ------- ## seqinfo: 2 sequences from an unspecified genome; no seqlengths lapply(grl, function(gr) gr$mat) ## it only takes the first column of mat
## [[1]]
## [1] 1 2

## [[2]]
## [1] 1 2

listMat <- list()
for (i in 1:2) { ## use for loop to avoid that problem
listMat[[i]] <- grl[[i]]$mat } print(listMat) ## [[1]] ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4 ## [[2]] ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4 Even attaching the class "AsIs" to the mat does not work. gr$mat <- I(matrix(1:4, 2))
class(gr$mat) <- "AsIs" class(gr$mat) ## "AsIs" is attached
## [1] "AsIs"
(grl_2 <- GRangesList(gr, gr + 1))
GRangesList object of length 2:
## [[1]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames    ranges strand |      mat
##          <Rle> <IRanges>  <Rle> | <matrix>
##   [1]        A    [1, 2]      * |      1:3
##   [2]        B    [4, 5]      * |      2:4

## [[2]]
## GRanges object with 2 ranges and 1 metadata column:
##       seqnames ranges strand | mat
##   [1]        A [0, 3]      * | 1:3
##   [2]        B [3, 6]      * | 2:4

## -------
## seqinfo: 2 sequences from an unspecified genome; no seqlengths
class(grl_2[[1]]$mat) ## "AsIs" is automatically removed! ## [1] "matrix" identical(grl, grl_2) ## [1] TRUE I don't know if it is a proper behavior and the rational behind it. Thanks for your help in advance. Bests, Can sessionInfo() R version 3.4.1 (2017-06-30) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936 [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 [4] LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] GenomicRanges_1.28.3 GenomeInfoDb_1.12.2 IRanges_2.10.2 [4] S4Vectors_0.14.3 BiocGenerics_0.22.0 BiocInstaller_1.26.0 [7] cowsay_0.5.0 loaded via a namespace (and not attached): [1] zlibbioc_1.22.0 compiler_3.4.1 rmsfact_0.0.3 [4] XVector_0.16.0 GenomeInfoDbData_0.99.0 RCurl_1.95-4.8 [7] bitops_1.0-6 fortunes_1.5-4  ADD COMMENTlink modified 17 months ago by Michael Lawrence10k • written 17 months ago by wcstcyx30 3 17 months ago by United States Michael Lawrence10k wrote: This is because > extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat)) [1] 1 2 which dispatches to extractROWS,vector_or_factor,RangeNSBS() and fails to take into account the dimensions. For now, I've checked in a fallback to S4Vectors:::.extractROWSWithBracket() for arrays, but there would be some inefficiency in converting the ranges to an integer vector. Thoughts, Herve? ADD COMMENTlink written 17 months ago by Michael Lawrence10k Hi Michael, Thanks! I tried to understand those functions you had mentioned. And found extractROWS(gr$mat, NSBS(IRanges(1,2), gr$mat)) eventually calls x <- as(x, "vector_OR_factor", strict = FALSE) which would convert a matrix or an array to a numeric vector. And consequently the following extract_ranges_from_vector_OR_factor(x, start, width) would only extract elements corresponding to first column of the original matrix or array. S4Vectors:::.extractROWSWithBracket(gr$mat, NSBS(IRanges(1,2), gr\$mat))

can succeed.

S4Vectors:::.extractROWSWithBracket

is the same with

selectMethod(extractROWS, signature(x="any", i="any"))

So

x="array", i="RangeNSBS"
(inherited from: x="vector_OR_factor", i="RangeNSBS")
x="array", i="RangesNSBS"
(inherited from: x="vector_OR_factor", i="RangesNSBS")

x="matrix", i="RangeNSBS"
(inherited from: x="vector_OR_factor", i="RangeNSBS")
x="matrix", i="RangesNSBS"
(inherited from: x="vector_OR_factor", i="RangesNSBS")

can be changed to inherited from: x="ANY", i="ANY" to avoid the problem.

I think the potential inefficiency you meant may come from

i <- normalizeSingleBracketSubscript(i, x, allow.NAs = TRUE)

in S4Vectors:::.extractROWSWithBracket

It was a long way for me to understand the mechanism you had interpreted. Anyway, thank you very much :)

Can