Search
Question: subset a RleList using GRanges object?
0
gravatar for Janet Young
4.0 years ago by
Janet Young730
Fred Hutchinson Cancer Research Center, Seattle, WA, USA
Janet Young730 wrote:
Hi there, I'm playing around with coverage data generated outside of R, planning to plot RNA-seq coverage for some genes we're interested in. I have a request - it'd be nice from my point of view if it were possible to look at a small region an RleList (or CompressedRleList) using a GRanges object to focus on that smaller region. Looks like I can subset a single Rle object with an IRanges object, but I wonder if this nice feature could be extended to GRanges objects? Some code is below to show what I'm trying to do. thanks veruy much, Janet library(GenomicRanges) ## an example RleList x <- Rle(10:1, 1:10) y <- Rle(10:1, 1:10) z <- RleList( chr1=x, chr2=y) ## an example GRanges object myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) ## subsetting an Rle using an IRanges object works, as expected: z[["chr1"]] [ ranges(myRange) ] ## but subsetting an RleList by GRanges object doesn't work z [ myRange ] # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type sessionInfo() R Under development (unstable) (2013-11-06 r64163) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 [4] BiocGenerics_0.9.1 loaded via a namespace (and not attached): [1] stats4_3.1.0 ------------------------------------------------------------------- Dr. Janet Young Malik lab http://research.fhcrc.org/malik/en.html Fred Hutchinson Cancer Research Center 1100 Fairview Avenue N., A2-025, P.O. Box 19024, Seattle, WA 98109-1024, USA. tel: (206) 667 4512 email: jayoung ...at... fhcrc.org -------------------------------------------------------------------
ADD COMMENTlink modified 4.0 years ago by Hervé Pagès ♦♦ 13k • written 4.0 years ago by Janet Young730
0
gravatar for Michael Lawrence
4.0 years ago by
United States
Michael Lawrence9.8k wrote:
For now, just coerce to a RangesList. z [ as(myRange, "RangesList") ] Herve might consider adding a [,List,GenomicRanges method... kind of a logical leap though from seqnames to list element names. extractCoverageForPositions is different; it extracts an integer vector given a vector of width-one positions. On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung@fhcrc.org> wrote: > Hi there, > > I'm playing around with coverage data generated outside of R, planning to > plot RNA-seq coverage for some genes we're interested in. > > I have a request - it'd be nice from my point of view if it were possible > to look at a small region an RleList (or CompressedRleList) using a GRanges > object to focus on that smaller region. Looks like I can subset a single > Rle object with an IRanges object, but I wonder if this nice feature could > be extended to GRanges objects? > > Some code is below to show what I'm trying to do. > > thanks veruy much, > > Janet > > > > library(GenomicRanges) > > ## an example RleList > x <- Rle(10:1, 1:10) > y <- Rle(10:1, 1:10) > z <- RleList( chr1=x, chr2=y) > > ## an example GRanges object > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > ## subsetting an Rle using an IRanges object works, as expected: > z[["chr1"]] [ ranges(myRange) ] > > ## but subsetting an RleList by GRanges object doesn't work > z [ myRange ] > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type > > sessionInfo() > > R Under development (unstable) (2013-11-06 r64163) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > [4] BiocGenerics_0.9.1 > > loaded via a namespace (and not attached): > [1] stats4_3.1.0 > > > > > ------------------------------------------------------------------- > > Dr. Janet Young > > Malik lab > http://research.fhcrc.org/malik/en.html > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., A2-025, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 4512 > email: jayoung ...at... fhcrc.org > > ------------------------------------------------------------------- > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 4.0 years ago by Michael Lawrence9.8k
Hi Michael, On 12/03/2013 07:13 PM, Michael Lawrence wrote: > For now, just coerce to a RangesList. > > z [ as(myRange, "RangesList") ] > > Herve might consider adding a [,List,GenomicRanges method... kind of a > logical leap though from seqnames to list element names. This is something I've been considering for a while. I think we should do it. The mapping between the seqnames of a GRanges and the names of a RangesList is well established at this point e.g. the coercion (back and forth) between the two already does that. Cheers, H. > > extractCoverageForPositions is different; it extracts an integer vector > given a vector of width-one positions. > > > > > > > On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: > >> Hi there, >> >> I'm playing around with coverage data generated outside of R, planning to >> plot RNA-seq coverage for some genes we're interested in. >> >> I have a request - it'd be nice from my point of view if it were possible >> to look at a small region an RleList (or CompressedRleList) using a GRanges >> object to focus on that smaller region. Looks like I can subset a single >> Rle object with an IRanges object, but I wonder if this nice feature could >> be extended to GRanges objects? >> >> Some code is below to show what I'm trying to do. >> >> thanks veruy much, >> >> Janet >> >> >> >> library(GenomicRanges) >> >> ## an example RleList >> x <- Rle(10:1, 1:10) >> y <- Rle(10:1, 1:10) >> z <- RleList( chr1=x, chr2=y) >> >> ## an example GRanges object >> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >> >> ## subsetting an Rle using an IRanges object works, as expected: >> z[["chr1"]] [ ranges(myRange) ] >> >> ## but subsetting an RleList by GRanges object doesn't work >> z [ myRange ] >> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type >> >> sessionInfo() >> >> R Under development (unstable) (2013-11-06 r64163) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >> [4] BiocGenerics_0.9.1 >> >> loaded via a namespace (and not attached): >> [1] stats4_3.1.0 >> >> >> >> >> ------------------------------------------------------------------- >> >> Dr. Janet Young >> >> Malik lab >> http://research.fhcrc.org/malik/en.html >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Avenue N., A2-025, >> P.O. Box 19024, Seattle, WA 98109-1024, USA. >> >> tel: (206) 667 4512 >> email: jayoung ...at... fhcrc.org >> >> ------------------------------------------------------------------- >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 4.0 years ago by Hervé Pagès ♦♦ 13k
Thanks, all. Yes, that coercion will do it for me: z[as(myRange, "RangesList")]. It's always nice for the end user when you take care of those kinds of things behind the scenes for us - sounds like what you're considering implementing might take care of it. Robert: thanks for the ideas! But my needs are a little different, though - I'm keeping the individual values in the region I'm interested in (I'm plotting RNA-seq coverage on just a single gene, but my coverage object is for the whole genome). Janet On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: > Hi Michael, > > On 12/03/2013 07:13 PM, Michael Lawrence wrote: >> For now, just coerce to a RangesList. >> >> z [ as(myRange, "RangesList") ] >> >> Herve might consider adding a [,List,GenomicRanges method... kind of a >> logical leap though from seqnames to list element names. > > This is something I've been considering for a while. I think we should > do it. The mapping between the seqnames of a GRanges and the names of a > RangesList is well established at this point e.g. the coercion (back and > forth) between the two already does that. > > Cheers, > H. > >> >> extractCoverageForPositions is different; it extracts an integer vector >> given a vector of width-one positions. >> >> >> >> >> >> >> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: >> >>> Hi there, >>> >>> I'm playing around with coverage data generated outside of R, planning to >>> plot RNA-seq coverage for some genes we're interested in. >>> >>> I have a request - it'd be nice from my point of view if it were possible >>> to look at a small region an RleList (or CompressedRleList) using a GRanges >>> object to focus on that smaller region. Looks like I can subset a single >>> Rle object with an IRanges object, but I wonder if this nice feature could >>> be extended to GRanges objects? >>> >>> Some code is below to show what I'm trying to do. >>> >>> thanks veruy much, >>> >>> Janet >>> >>> >>> >>> library(GenomicRanges) >>> >>> ## an example RleList >>> x <- Rle(10:1, 1:10) >>> y <- Rle(10:1, 1:10) >>> z <- RleList( chr1=x, chr2=y) >>> >>> ## an example GRanges object >>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >>> >>> ## subsetting an Rle using an IRanges object works, as expected: >>> z[["chr1"]] [ ranges(myRange) ] >>> >>> ## but subsetting an RleList by GRanges object doesn't work >>> z [ myRange ] >>> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type >>> >>> sessionInfo() >>> >>> R Under development (unstable) (2013-11-06 r64163) >>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>> locale: >>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >>> [4] BiocGenerics_0.9.1 >>> >>> loaded via a namespace (and not attached): >>> [1] stats4_3.1.0 >>> >>> >>> >>> >>> ------------------------------------------------------------------- >>> >>> Dr. Janet Young >>> >>> Malik lab >>> http://research.fhcrc.org/malik/en.html >>> >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Avenue N., A2-025, >>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >>> >>> tel: (206) 667 4512 >>> email: jayoung ...at... fhcrc.org >>> >>> ------------------------------------------------------------------- >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319
ADD REPLYlink written 4.0 years ago by Janet Young730
Janet, et al I believe there are problems with all approaches mentioned so far on this thread. When you index something, length of the result should equal the length of the index, AND, the order of the elements of the result should correspond to the order of the indices. The approaches mentioned all depend upon coersion from GRanges to RangesList, but this does not preserve the order of the GRanges. Look: > example(GRangesList) > as(gr3,'RangesList') IRangesList of length 2 $chr1 IRanges of length 1 start end width [1] 1 3 3 $chr2 IRanges of length 1 start end width [1] 4 9 6 !> as(gr3[2:1],'RangesList') IRangesList of length 2 $chr1 IRanges of length 1 start end width [1] 1 3 3 $chr2 IRanges of length 1 start end width [1] 4 9 6 I think if you contrive for your GRanges to be in 'canonical order' (as defined by the RangesList coersion), you can proceed with this approach. Look: > stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[1:2] ),'RangesList'))) Janet, you brought this up in http://thread.gmane.org/gmane.science.biology.informatics.conductor/36 247/focus=36263 with more or less the same outcome. Here is excerpt from that exchange: >> An "order" method would be great. Note though that the coercion to >> RangesList only sorts by seqlevels, so we would need something that gave >> out that order vector. The point here is to avoid forcing the user to >> modify the order of the original GRanges. >> > > With order() the user will still be able to achieve this with something > like: > > regionMeans <- function(regions, cvg) > { > seqlevels(regions) <- names(cvg) > oo <- order(regions) > regions <- regions[oo] > ans <- unlist( > viewMeans( > Views(cvg, as(regions, "RangesList")) > ), use.names=FALSE) > ans[oo] <- ans # restore original order > ans > } > > values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) > > Tricky :-/ Let's not forget this lesson! However, I agree, some syntactic sugar that allows indexing an RleList by GRanges with the proposed semantics would be great. In the mean time, I offer this function, which I found to be faster than Herve's offering above, and also provides a few more useful (to me) abstractions. YMMV: applyAt <- function(cvg # an RleList (quite possibly representing a genomic coverage) ,gr # a GRanges at each element of which you want to evaluate a function ,FUN=identity # by default, just return individual rleLists for each element of gr ,... # additional options to FUN ,.MAPPLY=mapply #optionally parallize with mcmapply or your favorite variant ,USE.NAMES=TRUE ,SIMPLIFY=TRUE #'array'#TRUE ,MoreArgs=NULL # passed to .MAPPLY ,ignore.strand=TRUE # else, reverse the coverages ) { revif<-function(s,x) { if(ignore.strand==TRUE) {x} else if (s=='+') {x} else {rev(x)} } ans<-.MAPPLY(function(a,b,c,s) { FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) } ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr)) ,MoreArgs=MoreArgs ,SIMPLIFY=SIMPLIFY ,USE.NAMES=FALSE #USE.NAMES ) if (!USE.NAMES) {} else if (is.null(dim(ans))) {names(ans)<-names(gr)} else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} else rownames(ans)<-names(gr) ans } Finally, what I think we REALLY want is a version of the above which currys a FUN (possibly `identity`) down to rle levels that can be applied to a bigwig file without having to load the whole thing into memory. In the mean time, I offer: applyAt.bw<-function(x,y,...) { applyAtimport.bw(x,asRle=TRUE ,selection=BigWigSelection(y) ) ,y ,...) } ~Malcolm >-----Original Message----- >From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Janet Young >Sent: Wednesday, December 04, 2013 12:59 PM >To: Hervé Pagès >Cc: Michael Lawrence; bioconductor at r-project.org >Subject: Re: [BioC] subset a RleList using GRanges object? > >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, "RangesList")]. It's always nice for the end user when you take care of >those kinds of things behind the scenes for us - sounds like what you're considering implementing might take care of it. > >Robert: thanks for the ideas! But my needs are a little different, though - I'm keeping the individual values in the region I'm interested >in (I'm plotting RNA-seq coverage on just a single gene, but my coverage object is for the whole genome). > >Janet > > > >On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: > >> Hi Michael, >> >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: >>> For now, just coerce to a RangesList. >>> >>> z [ as(myRange, "RangesList") ] >>> >>> Herve might consider adding a [,List,GenomicRanges method... kind of a >>> logical leap though from seqnames to list element names. >> >> This is something I've been considering for a while. I think we should >> do it. The mapping between the seqnames of a GRanges and the names of a >> RangesList is well established at this point e.g. the coercion (back and >> forth) between the two already does that. >> >> Cheers, >> H. >> >>> >>> extractCoverageForPositions is different; it extracts an integer vector >>> given a vector of width-one positions. >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: >>> >>>> Hi there, >>>> >>>> I'm playing around with coverage data generated outside of R, planning to >>>> plot RNA-seq coverage for some genes we're interested in. >>>> >>>> I have a request - it'd be nice from my point of view if it were possible >>>> to look at a small region an RleList (or CompressedRleList) using a GRanges >>>> object to focus on that smaller region. Looks like I can subset a single >>>> Rle object with an IRanges object, but I wonder if this nice feature could >>>> be extended to GRanges objects? >>>> >>>> Some code is below to show what I'm trying to do. >>>> >>>> thanks veruy much, >>>> >>>> Janet >>>> >>>> >>>> >>>> library(GenomicRanges) >>>> >>>> ## an example RleList >>>> x <- Rle(10:1, 1:10) >>>> y <- Rle(10:1, 1:10) >>>> z <- RleList( chr1=x, chr2=y) >>>> >>>> ## an example GRanges object >>>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >>>> >>>> ## subsetting an Rle using an IRanges object works, as expected: >>>> z[["chr1"]] [ ranges(myRange) ] >>>> >>>> ## but subsetting an RleList by GRanges object doesn't work >>>> z [ myRange ] >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type >>>> >>>> sessionInfo() >>>> >>>> R Under development (unstable) (2013-11-06 r64163) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >>>> [4] BiocGenerics_0.9.1 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] stats4_3.1.0 >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------- >>>> >>>> Dr. Janet Young >>>> >>>> Malik lab >>>> http://research.fhcrc.org/malik/en.html >>>> >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Avenue N., A2-025, >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >>>> >>>> tel: (206) 667 4512 >>>> email: jayoung ...at... fhcrc.org >>>> >>>> ------------------------------------------------------------------- >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > >_______________________________________________ >Bioconductor mailing list >Bioconductor at r-project.org >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLYlink written 4.0 years ago by Malcolm Cook1.4k
Hi Malcolm, On 12/04/2013 11:43 AM, Cook, Malcolm wrote: > Janet, et al > > I believe there are problems with all approaches mentioned so far on this thread. > > When you index something, length of the result should equal the length of the index, Not if the index is a logical vector or a Ranges object though. > AND, the order of the elements of the result should correspond to the order of the indices. > > The approaches mentioned all depend upon coersion from GRanges to RangesList, but this does not preserve the order of the GRanges. True but it's guaranteed to preserve the order within each space/chromosome. I think this is good enough to make subsetting a named list-like object (e.g. named RleList, named DNAStringSet, etc...) by a GRanges object a reasonable thing to support. The order of the seqlevels in the GRanges object doesn't matter and won't affect the result of the subsetting because they're matched to the names of the list-like object to subset. Note that subsetting needs to behave like an "endomorphism" i.e. the result must be of the same type as the original object. So when subsetting an RleList by a GRanges index (like in Janet's example), the result will always be an RleList object, even if the index has more than 1 range per chromosome (in that case the extracted ranges are concatenated together). I'm still not convinced this is what Janet really needs. Like Michael mentioned earlier, maybe a more useful approach is to create an RleViewsList object with something like: vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) viewMeans(vlist) More below. > > Look: > >> example(GRangesList) >> as(gr3,'RangesList') > IRangesList of length 2 > $chr1 > IRanges of length 1 > start end width > [1] 1 3 3 > > $chr2 > IRanges of length 1 > start end width > [1] 4 9 6 > > !> as(gr3[2:1],'RangesList') > IRangesList of length 2 > $chr1 > IRanges of length 1 > start end width > [1] 1 3 3 > > $chr2 > IRanges of length 1 > start end width > [1] 4 9 6 > > I think if you contrive for your GRanges to be in 'canonical order' (as defined by the RangesList coersion), you can proceed with this approach. Look: > >> stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[1:2 ]),'RangesList'))) > > Janet, you brought this up in > http://thread.gmane.org/gmane.science.biology.informatics.conductor/ 36247/focus=36263 > > with more or less the same outcome. > > Here is excerpt from that exchange: > >>> An "order" method would be great. Note though that the coercion to >>> RangesList only sorts by seqlevels, so we would need something that gave >>> out that order vector. The point here is to avoid forcing the user to >>> modify the order of the original GRanges. >>> >> >> With order() the user will still be able to achieve this with something >> like: >> >> regionMeans <- function(regions, cvg) >> { >> seqlevels(regions) <- names(cvg) >> oo <- order(regions) >> regions <- regions[oo] >> ans <- unlist( >> viewMeans( >> Views(cvg, as(regions, "RangesList")) >> ), use.names=FALSE) >> ans[oo] <- ans # restore original order >> ans >> } >> >> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) >> >> Tricky :-/ Very related indeed. Thanks for refreshing my memory. Note that this discussion happened more than 2 years ago at a time when we didn't have order() for GRanges yet. Now we have it so my regionMeans() proposal above should work. FWIW in ?tileGenome (GenomicRanges package), I show how to implement a similar function (binnedAverage) without the need for ordering the GRanges object (it uses a split/unsplit aproach instead). The design of binnedAverage() is also a little bit cleaner. > > Let's not forget this lesson! > > However, I agree, some syntactic sugar that allows indexing an RleList by GRanges with the proposed semantics would be great. > > In the mean time, I offer this function, which I found to be faster than Herve's offering above, and also provides a few more useful (to me) abstractions. > > YMMV: > > > applyAt <- function(cvg # an RleList (quite possibly representing a genomic coverage) > ,gr # a GRanges at each element of which you want to evaluate a function > ,FUN=identity # by default, just return individual rleLists for each element of gr > ,... # additional options to FUN > ,.MAPPLY=mapply #optionally parallize with mcmapply or your favorite variant > ,USE.NAMES=TRUE > ,SIMPLIFY=TRUE #'array'#TRUE > ,MoreArgs=NULL # passed to .MAPPLY > ,ignore.strand=TRUE # else, reverse the coverages > ) { > revif<-function(s,x) { > if(ignore.strand==TRUE) {x} > else if (s=='+') {x} > else {rev(x)} > } > ans<-.MAPPLY(function(a,b,c,s) { > FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) > } > ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr)) > ,MoreArgs=MoreArgs > ,SIMPLIFY=SIMPLIFY > ,USE.NAMES=FALSE #USE.NAMES > ) > if (!USE.NAMES) {} > else if (is.null(dim(ans))) {names(ans)<-names(gr)} > else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} > else rownames(ans)<-names(gr) > ans > } > > Finally, what I think we REALLY want is a version of the above which currys a FUN (possibly `identity`) down to rle levels that can be applied to a bigwig file without having to load the whole thing into memory. > > In the mean time, I offer: > > applyAt.bw<-function(x,y,...) { > applyAtimport.bw(x,asRle=TRUE > ,selection=BigWigSelection(y) > ) > ,y > ,...) > } Interesting. It sounds to me like maybe one way to facilitate this (and also avoid the proliferation of apply-like functions) would be to finally bite the bullet and come up with a new kind of view objects where the views are *genomic* ranges instead of just ranges (like they are right now). Something that was already mentioned in this more-than-2-year-old discussion. We could start with RleListGViews and BigWigFileGViews objects, both concrete subclasses of virtual class GViews. Thanks for you feedback, H. > > ~Malcolm > > > >-----Original Message----- > >From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Janet Young > >Sent: Wednesday, December 04, 2013 12:59 PM > >To: Hervé Pagès > >Cc: Michael Lawrence; bioconductor at r-project.org > >Subject: Re: [BioC] subset a RleList using GRanges object? > > > >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, "RangesList")]. It's always nice for the end user when you take care of > >those kinds of things behind the scenes for us - sounds like what you're considering implementing might take care of it. > > > >Robert: thanks for the ideas! But my needs are a little different, though - I'm keeping the individual values in the region I'm interested > >in (I'm plotting RNA-seq coverage on just a single gene, but my coverage object is for the whole genome). > > > >Janet > > > > > > > >On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: > > > >> Hi Michael, > >> > >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: > >>> For now, just coerce to a RangesList. > >>> > >>> z [ as(myRange, "RangesList") ] > >>> > >>> Herve might consider adding a [,List,GenomicRanges method... kind of a > >>> logical leap though from seqnames to list element names. > >> > >> This is something I've been considering for a while. I think we should > >> do it. The mapping between the seqnames of a GRanges and the names of a > >> RangesList is well established at this point e.g. the coercion (back and > >> forth) between the two already does that. > >> > >> Cheers, > >> H. > >> > >>> > >>> extractCoverageForPositions is different; it extracts an integer vector > >>> given a vector of width-one positions. > >>> > >>> > >>> > >>> > >>> > >>> > >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> wrote: > >>> > >>>> Hi there, > >>>> > >>>> I'm playing around with coverage data generated outside of R, planning to > >>>> plot RNA-seq coverage for some genes we're interested in. > >>>> > >>>> I have a request - it'd be nice from my point of view if it were possible > >>>> to look at a small region an RleList (or CompressedRleList) using a GRanges > >>>> object to focus on that smaller region. Looks like I can subset a single > >>>> Rle object with an IRanges object, but I wonder if this nice feature could > >>>> be extended to GRanges objects? > >>>> > >>>> Some code is below to show what I'm trying to do. > >>>> > >>>> thanks veruy much, > >>>> > >>>> Janet > >>>> > >>>> > >>>> > >>>> library(GenomicRanges) > >>>> > >>>> ## an example RleList > >>>> x <- Rle(10:1, 1:10) > >>>> y <- Rle(10:1, 1:10) > >>>> z <- RleList( chr1=x, chr2=y) > >>>> > >>>> ## an example GRanges object > >>>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > >>>> > >>>> ## subsetting an Rle using an IRanges object works, as expected: > >>>> z[["chr1"]] [ ranges(myRange) ] > >>>> > >>>> ## but subsetting an RleList by GRanges object doesn't work > >>>> z [ myRange ] > >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type > >>>> > >>>> sessionInfo() > >>>> > >>>> R Under development (unstable) (2013-11-06 r64163) > >>>> Platform: x86_64-unknown-linux-gnu (64-bit) > >>>> > >>>> locale: > >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C > >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > >>>> > >>>> attached base packages: > >>>> [1] parallel stats graphics grDevices utils datasets methods > >>>> [8] base > >>>> > >>>> other attached packages: > >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > >>>> [4] BiocGenerics_0.9.1 > >>>> > >>>> loaded via a namespace (and not attached): > >>>> [1] stats4_3.1.0 > >>>> > >>>> > >>>> > >>>> > >>>> ------------------------------------------------------------------- > >>>> > >>>> Dr. Janet Young > >>>> > >>>> Malik lab > >>>> http://research.fhcrc.org/malik/en.html > >>>> > >>>> Fred Hutchinson Cancer Research Center > >>>> 1100 Fairview Avenue N., A2-025, > >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. > >>>> > >>>> tel: (206) 667 4512 > >>>> email: jayoung ...at... fhcrc.org > >>>> > >>>> ------------------------------------------------------------------- > >>>> > >>>> _______________________________________________ > >>>> Bioconductor mailing list > >>>> Bioconductor at r-project.org > >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> _______________________________________________ > >>> Bioconductor mailing list > >>> Bioconductor at r-project.org > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >>> > >> > >> -- > >> Hervé Pagès > >> > >> Program in Computational Biology > >> Division of Public Health Sciences > >> Fred Hutchinson Cancer Research Center > >> 1100 Fairview Ave. N, M1-B514 > >> P.O. Box 19024 > >> Seattle, WA 98109-1024 > >> > >> E-mail: hpages at fhcrc.org > >> Phone: (206) 667-5791 > >> Fax: (206) 667-1319 > > > >_______________________________________________ > >Bioconductor mailing list > >Bioconductor at r-project.org > >https://stat.ethz.ch/mailman/listinfo/bioconductor > >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 4.0 years ago by Hervé Pagès ♦♦ 13k
On 12/04/2013 05:18 PM, Hervé Pagès wrote: [...] > > vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) Should be Views(z, as(myRange, "RangesList")) sorry... H.
ADD REPLYlink written 4.0 years ago by Hervé Pagès ♦♦ 13k
I like the idea of GViews. BamFileGViews and TabixFileGViews would also be high on the list. As would BSgenomeGViews if BSgenome came to support random access. Michael On Wed, Dec 4, 2013 at 5:18 PM, Hervé Pagès <hpages@fhcrc.org> wrote: > Hi Malcolm, > > > On 12/04/2013 11:43 AM, Cook, Malcolm wrote: > >> Janet, et al >> >> I believe there are problems with all approaches mentioned so far on this >> thread. >> >> When you index something, length of the result should equal the length of >> the index, >> > > Not if the index is a logical vector or a Ranges object though. > > > AND, the order of the elements of the result should correspond to the >> order of the indices. >> >> The approaches mentioned all depend upon coersion from GRanges to >> RangesList, but this does not preserve the order of the GRanges. >> > > True but it's guaranteed to preserve the order within each > space/chromosome. I think this is good enough to make subsetting > a named list-like object (e.g. named RleList, named DNAStringSet, > etc...) by a GRanges object a reasonable thing to support. > The order of the seqlevels in the GRanges object doesn't matter > and won't affect the result of the subsetting because they're > matched to the names of the list-like object to subset. > > Note that subsetting needs to behave like an "endomorphism" i.e. > the result must be of the same type as the original object. So > when subsetting an RleList by a GRanges index (like in Janet's > example), the result will always be an RleList object, even if > the index has more than 1 range per chromosome (in that case > the extracted ranges are concatenated together). I'm still not > convinced this is what Janet really needs. Like Michael mentioned > earlier, maybe a more useful approach is to create an RleViewsList > object with something like: > > vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) > viewMeans(vlist) > > More below. > > > >> Look: >> >> example(GRangesList) >>> as(gr3,'RangesList') >>> >> IRangesList of length 2 >> $chr1 >> IRanges of length 1 >> start end width >> [1] 1 3 3 >> >> $chr2 >> IRanges of length 1 >> start end width >> [1] 4 9 6 >> >> !> as(gr3[2:1],'RangesList') >> IRangesList of length 2 >> $chr1 >> IRanges of length 1 >> start end width >> [1] 1 3 3 >> >> $chr2 >> IRanges of length 1 >> start end width >> [1] 4 9 6 >> >> I think if you contrive for your GRanges to be in 'canonical order' (as >> defined by the RangesList coersion), you can proceed with this approach. >> Look: >> >> stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as( >>> sort(gr3[1:2]),'RangesList'))) >>> >> >> Janet, you brought this up in >> http://thread.gmane.org/gmane.science.biology.informatics. >> conductor/36247/focus=36263 >> >> with more or less the same outcome. >> >> Here is excerpt from that exchange: >> >> An "order" method would be great. Note though that the coercion to >>>> RangesList only sorts by seqlevels, so we would need something that gave >>>> out that order vector. The point here is to avoid forcing the user to >>>> modify the order of the original GRanges. >>>> >>>> >>> With order() the user will still be able to achieve this with something >>> like: >>> >>> regionMeans <- function(regions, cvg) >>> { >>> seqlevels(regions) <- names(cvg) >>> oo <- order(regions) >>> regions <- regions[oo] >>> ans <- unlist( >>> viewMeans( >>> Views(cvg, as(regions, "RangesList")) >>> ), use.names=FALSE) >>> ans[oo] <- ans # restore original order >>> ans >>> } >>> >>> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) >>> >>> Tricky :-/ >>> >> > Very related indeed. Thanks for refreshing my memory. Note that this > discussion happened more than 2 years ago at a time when we didn't have > order() for GRanges yet. Now we have it so my regionMeans() proposal > above should work. FWIW in ?tileGenome (GenomicRanges package), I show > how to implement a similar function (binnedAverage) without the need > for ordering the GRanges object (it uses a split/unsplit aproach > instead). The design of binnedAverage() is also a little bit cleaner. > > > >> Let's not forget this lesson! >> >> However, I agree, some syntactic sugar that allows indexing an RleList by >> GRanges with the proposed semantics would be great. >> >> In the mean time, I offer this function, which I found to be faster than >> Herve's offering above, and also provides a few more useful (to me) >> abstractions. >> >> YMMV: >> >> >> applyAt <- function(cvg # an RleList (quite possibly representing a >> genomic coverage) >> ,gr # a GRanges at each element of which you want to >> evaluate a function >> ,FUN=identity # by default, just return individual >> rleLists for each element of gr >> ,... # additional options to FUN >> ,.MAPPLY=mapply #optionally parallize with mcmapply >> or your favorite variant >> ,USE.NAMES=TRUE >> ,SIMPLIFY=TRUE #'array'#TRUE >> ,MoreArgs=NULL # passed to .MAPPLY >> ,ignore.strand=TRUE # else, reverse the coverages >> ) { >> revif<-function(s,x) { >> if(ignore.strand==TRUE) {x} >> else if (s=='+') {x} >> else {rev(x)} >> } >> ans<-.MAPPLY(function(a,b,c,s) { >> FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) >> } >> ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector( >> strand(gr)) >> ,MoreArgs=MoreArgs >> ,SIMPLIFY=SIMPLIFY >> ,USE.NAMES=FALSE #USE.NAMES >> ) >> if (!USE.NAMES) {} >> else if (is.null(dim(ans))) {names(ans)<-names(gr)} >> else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} >> else rownames(ans)<-names(gr) >> ans >> } >> >> Finally, what I think we REALLY want is a version of the above which >> currys a FUN (possibly `identity`) down to rle levels that can be applied >> to a bigwig file without having to load the whole thing into memory. >> >> In the mean time, I offer: >> >> applyAt.bw<-function(x,y,...) { >> applyAtimport.bw(x,asRle=TRUE >> ,selection=BigWigSelection(y) >> ) >> ,y >> ,...) >> } >> > > Interesting. It sounds to me like maybe one way to facilitate this > (and also avoid the proliferation of apply-like functions) would be > to finally bite the bullet and come up with a new kind of view objects > where the views are *genomic* ranges instead of just ranges (like they > are right now). Something that was already mentioned in this > more-than-2-year-old discussion. We could start with RleListGViews > and BigWigFileGViews objects, both concrete subclasses of virtual > class GViews. > > Thanks for you feedback, > H. > > > >> ~Malcolm >> >> >> >-----Original Message----- >> >From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@ >> r-project.org] On Behalf Of Janet Young >> >Sent: Wednesday, December 04, 2013 12:59 PM >> >To: Hervé Pagès >> >Cc: Michael Lawrence; bioconductor@r-project.org >> >Subject: Re: [BioC] subset a RleList using GRanges object? >> > >> >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, >> "RangesList")]. It's always nice for the end user when you take care of >> >those kinds of things behind the scenes for us - sounds like what >> you're considering implementing might take care of it. >> > >> >Robert: thanks for the ideas! But my needs are a little different, >> though - I'm keeping the individual values in the region I'm interested >> >in (I'm plotting RNA-seq coverage on just a single gene, but my >> coverage object is for the whole genome). >> > >> >Janet >> > >> > >> > >> >On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: >> > >> >> Hi Michael, >> >> >> >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: >> >>> For now, just coerce to a RangesList. >> >>> >> >>> z [ as(myRange, "RangesList") ] >> >>> >> >>> Herve might consider adding a [,List,GenomicRanges method... kind >> of a >> >>> logical leap though from seqnames to list element names. >> >> >> >> This is something I've been considering for a while. I think we >> should >> >> do it. The mapping between the seqnames of a GRanges and the names >> of a >> >> RangesList is well established at this point e.g. the coercion (back >> and >> >> forth) between the two already does that. >> >> >> >> Cheers, >> >> H. >> >> >> >>> >> >>> extractCoverageForPositions is different; it extracts an integer >> vector >> >>> given a vector of width-one positions. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung@fhcrc.org> >> wrote: >> >>> >> >>>> Hi there, >> >>>> >> >>>> I'm playing around with coverage data generated outside of R, >> planning to >> >>>> plot RNA-seq coverage for some genes we're interested in. >> >>>> >> >>>> I have a request - it'd be nice from my point of view if it were >> possible >> >>>> to look at a small region an RleList (or CompressedRleList) using >> a GRanges >> >>>> object to focus on that smaller region. Looks like I can subset a >> single >> >>>> Rle object with an IRanges object, but I wonder if this nice >> feature could >> >>>> be extended to GRanges objects? >> >>>> >> >>>> Some code is below to show what I'm trying to do. >> >>>> >> >>>> thanks veruy much, >> >>>> >> >>>> Janet >> >>>> >> >>>> >> >>>> >> >>>> library(GenomicRanges) >> >>>> >> >>>> ## an example RleList >> >>>> x <- Rle(10:1, 1:10) >> >>>> y <- Rle(10:1, 1:10) >> >>>> z <- RleList( chr1=x, chr2=y) >> >>>> >> >>>> ## an example GRanges object >> >>>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) >> ) >> >>>> >> >>>> ## subsetting an Rle using an IRanges object works, as expected: >> >>>> z[["chr1"]] [ ranges(myRange) ] >> >>>> >> >>>> ## but subsetting an RleList by GRanges object doesn't work >> >>>> z [ myRange ] >> >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid >> subscript type >> >>>> >> >>>> sessionInfo() >> >>>> >> >>>> R Under development (unstable) (2013-11-06 r64163) >> >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >> >>>> >> >>>> locale: >> >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >> >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >>>> >> >>>> attached base packages: >> >>>> [1] parallel stats graphics grDevices utils datasets >> methods >> >>>> [8] base >> >>>> >> >>>> other attached packages: >> >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >> >>>> [4] BiocGenerics_0.9.1 >> >>>> >> >>>> loaded via a namespace (and not attached): >> >>>> [1] stats4_3.1.0 >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> ------------------------------------------------------------ >> ------- >> >>>> >> >>>> Dr. Janet Young >> >>>> >> >>>> Malik lab >> >>>> http://research.fhcrc.org/malik/en.html >> >>>> >> >>>> Fred Hutchinson Cancer Research Center >> >>>> 1100 Fairview Avenue N., A2-025, >> >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >> >>>> >> >>>> tel: (206) 667 4512 >> >>>> email: jayoung ...at... fhcrc.org >> >>>> >> >>>> ------------------------------------------------------------ >> ------- >> >>>> >> >>>> _______________________________________________ >> >>>> Bioconductor mailing list >> >>>> Bioconductor@r-project.org >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>>> Search the archives: >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>>> >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor@r-project.org >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> >>> >> >> >> >> -- >> >> Hervé Pagès >> >> >> >> Program in Computational Biology >> >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> >> 1100 Fairview Ave. N, M1-B514 >> >> P.O. Box 19024 >> >> Seattle, WA 98109-1024 >> >> >> >> E-mail: hpages@fhcrc.org >> >> Phone: (206) 667-5791 >> >> Fax: (206) 667-1319 >> > >> >_______________________________________________ >> >Bioconductor mailing list >> >Bioconductor@r-project.org >> >https://stat.ethz.ch/mailman/listinfo/bioconductor >> >Search the archives: http://news.gmane.org/gmane. >> science.biology.informatics.conductor >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Michael Lawrence9.8k
+1 +1 A+, ~Malcolm From: Michael Lawrence [mailto:lawrence.michael@gene.com] Sent: Wednesday, December 04, 2013 9:11 PM To: Herv� Pag�s Cc: Cook, Malcolm; Janet Young; Michael Lawrence; bioconductor@r-project.org Subject: Re: [BioC] subset a RleList using GRanges object? I like the idea of GViews. BamFileGViews and TabixFileGViews would also be high on the list. As would BSgenomeGViews if BSgenome came to support random access. Michael On Wed, Dec 4, 2013 at 5:18 PM, Herv� Pag�s <hpages@fhcrc.org<mailto:hpages@fhcrc.org>> wrote: Hi Malcolm, On 12/04/2013 11:43 AM, Cook, Malcolm wrote: Janet, et al I believe there are problems with all approaches mentioned so far on this thread. When you index something, length of the result should equal the length of the index, Not if the index is a logical vector or a Ranges object though. AND, the order of the elements of the result should correspond to the order of the indices. The approaches mentioned all depend upon coersion from GRanges to RangesList, but this does not preserve the order of the GRanges. True but it's guaranteed to preserve the order within each space/chromosome. I think this is good enough to make subsetting a named list-like object (e.g. named RleList, named DNAStringSet, etc...) by a GRanges object a reasonable thing to support. The order of the seqlevels in the GRanges object doesn't matter and won't affect the result of the subsetting because they're matched to the names of the list-like object to subset. Note that subsetting needs to behave like an "endomorphism" i.e. the result must be of the same type as the original object. So when subsetting an RleList by a GRanges index (like in Janet's example), the result will always be an RleList object, even if the index has more than 1 range per chromosome (in that case the extracted ranges are concatenated together). I'm still not convinced this is what Janet really needs. Like Michael mentioned earlier, maybe a more useful approach is to create an RleViewsList object with something like: vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) viewMeans(vlist) More below. Look: example(GRangesList) as(gr3,'RangesList') IRangesList of length 2 $chr1 IRanges of length 1 start end width [1] 1 3 3 $chr2 IRanges of length 1 start end width [1] 4 9 6 !> as(gr3[2:1],'RangesList') IRangesList of length 2 $chr1 IRanges of length 1 start end width [1] 1 3 3 $chr2 IRanges of length 1 start end width [1] 4 9 6 I think if you contrive for your GRanges to be in 'canonical order' (as defined by the RangesList coersion), you can proceed with this approach. Look: stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[1:2]), 'RangesList'))) Janet, you brought this up in http://thread.gmane.org/gmane.science.biology.informatics.conductor/36 247/focus=36263 with more or less the same outcome. Here is excerpt from that exchange: An "order" method would be great. Note though that the coercion to RangesList only sorts by seqlevels, so we would need something that gave out that order vector. The point here is to avoid forcing the user to modify the order of the original GRanges. With order() the user will still be able to achieve this with something like: regionMeans <- function(regions, cvg) { seqlevels(regions) <- names(cvg) oo <- order(regions) regions <- regions[oo] ans <- unlist( viewMeans( Views(cvg, as(regions, "RangesList")) ), use.names=FALSE) ans[oo] <- ans # restore original order ans } values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) Tricky :-/ Very related indeed. Thanks for refreshing my memory. Note that this discussion happened more than 2 years ago at a time when we didn't have order() for GRanges yet. Now we have it so my regionMeans() proposal above should work. FWIW in ?tileGenome (GenomicRanges package), I show how to implement a similar function (binnedAverage) without the need for ordering the GRanges object (it uses a split/unsplit aproach instead). The design of binnedAverage() is also a little bit cleaner. Let's not forget this lesson! However, I agree, some syntactic sugar that allows indexing an RleList by GRanges with the proposed semantics would be great. In the mean time, I offer this function, which I found to be faster than Herve's offering above, and also provides a few more useful (to me) abstractions. YMMV: applyAt <- function(cvg # an RleList (quite possibly representing a genomic coverage) ,gr # a GRanges at each element of which you want to evaluate a function ,FUN=identity # by default, just return individual rleLists for each element of gr ,... # additional options to FUN ,.MAPPLY=mapply #optionally parallize with mcmapply or your favorite variant ,USE.NAMES=TRUE ,SIMPLIFY=TRUE #'array'#TRUE ,MoreArgs=NULL # passed to .MAPPLY ,ignore.strand=TRUE # else, reverse the coverages ) { revif<-function(s,x) { if(ignore.strand==TRUE) {x} else if (s=='+') {x} else {rev(x)} } ans<-.MAPPLY(function(a,b,c,s) { FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) } ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr)) ,MoreArgs=MoreArgs ,SIMPLIFY=SIMPLIFY ,USE.NAMES=FALSE #USE.NAMES ) if (!USE.NAMES) {} else if (is.null(dim(ans))) {names(ans)<-names(gr)} else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} else rownames(ans)<-names(gr) ans } Finally, what I think we REALLY want is a version of the above which currys a FUN (possibly `identity`) down to rle levels that can be applied to a bigwig file without having to load the whole thing into memory. In the mean time, I offer: applyAt.bw<-function(x,y,...) { applyAtimport.bw<http: import.bw="">(x,asRle=TRUE ,selection=BigWigSelection(y) ) ,y ,...) } Interesting. It sounds to me like maybe one way to facilitate this (and also avoid the proliferation of apply-like functions) would be to finally bite the bullet and come up with a new kind of view objects where the views are *genomic* ranges instead of just ranges (like they are right now). Something that was already mentioned in this more-than-2-year-old discussion. We could start with RleListGViews and BigWigFileGViews objects, both concrete subclasses of virtual class GViews. Thanks for you feedback, H. ~Malcolm >-----Original Message----- >From: bioconductor-bounces@r-project.org<mailto:bioconductor- bounces@r-project.org=""> [mailto:bioconductor- bounces@r-project.org<mailto:bioconductor-bounces@r-project.org>] On Behalf Of Janet Young >Sent: Wednesday, December 04, 2013 12:59 PM >To: Herv� Pag�s >Cc: Michael Lawrence; bioconductor@r-project.org<mailto:bioconductor@r-project.org> >Subject: Re: [BioC] subset a RleList using GRanges object? > >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, "RangesList")]. It's always nice for the end user when you take care of >those kinds of things behind the scenes for us - sounds like what you're considering implementing might take care of it. > >Robert: thanks for the ideas! But my needs are a little different, though - I'm keeping the individual values in the region I'm interested >in (I'm plotting RNA-seq coverage on just a single gene, but my coverage object is for the whole genome). > >Janet > > > >On Dec 4, 2013, at 10:09 AM, Herv� Pag�s wrote: > >> Hi Michael, >> >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: >>> For now, just coerce to a RangesList. >>> >>> z [ as(myRange, "RangesList") ] >>> >>> Herve might consider adding a [,List,GenomicRanges method... kind of a >>> logical leap though from seqnames to list element names. >> >> This is something I've been considering for a while. I think we should >> do it. The mapping between the seqnames of a GRanges and the names of a >> RangesList is well established at this point e.g. the coercion (back and >> forth) between the two already does that. >> >> Cheers, >> H. >> >>> >>> extractCoverageForPositions is different; it extracts an integer vector >>> given a vector of width-one positions. >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung@fhcrc.org<mailto:jayoung@fhcrc.org>> wrote: >>> >>>> Hi there, >>>> >>>> I'm playing around with coverage data generated outside of R, planning to >>>> plot RNA-seq coverage for some genes we're interested in. >>>> >>>> I have a request - it'd be nice from my point of view if it were possible >>>> to look at a small region an RleList (or CompressedRleList) using a GRanges >>>> object to focus on that smaller region. Looks like I can subset a single >>>> Rle object with an IRanges object, but I wonder if this nice feature could >>>> be extended to GRanges objects? >>>> >>>> Some code is below to show what I'm trying to do. >>>> >>>> thanks veruy much, >>>> >>>> Janet >>>> >>>> >>>> >>>> library(GenomicRanges) >>>> >>>> ## an example RleList >>>> x <- Rle(10:1, 1:10) >>>> y <- Rle(10:1, 1:10) >>>> z <- RleList( chr1=x, chr2=y) >>>> >>>> ## an example GRanges object >>>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >>>> >>>> ## subsetting an Rle using an IRanges object works, as expected: >>>> z[["chr1"]] [ ranges(myRange) ] >>>> >>>> ## but subsetting an RleList by GRanges object doesn't work >>>> z [ myRange ] >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type >>>> >>>> sessionInfo() >>>> >>>> R Under development (unstable) (2013-11-06 r64163) >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] parallel stats graphics grDevices utils datasets methods >>>> [8] base >>>> >>>> other attached packages: >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >>>> [4] BiocGenerics_0.9.1 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] stats4_3.1.0 >>>> >>>> >>>> >>>> >>>> ------------------------------------------------------------------- >>>> >>>> Dr. Janet Young >>>> >>>> Malik lab >>>> http://research.fhcrc.org/malik/en.html >>>> >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Avenue N., A2-025, >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >>>> >>>> tel: (206) 667 4512<tel:%28206%29%20667%204512> >>>> email: jayoung ...at... fhcrc.org<http: fhcrc.org=""> >>>> >>>> ------------------------------------------------------------------- >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org<mailto:bioconductor@r-project.org> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org<mailto:bioconductor@r-project.org> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Herv� Pag�s >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages@fhcrc.org<mailto:hpages@fhcrc.org> >> Phone: (206) 667-5791<tel:%28206%29%20667-5791> >> Fax: (206) 667-1319<tel:%28206%29%20667-1319> > >_______________________________________________ >Bioconductor mailing list >Bioconductor@r-project.org<mailto:bioconductor@r-project.org> >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Herv� Pag�s Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages@fhcrc.org<mailto:hpages@fhcrc.org> Phone: (206) 667-5791<tel:%28206%29%20667-5791> Fax: (206) 667-1319<tel:%28206%29%20667-1319> [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Malcolm Cook1.4k
Hi, On 12/04/2013 05:18 PM, Hervé Pagès wrote: > Hi Malcolm, > > On 12/04/2013 11:43 AM, Cook, Malcolm wrote: >> Janet, et al >> >> I believe there are problems with all approaches mentioned so far on >> this thread. >> >> When you index something, length of the result should equal the length >> of the index, > > Not if the index is a logical vector or a Ranges object though. > >> AND, the order of the elements of the result should correspond to the >> order of the indices. >> >> The approaches mentioned all depend upon coersion from GRanges to >> RangesList, but this does not preserve the order of the GRanges. > > True but it's guaranteed to preserve the order within each > space/chromosome. I think this is good enough to make subsetting > a named list-like object (e.g. named RleList, named DNAStringSet, > etc...) by a GRanges object a reasonable thing to support. > The order of the seqlevels in the GRanges object doesn't matter > and won't affect the result of the subsetting because they're > matched to the names of the list-like object to subset. > > Note that subsetting needs to behave like an "endomorphism" i.e. > the result must be of the same type as the original object. So > when subsetting an RleList by a GRanges index (like in Janet's > example), the result will always be an RleList object, even if > the index has more than 1 range per chromosome (in that case > the extracted ranges are concatenated together). I'm still not > convinced this is what Janet really needs. I take this back. Yes subsetting by a GRanges needs to behave as an "endomorphism" but it doesn't have to behave the way I'm describing here (which is the behavior we would get if GRanges subscript 'gr' was replaced with 'as(gr, "RangesList")'). A more useful behavior (and maybe this is what Malcolm was trying to tell me, sorry) is to return an object that is parallel to the GRanges object: > gr <- GRanges(Rle(c("chr2", "chr1"), c(3, 2)), IRanges(1, 7:3)) > gr GRanges with 5 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr2 [1, 7] * [2] chr2 [1, 6] * [3] chr2 [1, 5] * [4] chr1 [1, 4] * [5] chr1 [1, 3] * --- seqlengths: chr1 chr2 NA NA With an RleList: > x <- RleList(chr1=1:6, chr2=101:110) > x[gr] RleList of length 5 $chr2 integer-Rle of length 7 with 7 runs Lengths: 1 1 1 1 1 1 1 Values : 101 102 103 104 105 106 107 $chr2 integer-Rle of length 6 with 6 runs Lengths: 1 1 1 1 1 1 Values : 101 102 103 104 105 106 $chr2 integer-Rle of length 5 with 5 runs Lengths: 1 1 1 1 1 Values : 101 102 103 104 105 $chr1 integer-Rle of length 4 with 4 runs Lengths: 1 1 1 1 Values : 1 2 3 4 $chr1 integer-Rle of length 3 with 3 runs Lengths: 1 1 1 Values : 1 2 3 With a DNAStringSet: > dna <- DNAStringSet(c(chr1="AATANG", chr2="TTACCACGTT")) > dna[gr] A DNAStringSet instance of length 5 width seq names [1] 7 TTACCAC chr2 [2] 6 TTACCA chr2 [3] 5 TTACC chr2 [4] 4 AATA chr1 [5] 3 AAT chr1 Then the returned object can be added as a metadata col to the GRanges object, possibly after passing it thru an extra summarization step (with e.g. 'mean(x[gr])' in the case of the RleList), which is probably infinitely more useful than a subsetting that would replace 'gr' with 'as(gr, "RangesList")'. And it's still an "endomorphism". I've added this to GenomicRanges 1.15.14 (devel). Only very lightly tested and not optimized yet. Cheers, H. > Like Michael mentioned > earlier, maybe a more useful approach is to create an RleViewsList > object with something like: > > vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) > viewMeans(vlist) > > More below. > >> >> Look: >> >>> example(GRangesList) >>> as(gr3,'RangesList') >> IRangesList of length 2 >> $chr1 >> IRanges of length 1 >> start end width >> [1] 1 3 3 >> >> $chr2 >> IRanges of length 1 >> start end width >> [1] 4 9 6 >> >> !> as(gr3[2:1],'RangesList') >> IRangesList of length 2 >> $chr1 >> IRanges of length 1 >> start end width >> [1] 1 3 3 >> >> $chr2 >> IRanges of length 1 >> start end width >> [1] 4 9 6 >> >> I think if you contrive for your GRanges to be in 'canonical order' >> (as defined by the RangesList coersion), you can proceed with this >> approach. Look: >> >>> stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[1: 2]),'RangesList'))) >>> >> >> Janet, you brought this up in >> http://thread.gmane.org/gmane.science.biology.informatics.conductor /36247/focus=36263 >> >> >> with more or less the same outcome. >> >> Here is excerpt from that exchange: >> >>>> An "order" method would be great. Note though that the coercion to >>>> RangesList only sorts by seqlevels, so we would need something that >>>> gave >>>> out that order vector. The point here is to avoid forcing the user to >>>> modify the order of the original GRanges. >>>> >>> >>> With order() the user will still be able to achieve this with something >>> like: >>> >>> regionMeans <- function(regions, cvg) >>> { >>> seqlevels(regions) <- names(cvg) >>> oo <- order(regions) >>> regions <- regions[oo] >>> ans <- unlist( >>> viewMeans( >>> Views(cvg, as(regions, "RangesList")) >>> ), use.names=FALSE) >>> ans[oo] <- ans # restore original order >>> ans >>> } >>> >>> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) >>> >>> Tricky :-/ > > Very related indeed. Thanks for refreshing my memory. Note that this > discussion happened more than 2 years ago at a time when we didn't have > order() for GRanges yet. Now we have it so my regionMeans() proposal > above should work. FWIW in ?tileGenome (GenomicRanges package), I show > how to implement a similar function (binnedAverage) without the need > for ordering the GRanges object (it uses a split/unsplit aproach > instead). The design of binnedAverage() is also a little bit cleaner. > >> >> Let's not forget this lesson! >> >> However, I agree, some syntactic sugar that allows indexing an RleList >> by GRanges with the proposed semantics would be great. >> >> In the mean time, I offer this function, which I found to be faster >> than Herve's offering above, and also provides a few more useful (to >> me) abstractions. >> >> YMMV: >> >> >> applyAt <- function(cvg # an RleList (quite possibly representing a >> genomic coverage) >> ,gr # a GRanges at each element of which you want >> to evaluate a function >> ,FUN=identity # by default, just return >> individual rleLists for each element of gr >> ,... # additional options to FUN >> ,.MAPPLY=mapply #optionally parallize with >> mcmapply or your favorite variant >> ,USE.NAMES=TRUE >> ,SIMPLIFY=TRUE #'array'#TRUE >> ,MoreArgs=NULL # passed to .MAPPLY >> ,ignore.strand=TRUE # else, reverse the coverages >> ) { >> revif<-function(s,x) { >> if(ignore.strand==TRUE) {x} >> else if (s=='+') {x} >> else {rev(x)} >> } >> ans<-.MAPPLY(function(a,b,c,s) { >> FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) >> } >> >> ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr)) >> ,MoreArgs=MoreArgs >> ,SIMPLIFY=SIMPLIFY >> ,USE.NAMES=FALSE #USE.NAMES >> ) >> if (!USE.NAMES) {} >> else if (is.null(dim(ans))) {names(ans)<-names(gr)} >> else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} >> else rownames(ans)<-names(gr) >> ans >> } >> >> Finally, what I think we REALLY want is a version of the above which >> currys a FUN (possibly `identity`) down to rle levels that can be >> applied to a bigwig file without having to load the whole thing into >> memory. >> >> In the mean time, I offer: >> >> applyAt.bw<-function(x,y,...) { >> applyAtimport.bw(x,asRle=TRUE >> ,selection=BigWigSelection(y) >> ) >> ,y >> ,...) >> } > > Interesting. It sounds to me like maybe one way to facilitate this > (and also avoid the proliferation of apply-like functions) would be > to finally bite the bullet and come up with a new kind of view objects > where the views are *genomic* ranges instead of just ranges (like they > are right now). Something that was already mentioned in this > more-than-2-year-old discussion. We could start with RleListGViews > and BigWigFileGViews objects, both concrete subclasses of virtual > class GViews. > > Thanks for you feedback, > H. > >> >> ~Malcolm >> >> >> >-----Original Message----- >> >From: bioconductor-bounces at r-project.org >> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Janet Young >> >Sent: Wednesday, December 04, 2013 12:59 PM >> >To: Hervé Pagès >> >Cc: Michael Lawrence; bioconductor at r-project.org >> >Subject: Re: [BioC] subset a RleList using GRanges object? >> > >> >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, >> "RangesList")]. It's always nice for the end user when you take care of >> >those kinds of things behind the scenes for us - sounds like what >> you're considering implementing might take care of it. >> > >> >Robert: thanks for the ideas! But my needs are a little different, >> though - I'm keeping the individual values in the region I'm interested >> >in (I'm plotting RNA-seq coverage on just a single gene, but my >> coverage object is for the whole genome). >> > >> >Janet >> > >> > >> > >> >On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: >> > >> >> Hi Michael, >> >> >> >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: >> >>> For now, just coerce to a RangesList. >> >>> >> >>> z [ as(myRange, "RangesList") ] >> >>> >> >>> Herve might consider adding a [,List,GenomicRanges method... >> kind of a >> >>> logical leap though from seqnames to list element names. >> >> >> >> This is something I've been considering for a while. I think we >> should >> >> do it. The mapping between the seqnames of a GRanges and the >> names of a >> >> RangesList is well established at this point e.g. the coercion >> (back and >> >> forth) between the two already does that. >> >> >> >> Cheers, >> >> H. >> >> >> >>> >> >>> extractCoverageForPositions is different; it extracts an integer >> vector >> >>> given a vector of width-one positions. >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> >> wrote: >> >>> >> >>>> Hi there, >> >>>> >> >>>> I'm playing around with coverage data generated outside of R, >> planning to >> >>>> plot RNA-seq coverage for some genes we're interested in. >> >>>> >> >>>> I have a request - it'd be nice from my point of view if it >> were possible >> >>>> to look at a small region an RleList (or CompressedRleList) >> using a GRanges >> >>>> object to focus on that smaller region. Looks like I can >> subset a single >> >>>> Rle object with an IRanges object, but I wonder if this nice >> feature could >> >>>> be extended to GRanges objects? >> >>>> >> >>>> Some code is below to show what I'm trying to do. >> >>>> >> >>>> thanks veruy much, >> >>>> >> >>>> Janet >> >>>> >> >>>> >> >>>> >> >>>> library(GenomicRanges) >> >>>> >> >>>> ## an example RleList >> >>>> x <- Rle(10:1, 1:10) >> >>>> y <- Rle(10:1, 1:10) >> >>>> z <- RleList( chr1=x, chr2=y) >> >>>> >> >>>> ## an example GRanges object >> >>>> myRange <- GRanges( seqnames="chr1", >> ranges=IRanges(start=10,end=15) ) >> >>>> >> >>>> ## subsetting an Rle using an IRanges object works, as expected: >> >>>> z[["chr1"]] [ ranges(myRange) ] >> >>>> >> >>>> ## but subsetting an RleList by GRanges object doesn't work >> >>>> z [ myRange ] >> >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid >> subscript type >> >>>> >> >>>> sessionInfo() >> >>>> >> >>>> R Under development (unstable) (2013-11-06 r64163) >> >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >> >>>> >> >>>> locale: >> >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >> >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >>>> >> >>>> attached base packages: >> >>>> [1] parallel stats graphics grDevices utils datasets >> methods >> >>>> [8] base >> >>>> >> >>>> other attached packages: >> >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >> >>>> [4] BiocGenerics_0.9.1 >> >>>> >> >>>> loaded via a namespace (and not attached): >> >>>> [1] stats4_3.1.0 >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> ------------------------------------------------------------------- >> >>>> >> >>>> Dr. Janet Young >> >>>> >> >>>> Malik lab >> >>>> http://research.fhcrc.org/malik/en.html >> >>>> >> >>>> Fred Hutchinson Cancer Research Center >> >>>> 1100 Fairview Avenue N., A2-025, >> >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >> >>>> >> >>>> tel: (206) 667 4512 >> >>>> email: jayoung ...at... fhcrc.org >> >>>> >> >>>> >> ------------------------------------------------------------------- >> >>>> >> >>>> _______________________________________________ >> >>>> Bioconductor mailing list >> >>>> Bioconductor at r-project.org >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>>> Search the archives: >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>>> >> >>> >> >>> [[alternative HTML version deleted]] >> >>> >> >>> _______________________________________________ >> >>> Bioconductor mailing list >> >>> Bioconductor at r-project.org >> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>> >> >> >> >> -- >> >> Hervé Pagès >> >> >> >> Program in Computational Biology >> >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> >> 1100 Fairview Ave. N, M1-B514 >> >> P.O. Box 19024 >> >> Seattle, WA 98109-1024 >> >> >> >> E-mail: hpages at fhcrc.org >> >> Phone: (206) 667-5791 >> >> Fax: (206) 667-1319 >> > >> >_______________________________________________ >> >Bioconductor mailing list >> >Bioconductor at r-project.org >> >https://stat.ethz.ch/mailman/listinfo/bioconductor >> >Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLYlink written 3.9 years ago by Hervé Pagès ♦♦ 13k
Herve, >On 12/04/2013 05:18 PM, Hervé Pagès wrote: >> Hi Malcolm, >> >> On 12/04/2013 11:43 AM, Cook, Malcolm wrote: >>> Janet, et al >>> >>> I believe there are problems with all approaches mentioned so far on >>> this thread. >>> >>> When you index something, length of the result should equal the length >>> of the index, >> >> Not if the index is a logical vector or a Ranges object though. >> >>> AND, the order of the elements of the result should correspond to the >>> order of the indices. >>> >>> The approaches mentioned all depend upon coersion from GRanges to >>> RangesList, but this does not preserve the order of the GRanges. >> >> True but it's guaranteed to preserve the order within each >> space/chromosome. I think this is good enough to make subsetting >> a named list-like object (e.g. named RleList, named DNAStringSet, >> etc...) by a GRanges object a reasonable thing to support. >> The order of the seqlevels in the GRanges object doesn't matter >> and won't affect the result of the subsetting because they're >> matched to the names of the list-like object to subset. >> >> Note that subsetting needs to behave like an "endomorphism" i.e. >> the result must be of the same type as the original object. So >> when subsetting an RleList by a GRanges index (like in Janet's >> example), the result will always be an RleList object, even if >> the index has more than 1 range per chromosome (in that case >> the extracted ranges are concatenated together). I'm still not >> convinced this is what Janet really needs. > >I take this back. Yes subsetting by a GRanges needs to behave as an >"endomorphism" but it doesn't have to behave the way I'm describing >here (which is the behavior we would get if GRanges subscript 'gr' >was replaced with 'as(gr, "RangesList")'). > >A more useful behavior (and maybe this is what Malcolm was trying >to tell me, sorry) is to return an object that is parallel to the >GRanges object: Yes, exactly, thanks, "parallel" in that it stands in 1-to-1 with the components of the GRanges. However, the summarization step wants to be computed inside the score of the indexing especially if the indexing is being mcapplied since you want the smallest (summarized) value coming back between processes.... ~Malcolm > > > gr <- GRanges(Rle(c("chr2", "chr1"), c(3, 2)), IRanges(1, 7:3)) > > gr > GRanges with 5 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr2 [1, 7] * > [2] chr2 [1, 6] * > [3] chr2 [1, 5] * > [4] chr1 [1, 4] * > [5] chr1 [1, 3] * > --- > seqlengths: > chr1 chr2 > NA NA > >With an RleList: > > > x <- RleList(chr1=1:6, chr2=101:110) > > x[gr] > RleList of length 5 > $chr2 > integer-Rle of length 7 with 7 runs > Lengths: 1 1 1 1 1 1 1 > Values : 101 102 103 104 105 106 107 > > $chr2 > integer-Rle of length 6 with 6 runs > Lengths: 1 1 1 1 1 1 > Values : 101 102 103 104 105 106 > > $chr2 > integer-Rle of length 5 with 5 runs > Lengths: 1 1 1 1 1 > Values : 101 102 103 104 105 > > $chr1 > integer-Rle of length 4 with 4 runs > Lengths: 1 1 1 1 > Values : 1 2 3 4 > > $chr1 > integer-Rle of length 3 with 3 runs > Lengths: 1 1 1 > Values : 1 2 3 > >With a DNAStringSet: > > > dna <- DNAStringSet(c(chr1="AATANG", chr2="TTACCACGTT")) > > dna[gr] > A DNAStringSet instance of length 5 > width seq names > > [1] 7 TTACCAC chr2 > [2] 6 TTACCA chr2 > [3] 5 TTACC chr2 > [4] 4 AATA chr1 > [5] 3 AAT chr1 > >Then the returned object can be added as a metadata col to the GRanges >object, possibly after passing it thru an extra summarization step (with >e.g. 'mean(x[gr])' in the case of the RleList), which is probably >infinitely more useful than a subsetting that would replace 'gr' with >'as(gr, "RangesList")'. And it's still an "endomorphism". > >I've added this to GenomicRanges 1.15.14 (devel). Only very lightly >tested and not optimized yet. > >Cheers, >H. > > >> Like Michael mentioned >> earlier, maybe a more useful approach is to create an RleViewsList >> object with something like: >> >> vlist <- Views(rleList=z, rangesList=as(myRange, "RangesList")) >> viewMeans(vlist) >> >> More below. >> >>> >>> Look: >>> >>>> example(GRangesList) >>>> as(gr3,'RangesList') >>> IRangesList of length 2 >>> $chr1 >>> IRanges of length 1 >>> start end width >>> [1] 1 3 3 >>> >>> $chr2 >>> IRanges of length 1 >>> start end width >>> [1] 4 9 6 >>> >>> !> as(gr3[2:1],'RangesList') >>> IRangesList of length 2 >>> $chr1 >>> IRanges of length 1 >>> start end width >>> [1] 1 3 3 >>> >>> $chr2 >>> IRanges of length 1 >>> start end width >>> [1] 4 9 6 >>> >>> I think if you contrive for your GRanges to be in 'canonical order' >>> (as defined by the RangesList coersion), you can proceed with this >>> approach. Look: >>> >>>> stopifnot(all.equal(as(sort(gr3[2:1]),'RangesList'),as(sort(gr3[ 1:2]),'RangesList'))) >>>> >>> >>> Janet, you brought this up in >>> http://thread.gmane.org/gmane.science.biology.informatics.conduct or/36247/focus=36263 >>> >>> >>> with more or less the same outcome. >>> >>> Here is excerpt from that exchange: >>> >>>>> An "order" method would be great. Note though that the coercion to >>>>> RangesList only sorts by seqlevels, so we would need something that >>>>> gave >>>>> out that order vector. The point here is to avoid forcing the user to >>>>> modify the order of the original GRanges. >>>>> >>>> >>>> With order() the user will still be able to achieve this with something >>>> like: >>>> >>>> regionMeans <- function(regions, cvg) >>>> { >>>> seqlevels(regions) <- names(cvg) >>>> oo <- order(regions) >>>> regions <- regions[oo] >>>> ans <- unlist( >>>> viewMeans( >>>> Views(cvg, as(regions, "RangesList")) >>>> ), use.names=FALSE) >>>> ans[oo] <- ans # restore original order >>>> ans >>>> } >>>> >>>> values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) >>>> >>>> Tricky :-/ >> >> Very related indeed. Thanks for refreshing my memory. Note that this >> discussion happened more than 2 years ago at a time when we didn't have >> order() for GRanges yet. Now we have it so my regionMeans() proposal >> above should work. FWIW in ?tileGenome (GenomicRanges package), I show >> how to implement a similar function (binnedAverage) without the need >> for ordering the GRanges object (it uses a split/unsplit aproach >> instead). The design of binnedAverage() is also a little bit cleaner. >> >>> >>> Let's not forget this lesson! >>> >>> However, I agree, some syntactic sugar that allows indexing an RleList >>> by GRanges with the proposed semantics would be great. >>> >>> In the mean time, I offer this function, which I found to be faster >>> than Herve's offering above, and also provides a few more useful (to >>> me) abstractions. >>> >>> YMMV: >>> >>> >>> applyAt <- function(cvg # an RleList (quite possibly representing a >>> genomic coverage) >>> ,gr # a GRanges at each element of which you want >>> to evaluate a function >>> ,FUN=identity # by default, just return >>> individual rleLists for each element of gr >>> ,... # additional options to FUN >>> ,.MAPPLY=mapply #optionally parallize with >>> mcmapply or your favorite variant >>> ,USE.NAMES=TRUE >>> ,SIMPLIFY=TRUE #'array'#TRUE >>> ,MoreArgs=NULL # passed to .MAPPLY >>> ,ignore.strand=TRUE # else, reverse the coverages >>> ) { >>> revif<-function(s,x) { >>> if(ignore.strand==TRUE) {x} >>> else if (s=='+') {x} >>> else {rev(x)} >>> } >>> ans<-.MAPPLY(function(a,b,c,s) { >>> FUN(revif(s,cvg[[a]][IRanges(b,c),drop=FALSE]),...) >>> } >>> >>> ,as.vector(seqnames(gr)),start(gr),end(gr),as.vector(strand(gr)) >>> ,MoreArgs=MoreArgs >>> ,SIMPLIFY=SIMPLIFY >>> ,USE.NAMES=FALSE #USE.NAMES >>> ) >>> if (!USE.NAMES) {} >>> else if (is.null(dim(ans))) {names(ans)<-names(gr)} >>> else if (SIMPLIFY=='array') {colnames(ans)<-names(gr)} >>> else rownames(ans)<-names(gr) >>> ans >>> } >>> >>> Finally, what I think we REALLY want is a version of the above which >>> currys a FUN (possibly `identity`) down to rle levels that can be >>> applied to a bigwig file without having to load the whole thing into >>> memory. >>> >>> In the mean time, I offer: >>> >>> applyAt.bw<-function(x,y,...) { >>> applyAtimport.bw(x,asRle=TRUE >>> ,selection=BigWigSelection(y) >>> ) >>> ,y >>> ,...) >>> } >> >> Interesting. It sounds to me like maybe one way to facilitate this >> (and also avoid the proliferation of apply-like functions) would be >> to finally bite the bullet and come up with a new kind of view objects >> where the views are *genomic* ranges instead of just ranges (like they >> are right now). Something that was already mentioned in this >> more-than-2-year-old discussion. We could start with RleListGViews >> and BigWigFileGViews objects, both concrete subclasses of virtual >> class GViews. >> >> Thanks for you feedback, >> H. >> >>> >>> ~Malcolm >>> >>> >>> >-----Original Message----- >>> >From: bioconductor-bounces at r-project.org >>> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Janet Young >>> >Sent: Wednesday, December 04, 2013 12:59 PM >>> >To: Hervé Pagès >>> >Cc: Michael Lawrence; bioconductor at r-project.org >>> >Subject: Re: [BioC] subset a RleList using GRanges object? >>> > >>> >Thanks, all. Yes, that coercion will do it for me: z[as(myRange, >>> "RangesList")]. It's always nice for the end user when you take care of >>> >those kinds of things behind the scenes for us - sounds like what >>> you're considering implementing might take care of it. >>> > >>> >Robert: thanks for the ideas! But my needs are a little different, >>> though - I'm keeping the individual values in the region I'm interested >>> >in (I'm plotting RNA-seq coverage on just a single gene, but my >>> coverage object is for the whole genome). >>> > >>> >Janet >>> > >>> > >>> > >>> >On Dec 4, 2013, at 10:09 AM, Hervé Pagès wrote: >>> > >>> >> Hi Michael, >>> >> >>> >> On 12/03/2013 07:13 PM, Michael Lawrence wrote: >>> >>> For now, just coerce to a RangesList. >>> >>> >>> >>> z [ as(myRange, "RangesList") ] >>> >>> >>> >>> Herve might consider adding a [,List,GenomicRanges method... >>> kind of a >>> >>> logical leap though from seqnames to list element names. >>> >> >>> >> This is something I've been considering for a while. I think we >>> should >>> >> do it. The mapping between the seqnames of a GRanges and the >>> names of a >>> >> RangesList is well established at this point e.g. the coercion >>> (back and >>> >> forth) between the two already does that. >>> >> >>> >> Cheers, >>> >> H. >>> >> >>> >>> >>> >>> extractCoverageForPositions is different; it extracts an integer >>> vector >>> >>> given a vector of width-one positions. >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On Tue, Dec 3, 2013 at 4:53 PM, Janet Young <jayoung at="" fhcrc.org=""> >>> wrote: >>> >>> >>> >>>> Hi there, >>> >>>> >>> >>>> I'm playing around with coverage data generated outside of R, >>> planning to >>> >>>> plot RNA-seq coverage for some genes we're interested in. >>> >>>> >>> >>>> I have a request - it'd be nice from my point of view if it >>> were possible >>> >>>> to look at a small region an RleList (or CompressedRleList) >>> using a GRanges >>> >>>> object to focus on that smaller region. Looks like I can >>> subset a single >>> >>>> Rle object with an IRanges object, but I wonder if this nice >>> feature could >>> >>>> be extended to GRanges objects? >>> >>>> >>> >>>> Some code is below to show what I'm trying to do. >>> >>>> >>> >>>> thanks veruy much, >>> >>>> >>> >>>> Janet >>> >>>> >>> >>>> >>> >>>> >>> >>>> library(GenomicRanges) >>> >>>> >>> >>>> ## an example RleList >>> >>>> x <- Rle(10:1, 1:10) >>> >>>> y <- Rle(10:1, 1:10) >>> >>>> z <- RleList( chr1=x, chr2=y) >>> >>>> >>> >>>> ## an example GRanges object >>> >>>> myRange <- GRanges( seqnames="chr1", >>> ranges=IRanges(start=10,end=15) ) >>> >>>> >>> >>>> ## subsetting an Rle using an IRanges object works, as expected: >>> >>>> z[["chr1"]] [ ranges(myRange) ] >>> >>>> >>> >>>> ## but subsetting an RleList by GRanges object doesn't work >>> >>>> z [ myRange ] >>> >>>> # Error in normalizeSingleBracketSubscript(i, x) : invalid >>> subscript type >>> >>>> >>> >>>> sessionInfo() >>> >>>> >>> >>>> R Under development (unstable) (2013-11-06 r64163) >>> >>>> Platform: x86_64-unknown-linux-gnu (64-bit) >>> >>>> >>> >>>> locale: >>> >>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> >>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> >>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> >>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>> >>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> >>>> >>> >>>> attached base packages: >>> >>>> [1] parallel stats graphics grDevices utils datasets >>> methods >>> >>>> [8] base >>> >>>> >>> >>>> other attached packages: >>> >>>> [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >>> >>>> [4] BiocGenerics_0.9.1 >>> >>>> >>> >>>> loaded via a namespace (and not attached): >>> >>>> [1] stats4_3.1.0 >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> ------------------------------------------------------------------- >>> >>>> >>> >>>> Dr. Janet Young >>> >>>> >>> >>>> Malik lab >>> >>>> http://research.fhcrc.org/malik/en.html >>> >>>> >>> >>>> Fred Hutchinson Cancer Research Center >>> >>>> 1100 Fairview Avenue N., A2-025, >>> >>>> P.O. Box 19024, Seattle, WA 98109-1024, USA. >>> >>>> >>> >>>> tel: (206) 667 4512 >>> >>>> email: jayoung ...at... fhcrc.org >>> >>>> >>> >>>> >>> ------------------------------------------------------------------- >>> >>>> >>> >>>> _______________________________________________ >>> >>>> Bioconductor mailing list >>> >>>> Bioconductor at r-project.org >>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >>>> Search the archives: >>> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> >>> >>> _______________________________________________ >>> >>> Bioconductor mailing list >>> >>> Bioconductor at r-project.org >>> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> >> >>> >> -- >>> >> Hervé Pagès >>> >> >>> >> Program in Computational Biology >>> >> Division of Public Health Sciences >>> >> Fred Hutchinson Cancer Research Center >>> >> 1100 Fairview Ave. N, M1-B514 >>> >> P.O. Box 19024 >>> >> Seattle, WA 98109-1024 >>> >> >>> >> E-mail: hpages at fhcrc.org >>> >> Phone: (206) 667-5791 >>> >> Fax: (206) 667-1319 >>> > >>> >_______________________________________________ >>> >Bioconductor mailing list >>> >Bioconductor at r-project.org >>> >https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> > >-- >Hervé Pagès > >Program in Computational Biology >Division of Public Health Sciences >Fred Hutchinson Cancer Research Center >1100 Fairview Ave. N, M1-B514 >P.O. Box 19024 >Seattle, WA 98109-1024 > >E-mail: hpages at fhcrc.org >Phone: (206) 667-5791 >Fax: (206) 667-1319
ADD REPLYlink written 3.9 years ago by Malcolm Cook1.4k
0
gravatar for Hervé Pagès
4.0 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:
Hi Janet, On 12/03/2013 04:53 PM, Janet Young wrote: > Hi there, > > I'm playing around with coverage data generated outside of R, planning to plot RNA-seq coverage for some genes we're interested in. > > I have a request - it'd be nice from my point of view if it were possible to look at a small region an RleList (or CompressedRleList) using a GRanges object to focus on that smaller region. Looks like I can subset a single Rle object with an IRanges object, but I wonder if this nice feature could be extended to GRanges objects? > > Some code is below to show what I'm trying to do. > > thanks veruy much, > > Janet > > > > library(GenomicRanges) > > ## an example RleList > x <- Rle(10:1, 1:10) > y <- Rle(10:1, 1:10) > z <- RleList( chr1=x, chr2=y) > > ## an example GRanges object > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > ## subsetting an Rle using an IRanges object works, as expected: > z[["chr1"]] [ ranges(myRange) ] > > ## but subsetting an RleList by GRanges object doesn't work > z [ myRange ] > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type Subsetting by a GRanges object is not supported (yet) but subsetting by a RangesList object is: > idx <- IRangesList(IRanges(10, 15), IRanges(2, 6)) > z[idx] RleList of length 2 $chr1 integer-Rle of length 6 with 2 runs Lengths: 1 5 Values : 7 6 $chr2 integer-Rle of length 5 with 2 runs Lengths: 2 3 Values : 9 8 So if you turn your GRanges object into a RangesList object, that will make the trick: > z[as(myRange, "RangesList")] RleList of length 1 $chr1 integer-Rle of length 6 with 2 runs Lengths: 1 5 Values : 7 6 HTH, H. > > sessionInfo() > > R Under development (unstable) (2013-11-06 r64163) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > [4] BiocGenerics_0.9.1 > > loaded via a namespace (and not attached): > [1] stats4_3.1.0 > > > > > ------------------------------------------------------------------- > > Dr. Janet Young > > Malik lab > http://research.fhcrc.org/malik/en.html > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., A2-025, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 4512 > email: jayoung ...at... fhcrc.org > > ------------------------------------------------------------------- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENTlink written 4.0 years ago by Hervé Pagès ♦♦ 13k
0
gravatar for Robert Castelo
4.0 years ago by
Robert Castelo2.0k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.0k wrote:
hi Janet, in october i asked to the list for a similar question and among other things posted in https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html i got the following hint from Michael Lawrence that may be relevant to your goal: "If you want to query the RleList for single positions (in a GRanges) and get the result as an atomic vector, it's most efficient to use VariantTools::extractCoverageForPositions. That function should probably live somewhere more general but anyway, it's a fast path for single position RleList=>vector extraction." that particular function, extractCoverageForPositions, was not completely solving my needs and i wrote yet another one which also takes an RleList and a GRanges object as input. i paste it here and maybe merging somehow the two of them you can satify what you need (watch out line-wrapping by email software): rleGetValues <- function(rlelst, gr, summaryFun="mean", coercionFun="as.numeric") { summaryFun <- match.fun(summaryFun) coercionFun <- match.fun(coercionFun) seqlevels(gr) <- names(rlelst) ord <- order(seqnames(gr)) ans <- numeric(length(gr)) startbyseq <- split(start(gr), seqnames(gr), drop=TRUE) ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun), use.names=FALSE) whregions <- which(width(gr) > 1) if (length(whregions) > 0) { ## regions comprising more than one position are summarized by summaryFun() startbyseq <- split(start(gr)[whregions], seqnames(gr)[whregions], drop=TRUE) widthbyseq <- split(width(gr)[whregions], seqnames(gr)[whregions], drop=TRUE) ans[whregions] <- unlist(mapply(function(r, p, w) sapply(seq_along(p), function(i, r, p, w) summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])), r, p, w), rlelst[names(startbyseq)], startbyseq, widthbyseq, SIMPLIFY=FALSE), use.names=FALSE) } ans } with your example it works as follows: library(GenomicRanges) x <- Rle(10:1, 1:10) y <- Rle(10:1, 1:10) z <- RleList( chr1=x, chr2=y) ## an example GRanges object myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) ## subsetting an Rle using an IRanges object works, as expected: z[["chr1"]] [ ranges(myRange) ] integer-Rle of length 6 with 2 runs Lengths: 1 5 Values : 7 6 mean(z[["chr1"]] [ ranges(myRange) ]) [1] 6.166667 ## now using the function 'rleGetValues()' rleGetValues(z, myRange) [1] 6.166667 cheers, robert. On 12/4/13 1:53 AM, Janet Young wrote: > Hi there, > > I'm playing around with coverage data generated outside of R, planning to plot RNA-seq coverage for some genes we're interested in. > > I have a request - it'd be nice from my point of view if it were possible to look at a small region an RleList (or CompressedRleList) using a GRanges object to focus on that smaller region. Looks like I can subset a single Rle object with an IRanges object, but I wonder if this nice feature could be extended to GRanges objects? > > Some code is below to show what I'm trying to do. > > thanks veruy much, > > Janet > > > > library(GenomicRanges) > > ## an example RleList > x <- Rle(10:1, 1:10) > y <- Rle(10:1, 1:10) > z <- RleList( chr1=x, chr2=y) > > ## an example GRanges object > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > ## subsetting an Rle using an IRanges object works, as expected: > z[["chr1"]] [ ranges(myRange) ] > > ## but subsetting an RleList by GRanges object doesn't work > z [ myRange ] > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type > > sessionInfo() > > R Under development (unstable) (2013-11-06 r64163) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > [4] BiocGenerics_0.9.1 > > loaded via a namespace (and not attached): > [1] stats4_3.1.0 > > > > > ------------------------------------------------------------------- > > Dr. Janet Young > > Malik lab > http://research.fhcrc.org/malik/en.html > > Fred Hutchinson Cancer Research Center > 1100 Fairview Avenue N., A2-025, > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > tel: (206) 667 4512 > email: jayoung ...at... fhcrc.org > > ------------------------------------------------------------------- > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD COMMENTlink written 4.0 years ago by Robert Castelo2.0k
Hey Robert, My extractCoverageForPositions is a highly optimized extractor for width-one positions. To summarize general ranges, just use RleViews. It will be more efficient and simpler than your solution. Basically comes down to this (might need some house-keeping on the seqlevels): views <- RleViews(rlelst, as(gr, "RangesList")) viewMeans(views) Michael On Tue, Dec 3, 2013 at 7:09 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > hi Janet, > > in october i asked to the list for a similar question and among other > things posted in > > https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html > > i got the following hint from Michael Lawrence that may be relevant to > your goal: > > "If you want to query the RleList for single positions (in a GRanges) and > get the result as an atomic vector, it's most efficient to use > VariantTools::extractCoverageForPositions. That function should probably > live somewhere more general but anyway, it's a fast path for single > position RleList=>vector extraction." > > that particular function, extractCoverageForPositions, was not > completely solving my needs and i wrote yet another one which also takes > an RleList and a GRanges object as input. i paste it here and maybe > merging somehow the two of them you can satify what you need (watch out > line-wrapping by email software): > > rleGetValues <- function(rlelst, gr, summaryFun="mean", > coercionFun="as.numeric") { > summaryFun <- match.fun(summaryFun) > coercionFun <- match.fun(coercionFun) > seqlevels(gr) <- names(rlelst) > ord <- order(seqnames(gr)) > ans <- numeric(length(gr)) > startbyseq <- split(start(gr), seqnames(gr), drop=TRUE) > ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun), > use.names=FALSE) > whregions <- which(width(gr) > 1) > if (length(whregions) > 0) { ## regions comprising more than one > position are summarized by summaryFun() > startbyseq <- split(start(gr)[whregions], seqnames(gr)[whregions], > drop=TRUE) > widthbyseq <- split(width(gr)[whregions], seqnames(gr)[whregions], > drop=TRUE) > ans[whregions] <- unlist(mapply(function(r, p, w) > sapply(seq_along(p), > function(i, r, p, w) > summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])), > r, p, w), > rlelst[names(startbyseq)], > startbyseq, widthbyseq, SIMPLIFY=FALSE), > use.names=FALSE) > } > ans > } > > with your example it works as follows: > > library(GenomicRanges) > > x <- Rle(10:1, 1:10) > y <- Rle(10:1, 1:10) > z <- RleList( chr1=x, chr2=y) > > ## an example GRanges object > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > ## subsetting an Rle using an IRanges object works, as expected: > z[["chr1"]] [ ranges(myRange) ] > integer-Rle of length 6 with 2 runs > Lengths: 1 5 > Values : 7 6 > mean(z[["chr1"]] [ ranges(myRange) ]) > [1] 6.166667 > > ## now using the function 'rleGetValues()' > rleGetValues(z, myRange) > [1] 6.166667 > > cheers, > > robert. > > On 12/4/13 1:53 AM, Janet Young wrote: > > Hi there, > > > > I'm playing around with coverage data generated outside of R, planning > to plot RNA-seq coverage for some genes we're interested in. > > > > I have a request - it'd be nice from my point of view if it were > possible to look at a small region an RleList (or CompressedRleList) using > a GRanges object to focus on that smaller region. Looks like I can subset > a single Rle object with an IRanges object, but I wonder if this nice > feature could be extended to GRanges objects? > > > > Some code is below to show what I'm trying to do. > > > > thanks veruy much, > > > > Janet > > > > > > > > library(GenomicRanges) > > > > ## an example RleList > > x <- Rle(10:1, 1:10) > > y <- Rle(10:1, 1:10) > > z <- RleList( chr1=x, chr2=y) > > > > ## an example GRanges object > > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > > > ## subsetting an Rle using an IRanges object works, as expected: > > z[["chr1"]] [ ranges(myRange) ] > > > > ## but subsetting an RleList by GRanges object doesn't work > > z [ myRange ] > > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript type > > > > sessionInfo() > > > > R Under development (unstable) (2013-11-06 r64163) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets methods > > [8] base > > > > other attached packages: > > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > > [4] BiocGenerics_0.9.1 > > > > loaded via a namespace (and not attached): > > [1] stats4_3.1.0 > > > > > > > > > > ------------------------------------------------------------------- > > > > Dr. Janet Young > > > > Malik lab > > http://research.fhcrc.org/malik/en.html > > > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Avenue N., A2-025, > > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > > > tel: (206) 667 4512 > > email: jayoung ...at... fhcrc.org > > > > ------------------------------------------------------------------- > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Michael Lawrence9.8k
hi Michael, thanks for the pointer, however, i need to coerce the Rle values to numeric before taking the mean since, in my case, they are 'raw' in order to reduce the memory footprint, and i have not been able to find the way to do this through the RleViews route. again, any hint will be very much appreciated!! robert. On 12/4/13 4:21 AM, Michael Lawrence wrote: > Hey Robert, > > My extractCoverageForPositions is a highly optimized extractor for > width-one positions. To summarize general ranges, just use RleViews. > It will be more efficient and simpler than your solution. > > Basically comes down to this (might need some house-keeping on the > seqlevels): > > views <- RleViews(rlelst, as(gr, "RangesList")) > viewMeans(views) > > Michael > > On Tue, Dec 3, 2013 at 7:09 PM, Robert Castelo <robert.castelo@upf.edu> <mailto:robert.castelo@upf.edu>> wrote: > > hi Janet, > > in october i asked to the list for a similar question and among other > things posted in > > https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html > > i got the following hint from Michael Lawrence that may be relevant to > your goal: > > "If you want to query the RleList for single positions (in a > GRanges) and > get the result as an atomic vector, it's most efficient to use > VariantTools::extractCoverageForPositions. That function should > probably > live somewhere more general but anyway, it's a fast path for single > position RleList=>vector extraction." > > that particular function, extractCoverageForPositions, was not > completely solving my needs and i wrote yet another one which also > takes > an RleList and a GRanges object as input. i paste it here and maybe > merging somehow the two of them you can satify what you need > (watch out > line-wrapping by email software): > > rleGetValues <- function(rlelst, gr, summaryFun="mean", > coercionFun="as.numeric") { > summaryFun <- match.fun(summaryFun) > coercionFun <- match.fun(coercionFun) > seqlevels(gr) <- names(rlelst) > ord <- order(seqnames(gr)) > ans <- numeric(length(gr)) > startbyseq <- split(start(gr), seqnames(gr), drop=TRUE) > ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun), > use.names=FALSE) > whregions <- which(width(gr) > 1) > if (length(whregions) > 0) { ## regions comprising more than one > position are summarized by summaryFun() > startbyseq <- split(start(gr)[whregions], > seqnames(gr)[whregions], > drop=TRUE) > widthbyseq <- split(width(gr)[whregions], > seqnames(gr)[whregions], > drop=TRUE) > ans[whregions] <- unlist(mapply(function(r, p, w) > sapply(seq_along(p), > function(i, r, p, w) > summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])), > r, p, w), > rlelst[names(startbyseq)], > startbyseq, widthbyseq, SIMPLIFY=FALSE), > use.names=FALSE) > } > ans > } > > with your example it works as follows: > > library(GenomicRanges) > > x <- Rle(10:1, 1:10) > y <- Rle(10:1, 1:10) > z <- RleList( chr1=x, chr2=y) > > ## an example GRanges object > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) > > ## subsetting an Rle using an IRanges object works, as expected: > z[["chr1"]] [ ranges(myRange) ] > integer-Rle of length 6 with 2 runs > Lengths: 1 5 > Values : 7 6 > mean(z[["chr1"]] [ ranges(myRange) ]) > [1] 6.166667 > > ## now using the function 'rleGetValues()' > rleGetValues(z, myRange) > [1] 6.166667 > > cheers, > > robert. > > On 12/4/13 1:53 AM, Janet Young wrote: > > Hi there, > > > > I'm playing around with coverage data generated outside of R, > planning to plot RNA-seq coverage for some genes we're interested in. > > > > I have a request - it'd be nice from my point of view if it were > possible to look at a small region an RleList (or > CompressedRleList) using a GRanges object to focus on that smaller > region. Looks like I can subset a single Rle object with an > IRanges object, but I wonder if this nice feature could be > extended to GRanges objects? > > > > Some code is below to show what I'm trying to do. > > > > thanks veruy much, > > > > Janet > > > > > > > > library(GenomicRanges) > > > > ## an example RleList > > x <- Rle(10:1, 1:10) > > y <- Rle(10:1, 1:10) > > z <- RleList( chr1=x, chr2=y) > > > > ## an example GRanges object > > myRange <- GRanges( seqnames="chr1", > ranges=IRanges(start=10,end=15) ) > > > > ## subsetting an Rle using an IRanges object works, as expected: > > z[["chr1"]] [ ranges(myRange) ] > > > > ## but subsetting an RleList by GRanges object doesn't work > > z [ myRange ] > > # Error in normalizeSingleBracketSubscript(i, x) : invalid > subscript type > > > > sessionInfo() > > > > R Under development (unstable) (2013-11-06 r64163) > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > locale: > > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > > [9] LC_ADDRESS=C LC_TELEPHONE=C > > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > > > attached base packages: > > [1] parallel stats graphics grDevices utils datasets > methods > > [8] base > > > > other attached packages: > > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 > > [4] BiocGenerics_0.9.1 > > > > loaded via a namespace (and not attached): > > [1] stats4_3.1.0 > > > > > > > > > > ------------------------------------------------------------------- > > > > Dr. Janet Young > > > > Malik lab > > http://research.fhcrc.org/malik/en.html > > > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Avenue N., A2-025, > > P.O. Box 19024, Seattle, WA 98109-1024, USA. > > > > tel: (206) 667 4512 <tel:%28206%29%20667%204512> > > email: jayoung ...at... fhcrc.org <http: fhcrc.org=""> > > > > ------------------------------------------------------------------- > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Robert Castelo2.0k
To coerce, starting from an RleList (ideally compressed), just rle <- unlist(rlelst) runValue(rle) <- as.numeric(runValue(rle)) # or as(rle, "numericRle") with VariantAnnotation rlelst <- relist(rle, rlelst) I'll add runValue<-,RleList,List at some point to turn that into a one-liner, like: runValue(rlelst) <- as(runValue(rlelst), "NumericList") On Tue, Dec 3, 2013 at 7:35 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > hi Michael, > > thanks for the pointer, however, i need to coerce the Rle values to > numeric before taking the mean since, in my case, they are 'raw' in order > to reduce the memory footprint, and i have not been able to find the way to > do this through the RleViews route. again, any hint will be very much > appreciated!! > > robert. > > > On 12/4/13 4:21 AM, Michael Lawrence wrote: > > Hey Robert, > > My extractCoverageForPositions is a highly optimized extractor for > width-one positions. To summarize general ranges, just use RleViews. It > will be more efficient and simpler than your solution. > > Basically comes down to this (might need some house-keeping on the > seqlevels): > > views <- RleViews(rlelst, as(gr, "RangesList")) > viewMeans(views) > > Michael > > On Tue, Dec 3, 2013 at 7:09 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > >> hi Janet, >> >> in october i asked to the list for a similar question and among other >> things posted in >> >> https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html >> >> i got the following hint from Michael Lawrence that may be relevant to >> your goal: >> >> "If you want to query the RleList for single positions (in a GRanges) and >> get the result as an atomic vector, it's most efficient to use >> VariantTools::extractCoverageForPositions. That function should probably >> live somewhere more general but anyway, it's a fast path for single >> position RleList=>vector extraction." >> >> that particular function, extractCoverageForPositions, was not >> completely solving my needs and i wrote yet another one which also takes >> an RleList and a GRanges object as input. i paste it here and maybe >> merging somehow the two of them you can satify what you need (watch out >> line-wrapping by email software): >> >> rleGetValues <- function(rlelst, gr, summaryFun="mean", >> coercionFun="as.numeric") { >> summaryFun <- match.fun(summaryFun) >> coercionFun <- match.fun(coercionFun) >> seqlevels(gr) <- names(rlelst) >> ord <- order(seqnames(gr)) >> ans <- numeric(length(gr)) >> startbyseq <- split(start(gr), seqnames(gr), drop=TRUE) >> ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun), >> use.names=FALSE) >> whregions <- which(width(gr) > 1) >> if (length(whregions) > 0) { ## regions comprising more than one >> position are summarized by summaryFun() >> startbyseq <- split(start(gr)[whregions], seqnames(gr)[whregions], >> drop=TRUE) >> widthbyseq <- split(width(gr)[whregions], seqnames(gr)[whregions], >> drop=TRUE) >> ans[whregions] <- unlist(mapply(function(r, p, w) >> sapply(seq_along(p), >> function(i, r, p, w) >> summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])), >> r, p, w), >> rlelst[names(startbyseq)], >> startbyseq, widthbyseq, SIMPLIFY=FALSE), >> use.names=FALSE) >> } >> ans >> } >> >> with your example it works as follows: >> >> library(GenomicRanges) >> >> x <- Rle(10:1, 1:10) >> y <- Rle(10:1, 1:10) >> z <- RleList( chr1=x, chr2=y) >> >> ## an example GRanges object >> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >> >> ## subsetting an Rle using an IRanges object works, as expected: >> z[["chr1"]] [ ranges(myRange) ] >> integer-Rle of length 6 with 2 runs >> Lengths: 1 5 >> Values : 7 6 >> mean(z[["chr1"]] [ ranges(myRange) ]) >> [1] 6.166667 >> >> ## now using the function 'rleGetValues()' >> rleGetValues(z, myRange) >> [1] 6.166667 >> >> cheers, >> >> robert. >> >> On 12/4/13 1:53 AM, Janet Young wrote: >> > Hi there, >> > >> > I'm playing around with coverage data generated outside of R, planning >> to plot RNA-seq coverage for some genes we're interested in. >> > >> > I have a request - it'd be nice from my point of view if it were >> possible to look at a small region an RleList (or CompressedRleList) using >> a GRanges object to focus on that smaller region. Looks like I can subset >> a single Rle object with an IRanges object, but I wonder if this nice >> feature could be extended to GRanges objects? >> > >> > Some code is below to show what I'm trying to do. >> > >> > thanks veruy much, >> > >> > Janet >> > >> > >> > >> > library(GenomicRanges) >> > >> > ## an example RleList >> > x <- Rle(10:1, 1:10) >> > y <- Rle(10:1, 1:10) >> > z <- RleList( chr1=x, chr2=y) >> > >> > ## an example GRanges object >> > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >> > >> > ## subsetting an Rle using an IRanges object works, as expected: >> > z[["chr1"]] [ ranges(myRange) ] >> > >> > ## but subsetting an RleList by GRanges object doesn't work >> > z [ myRange ] >> > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript >> type >> > >> > sessionInfo() >> > >> > R Under development (unstable) (2013-11-06 r64163) >> > Platform: x86_64-unknown-linux-gnu (64-bit) >> > >> > locale: >> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> > [9] LC_ADDRESS=C LC_TELEPHONE=C >> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> > >> > attached base packages: >> > [1] parallel stats graphics grDevices utils datasets methods >> > [8] base >> > >> > other attached packages: >> > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >> > [4] BiocGenerics_0.9.1 >> > >> > loaded via a namespace (and not attached): >> > [1] stats4_3.1.0 >> > >> > >> > >> > >> > ------------------------------------------------------------------- >> > >> > Dr. Janet Young >> > >> > Malik lab >> > http://research.fhcrc.org/malik/en.html >> > >> > Fred Hutchinson Cancer Research Center >> > 1100 Fairview Avenue N., A2-025, >> > P.O. Box 19024, Seattle, WA 98109-1024, USA. >> > >> > tel: (206) 667 4512 >> > email: jayoung ...at... fhcrc.org >> > >> > ------------------------------------------------------------------- >> > >> > _______________________________________________ >> > Bioconductor mailing list >> > Bioconductor@r-project.org >> > https://stat.ethz.ch/mailman/listinfo/bioconductor >> > Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Michael Lawrence9.8k
Sorry, you want unlist(rlelst, use.names=FALSE) for efficiency. On Tue, Dec 3, 2013 at 7:55 PM, Michael Lawrence <michafla@gene.com> wrote: > To coerce, starting from an RleList (ideally compressed), just > > rle <- unlist(rlelst) > runValue(rle) <- as.numeric(runValue(rle)) # or as(rle, "numericRle") with > VariantAnnotation > rlelst <- relist(rle, rlelst) > > I'll add runValue<-,RleList,List at some point to turn that into a > one-liner, like: > runValue(rlelst) <- as(runValue(rlelst), "NumericList") > > > > On Tue, Dec 3, 2013 at 7:35 PM, Robert Castelo <robert.castelo@upf.edu>wrote: > >> hi Michael, >> >> thanks for the pointer, however, i need to coerce the Rle values to >> numeric before taking the mean since, in my case, they are 'raw' in order >> to reduce the memory footprint, and i have not been able to find the way to >> do this through the RleViews route. again, any hint will be very much >> appreciated!! >> >> robert. >> >> >> On 12/4/13 4:21 AM, Michael Lawrence wrote: >> >> Hey Robert, >> >> My extractCoverageForPositions is a highly optimized extractor for >> width-one positions. To summarize general ranges, just use RleViews. It >> will be more efficient and simpler than your solution. >> >> Basically comes down to this (might need some house-keeping on the >> seqlevels): >> >> views <- RleViews(rlelst, as(gr, "RangesList")) >> viewMeans(views) >> >> Michael >> >> On Tue, Dec 3, 2013 at 7:09 PM, Robert Castelo <robert.castelo@upf.edu>wrote: >> >>> hi Janet, >>> >>> in october i asked to the list for a similar question and among other >>> things posted in >>> >>> https://stat.ethz.ch/pipermail/bioconductor/2013-October/055618.html >>> >>> i got the following hint from Michael Lawrence that may be relevant to >>> your goal: >>> >>> "If you want to query the RleList for single positions (in a GRanges) and >>> get the result as an atomic vector, it's most efficient to use >>> VariantTools::extractCoverageForPositions. That function should probably >>> live somewhere more general but anyway, it's a fast path for single >>> position RleList=>vector extraction." >>> >>> that particular function, extractCoverageForPositions, was not >>> completely solving my needs and i wrote yet another one which also takes >>> an RleList and a GRanges object as input. i paste it here and maybe >>> merging somehow the two of them you can satify what you need (watch out >>> line-wrapping by email software): >>> >>> rleGetValues <- function(rlelst, gr, summaryFun="mean", >>> coercionFun="as.numeric") { >>> summaryFun <- match.fun(summaryFun) >>> coercionFun <- match.fun(coercionFun) >>> seqlevels(gr) <- names(rlelst) >>> ord <- order(seqnames(gr)) >>> ans <- numeric(length(gr)) >>> startbyseq <- split(start(gr), seqnames(gr), drop=TRUE) >>> ans[ord] <- unlist(lapply(rlelst[startbyseq], coercionFun), >>> use.names=FALSE) >>> whregions <- which(width(gr) > 1) >>> if (length(whregions) > 0) { ## regions comprising more than one >>> position are summarized by summaryFun() >>> startbyseq <- split(start(gr)[whregions], seqnames(gr)[whregions], >>> drop=TRUE) >>> widthbyseq <- split(width(gr)[whregions], seqnames(gr)[whregions], >>> drop=TRUE) >>> ans[whregions] <- unlist(mapply(function(r, p, w) >>> sapply(seq_along(p), >>> function(i, r, p, w) >>> summaryFun(coercionFun(r[p[i]:(p[i]+w[i]-1)])), >>> r, p, w), >>> rlelst[names(startbyseq)], >>> startbyseq, widthbyseq, SIMPLIFY=FALSE), >>> use.names=FALSE) >>> } >>> ans >>> } >>> >>> with your example it works as follows: >>> >>> library(GenomicRanges) >>> >>> x <- Rle(10:1, 1:10) >>> y <- Rle(10:1, 1:10) >>> z <- RleList( chr1=x, chr2=y) >>> >>> ## an example GRanges object >>> myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >>> >>> ## subsetting an Rle using an IRanges object works, as expected: >>> z[["chr1"]] [ ranges(myRange) ] >>> integer-Rle of length 6 with 2 runs >>> Lengths: 1 5 >>> Values : 7 6 >>> mean(z[["chr1"]] [ ranges(myRange) ]) >>> [1] 6.166667 >>> >>> ## now using the function 'rleGetValues()' >>> rleGetValues(z, myRange) >>> [1] 6.166667 >>> >>> cheers, >>> >>> robert. >>> >>> On 12/4/13 1:53 AM, Janet Young wrote: >>> > Hi there, >>> > >>> > I'm playing around with coverage data generated outside of R, planning >>> to plot RNA-seq coverage for some genes we're interested in. >>> > >>> > I have a request - it'd be nice from my point of view if it were >>> possible to look at a small region an RleList (or CompressedRleList) using >>> a GRanges object to focus on that smaller region. Looks like I can subset >>> a single Rle object with an IRanges object, but I wonder if this nice >>> feature could be extended to GRanges objects? >>> > >>> > Some code is below to show what I'm trying to do. >>> > >>> > thanks veruy much, >>> > >>> > Janet >>> > >>> > >>> > >>> > library(GenomicRanges) >>> > >>> > ## an example RleList >>> > x <- Rle(10:1, 1:10) >>> > y <- Rle(10:1, 1:10) >>> > z <- RleList( chr1=x, chr2=y) >>> > >>> > ## an example GRanges object >>> > myRange <- GRanges( seqnames="chr1", ranges=IRanges(start=10,end=15) ) >>> > >>> > ## subsetting an Rle using an IRanges object works, as expected: >>> > z[["chr1"]] [ ranges(myRange) ] >>> > >>> > ## but subsetting an RleList by GRanges object doesn't work >>> > z [ myRange ] >>> > # Error in normalizeSingleBracketSubscript(i, x) : invalid subscript >>> type >>> > >>> > sessionInfo() >>> > >>> > R Under development (unstable) (2013-11-06 r64163) >>> > Platform: x86_64-unknown-linux-gnu (64-bit) >>> > >>> > locale: >>> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>> > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>> > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>> > [9] LC_ADDRESS=C LC_TELEPHONE=C >>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>> > >>> > attached base packages: >>> > [1] parallel stats graphics grDevices utils datasets methods >>> > [8] base >>> > >>> > other attached packages: >>> > [1] GenomicRanges_1.15.10 XVector_0.3.2 IRanges_1.21.13 >>> > [4] BiocGenerics_0.9.1 >>> > >>> > loaded via a namespace (and not attached): >>> > [1] stats4_3.1.0 >>> > >>> > >>> > >>> > >>> > ------------------------------------------------------------------- >>> > >>> > Dr. Janet Young >>> > >>> > Malik lab >>> > http://research.fhcrc.org/malik/en.html >>> > >>> > Fred Hutchinson Cancer Research Center >>> > 1100 Fairview Avenue N., A2-025, >>> > P.O. Box 19024, Seattle, WA 98109-1024, USA. >>> > >>> > tel: (206) 667 4512 >>> > email: jayoung ...at... fhcrc.org >>> > >>> > ------------------------------------------------------------------- >>> > >>> > _______________________________________________ >>> > Bioconductor mailing list >>> > Bioconductor@r-project.org >>> > https://stat.ethz.ch/mailman/listinfo/bioconductor >>> > Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> > [[alternative HTML version deleted]]
ADD REPLYlink written 4.0 years ago by Michael Lawrence9.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 152 users visited in the last hour