views on Rle using GRanges object

0

Entering edit mode

Janet Young ▴ 740

@janet-young-2360

Last seen 4.5 years ago

Fred Hutchinson Cancer Research Center,…

Hi again, I have another question, in the "I think I need a convoluted workaround, but maybe I'm missing a simple solution" genre (seems like most of my questions are like that). I have an RleList object representing mapability for the whole human genome. I'd like to: (a) calculate viewMeans for various regions of interest that I've been representing as GRanges, and (b) I'd like the underlying code to be smart and match chromosome names up in the RleList and the GRanges object (not rely on chromosomes being ordered the same in the two objects), and (c) I'd like to return the viewMeans results in the same order as the objects in my original GRanges I don't think this is currently implemented without doing several coercions (that introduce their own complications) but I'm not sure. Some code is below that shows what I'm trying to do. It seems like it might be a common kind of way to use viewMeans - I imagine people are gradually switching to use GRanges more and more? but really I don't know. My real world question is that I've read in mapability from a UCSC bigWig file, and made that into an RleList. I have a bunch of other regions I've read in (again from UCSC) using rtracklayer as GRanges, and have annotated those with various things I'm interested in (e.g. number of RNAseq reads in the region). I want to add the average mapability for each region, so that I can later look at how mapability affects those other things I'm annotating. Should I be being more strict with myself about how my GRanges are ordered and making sure they all have the same seqlevels and seqlengths? Perhaps that would help the coercion workarounds go more smoothly. thanks, Janet library(GenomicRanges) ### make an RleList. Just for this example, I starting with a GRanges object and used coverage to get an RleList example. To get my real data I read in a UCSC bigWig file. fakeRegions <- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) seqlengths(fakeRegions) <- c(100,100,100) myRleList <- coverage(fakeRegions) rm(fakeRegions) myRleList # SimpleRleList of length 3 # $chrA # 'integer' Rle of length 100 with 4 runs # Lengths: 50 10 30 10 # Values : 1 2 1 0 # # $chrB # 'integer' Rle of length 100 with 2 runs # Lengths: 20 80 # Values : 1 0 # # $chrC # 'integer' Rle of length 100 with 2 runs # Lengths: 10 90 # Values : 1 0 ### make some regions of interest myRegions <- GRanges(seqnames=c("chrB", "chrC", "chrB"), ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), geneNames=c("g1","g2","g3") ) myRegions # GRanges with 3 ranges and 1 elementMetadata value # seqnames ranges strand | geneNames # <rle> <iranges> <rle> | <character> # [1] chrB [1, 20] * | g1 # [2] chrC [1, 20] * | g2 # [3] chrB [5, 10] * | g3 # # seqlengths # chrB chrC # NA NA ## can't use GRanges directly Views( myRleList, myRegions) # Error in RleViewsList(rleList = subject, rangesList = start) : # 'rangesList' must be a RangesList object ## can't use a simple coercion Views( myRleList, as(myRegions,"RangesList") ) # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = SIMPLIFY, : # all object lengths must be multiple of longest object length ### although I can use that coercion if I first give the GRanges object the same levels as the RleList to force the lists to have the same names as each other: seqlevels(myRegions) <- names(myRleList) viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) # SimpleNumericList of length 3 # [["chrA"]] numeric(0) # [["chrB"]] 1 1 # [["chrC"]] 0.5 ### getting close to a solution - but I'd have liked to have been able to unlist this and directly add to my GRanges object e.g. values(myRegions)$meanCov <- unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), use.names=FALSE) ### but the regions are in a different order to how I started, so the above command would associate the wrong scores with the wrong regions. sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicRanges_1.4.6 IRanges_1.10.5

RNASeq Coverage GO rtracklayer RNASeq Coverage GO rtracklayer • 3.4k views

ADD COMMENT • link updated 12.7 years ago by Hervé Pagès 16k • written 12.7 years ago by Janet Young ▴ 740

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

On Mon, Aug 15, 2011 at 5:25 PM, Janet Young <jayoung@fhcrc.org> wrote: > Hi again, > > I have another question, in the "I think I need a convoluted workaround, > but maybe I'm missing a simple solution" genre (seems like most of my > questions are like that). > > I have an RleList object representing mapability for the whole human > genome. I'd like to: > (a) calculate viewMeans for various regions of interest that I've been > representing as GRanges, and > (b) I'd like the underlying code to be smart and match chromosome names up > in the RleList and the GRanges object (not rely on chromosomes being ordered > the same in the two objects), and > (c) I'd like to return the viewMeans results in the same order as the > objects in my original GRanges > > I don't think this is currently implemented without doing several coercions > (that introduce their own complications) but I'm not sure. Some code is > below that shows what I'm trying to do. It seems like it might be a common > kind of way to use viewMeans - I imagine people are gradually switching to > use GRanges more and more? but really I don't know. > > My real world question is that I've read in mapability from a UCSC bigWig > file, and made that into an RleList. I have a bunch of other regions I've > read in (again from UCSC) using rtracklayer as GRanges, and have annotated > those with various things I'm interested in (e.g. number of RNAseq reads in > the region). I want to add the average mapability for each region, so that > I can later look at how mapability affects those other things I'm > annotating. > > Should I be being more strict with myself about how my GRanges are ordered > and making sure they all have the same seqlevels and seqlengths? Perhaps > that would help the coercion workarounds go more smoothly. > > thanks, > > Janet > > > library(GenomicRanges) > > ### make an RleList. Just for this example, I starting with a GRanges > object and used coverage to get an RleList example. To get my real data I > read in a UCSC bigWig file. > > fakeRegions <- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), > ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) > seqlengths(fakeRegions) <- c(100,100,100) > myRleList <- coverage(fakeRegions) > rm(fakeRegions) > > myRleList > > # SimpleRleList of length 3 > # $chrA > # 'integer' Rle of length 100 with 4 runs > # Lengths: 50 10 30 10 > # Values : 1 2 1 0 > # > # $chrB > # 'integer' Rle of length 100 with 2 runs > # Lengths: 20 80 > # Values : 1 0 > # > # $chrC > # 'integer' Rle of length 100 with 2 runs > # Lengths: 10 90 > # Values : 1 0 > > ### make some regions of interest > myRegions <- GRanges(seqnames=c("chrB", "chrC", "chrB"), > ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), > geneNames=c("g1","g2","g3") ) > myRegions > # GRanges with 3 ranges and 1 elementMetadata value > # seqnames ranges strand | geneNames > # <rle> <iranges> <rle> | <character> > # [1] chrB [1, 20] * | g1 > # [2] chrC [1, 20] * | g2 > # [3] chrB [5, 10] * | g3 > # > # seqlengths > # chrB chrC > # NA NA > > ## can't use GRanges directly > Views( myRleList, myRegions) > # Error in RleViewsList(rleList = subject, rangesList = start) : > # 'rangesList' must be a RangesList object > > ## can't use a simple coercion > Views( myRleList, as(myRegions,"RangesList") ) > # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = > SIMPLIFY, : > # all object lengths must be multiple of longest object length > > ### although I can use that coercion if I first give the GRanges object the > same levels as the RleList to force the lists to have the same names as each > other: > seqlevels(myRegions) <- names(myRleList) > viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) > # SimpleNumericList of length 3 > # [["chrA"]] numeric(0) > # [["chrB"]] 1 1 > # [["chrC"]] 0.5 > > These two lines seem pretty concise to me. It makes sense to layer a RangesList on top of an RleList. Storing the result would of course be easier if you had sorted the GRanges by the seqlevels. Having a utility for that would be nice (orderBySeqlevels?). It would also be easy with RangedData, which enforces ordering by space. Michael > ### getting close to a solution - but I'd have liked to have been able to > unlist this and directly add to my GRanges object e.g. > values(myRegions)$meanCov <- unlist(viewMeans(Views( myRleList, > as(myRegions,"RangesList") ) ), use.names=FALSE) > ### but the regions are in a different order to how I started, so the above > command would associate the wrong scores with the wrong regions. > > sessionInfo() > > R version 2.13.1 (2011-07-08) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicRanges_1.4.6 IRanges_1.10.5 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 12.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

On Mon, Aug 15, 2011 at 10:10 PM, Michael Lawrence <michafla@gene.com>wrote: > > > On Mon, Aug 15, 2011 at 5:25 PM, Janet Young <jayoung@fhcrc.org> wrote: > >> Hi again, >> >> I have another question, in the "I think I need a convoluted workaround, >> but maybe I'm missing a simple solution" genre (seems like most of my >> questions are like that). >> >> I have an RleList object representing mapability for the whole human >> genome. I'd like to: >> (a) calculate viewMeans for various regions of interest that I've been >> representing as GRanges, and >> (b) I'd like the underlying code to be smart and match chromosome names up >> in the RleList and the GRanges object (not rely on chromosomes being ordered >> the same in the two objects), and >> (c) I'd like to return the viewMeans results in the same order as the >> objects in my original GRanges >> >> I don't think this is currently implemented without doing several >> coercions (that introduce their own complications) but I'm not sure. Some >> code is below that shows what I'm trying to do. It seems like it might be >> a common kind of way to use viewMeans - I imagine people are gradually >> switching to use GRanges more and more? but really I don't know. >> >> My real world question is that I've read in mapability from a UCSC bigWig >> file, and made that into an RleList. I have a bunch of other regions I've >> read in (again from UCSC) using rtracklayer as GRanges, and have annotated >> those with various things I'm interested in (e.g. number of RNAseq reads in >> the region). I want to add the average mapability for each region, so that >> I can later look at how mapability affects those other things I'm >> annotating. >> >> Should I be being more strict with myself about how my GRanges are ordered >> and making sure they all have the same seqlevels and seqlengths? Perhaps >> that would help the coercion workarounds go more smoothly. >> >> thanks, >> >> Janet >> >> >> library(GenomicRanges) >> >> ### make an RleList. Just for this example, I starting with a GRanges >> object and used coverage to get an RleList example. To get my real data I >> read in a UCSC bigWig file. >> >> fakeRegions <- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), >> ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) >> seqlengths(fakeRegions) <- c(100,100,100) >> myRleList <- coverage(fakeRegions) >> rm(fakeRegions) >> >> myRleList >> >> # SimpleRleList of length 3 >> # $chrA >> # 'integer' Rle of length 100 with 4 runs >> # Lengths: 50 10 30 10 >> # Values : 1 2 1 0 >> # >> # $chrB >> # 'integer' Rle of length 100 with 2 runs >> # Lengths: 20 80 >> # Values : 1 0 >> # >> # $chrC >> # 'integer' Rle of length 100 with 2 runs >> # Lengths: 10 90 >> # Values : 1 0 >> >> ### make some regions of interest >> myRegions <- GRanges(seqnames=c("chrB", "chrC", "chrB"), >> ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), >> geneNames=c("g1","g2","g3") ) >> myRegions >> # GRanges with 3 ranges and 1 elementMetadata value >> # seqnames ranges strand | geneNames >> # <rle> <iranges> <rle> | <character> >> # [1] chrB [1, 20] * | g1 >> # [2] chrC [1, 20] * | g2 >> # [3] chrB [5, 10] * | g3 >> # >> # seqlengths >> # chrB chrC >> # NA NA >> >> ## can't use GRanges directly >> Views( myRleList, myRegions) >> # Error in RleViewsList(rleList = subject, rangesList = start) : >> # 'rangesList' must be a RangesList object >> >> ## can't use a simple coercion >> Views( myRleList, as(myRegions,"RangesList") ) >> # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = >> SIMPLIFY, : >> # all object lengths must be multiple of longest object length >> >> ### although I can use that coercion if I first give the GRanges object >> the same levels as the RleList to force the lists to have the same names as >> each other: >> seqlevels(myRegions) <- names(myRleList) >> viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) >> # SimpleNumericList of length 3 >> # [["chrA"]] numeric(0) >> # [["chrB"]] 1 1 >> # [["chrC"]] 0.5 >> >> > These two lines seem pretty concise to me. It makes sense to layer a > RangesList on top of an RleList. Storing the result would of course be > easier if you had sorted the GRanges by the seqlevels. Having a utility for > that would be nice (orderBySeqlevels?). It would also be easy with > RangedData, which enforces ordering by space. > Just to clarify, here is how the function would work: values(myRegions)$meanCov[orderBySeqlevels(myRegions)] <- unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), use.names=FALSE) So 'orderBySeqlevels' would be a nice bridge back into the flat GRanges world from the Listy world of IRanges. > Michael > > > >> ### getting close to a solution - but I'd have liked to have been able to >> unlist this and directly add to my GRanges object e.g. >> values(myRegions)$meanCov <- unlist(viewMeans(Views( myRleList, >> as(myRegions,"RangesList") ) ), use.names=FALSE) >> ### but the regions are in a different order to how I started, so the >> above command would associate the wrong scores with the wrong regions. >> >> sessionInfo() >> >> R version 2.13.1 (2011-07-08) >> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] GenomicRanges_1.4.6 IRanges_1.10.5 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]

ADD REPLY • link 12.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi Michael, On 11-08-15 10:28 PM, Michael Lawrence wrote: > On Mon, Aug 15, 2011 at 10:10 PM, Michael Lawrence<michafla at="" gene.com="">wrote: > >> >> >> On Mon, Aug 15, 2011 at 5:25 PM, Janet Young<jayoung at="" fhcrc.org=""> wrote: >> >>> Hi again, >>> >>> I have another question, in the "I think I need a convoluted workaround, >>> but maybe I'm missing a simple solution" genre (seems like most of my >>> questions are like that). >>> >>> I have an RleList object representing mapability for the whole human >>> genome. I'd like to: >>> (a) calculate viewMeans for various regions of interest that I've been >>> representing as GRanges, and >>> (b) I'd like the underlying code to be smart and match chromosome names up >>> in the RleList and the GRanges object (not rely on chromosomes being ordered >>> the same in the two objects), and >>> (c) I'd like to return the viewMeans results in the same order as the >>> objects in my original GRanges >>> >>> I don't think this is currently implemented without doing several >>> coercions (that introduce their own complications) but I'm not sure. Some >>> code is below that shows what I'm trying to do. It seems like it might be >>> a common kind of way to use viewMeans - I imagine people are gradually >>> switching to use GRanges more and more? but really I don't know. >>> >>> My real world question is that I've read in mapability from a UCSC bigWig >>> file, and made that into an RleList. I have a bunch of other regions I've >>> read in (again from UCSC) using rtracklayer as GRanges, and have annotated >>> those with various things I'm interested in (e.g. number of RNAseq reads in >>> the region). I want to add the average mapability for each region, so that >>> I can later look at how mapability affects those other things I'm >>> annotating. >>> >>> Should I be being more strict with myself about how my GRanges are ordered >>> and making sure they all have the same seqlevels and seqlengths? Perhaps >>> that would help the coercion workarounds go more smoothly. >>> >>> thanks, >>> >>> Janet >>> >>> >>> library(GenomicRanges) >>> >>> ### make an RleList. Just for this example, I starting with a GRanges >>> object and used coverage to get an RleList example. To get my real data I >>> read in a UCSC bigWig file. >>> >>> fakeRegions<- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), >>> ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) >>> seqlengths(fakeRegions)<- c(100,100,100) >>> myRleList<- coverage(fakeRegions) >>> rm(fakeRegions) >>> >>> myRleList >>> >>> # SimpleRleList of length 3 >>> # $chrA >>> # 'integer' Rle of length 100 with 4 runs >>> # Lengths: 50 10 30 10 >>> # Values : 1 2 1 0 >>> # >>> # $chrB >>> # 'integer' Rle of length 100 with 2 runs >>> # Lengths: 20 80 >>> # Values : 1 0 >>> # >>> # $chrC >>> # 'integer' Rle of length 100 with 2 runs >>> # Lengths: 10 90 >>> # Values : 1 0 >>> >>> ### make some regions of interest >>> myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), >>> ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), >>> geneNames=c("g1","g2","g3") ) >>> myRegions >>> # GRanges with 3 ranges and 1 elementMetadata value >>> # seqnames ranges strand | geneNames >>> #<rle> <iranges> <rle> |<character> >>> # [1] chrB [1, 20] * | g1 >>> # [2] chrC [1, 20] * | g2 >>> # [3] chrB [5, 10] * | g3 >>> # >>> # seqlengths >>> # chrB chrC >>> # NA NA >>> >>> ## can't use GRanges directly >>> Views( myRleList, myRegions) >>> # Error in RleViewsList(rleList = subject, rangesList = start) : >>> # 'rangesList' must be a RangesList object >>> >>> ## can't use a simple coercion >>> Views( myRleList, as(myRegions,"RangesList") ) >>> # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = >>> SIMPLIFY, : >>> # all object lengths must be multiple of longest object length >>> >>> ### although I can use that coercion if I first give the GRanges object >>> the same levels as the RleList to force the lists to have the same names as >>> each other: >>> seqlevels(myRegions)<- names(myRleList) >>> viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) >>> # SimpleNumericList of length 3 >>> # [["chrA"]] numeric(0) >>> # [["chrB"]] 1 1 >>> # [["chrC"]] 0.5 >>> >>> >> These two lines seem pretty concise to me. It makes sense to layer a >> RangesList on top of an RleList. Storing the result would of course be >> easier if you had sorted the GRanges by the seqlevels. Having a utility for >> that would be nice (orderBySeqlevels?). It would also be easy with >> RangedData, which enforces ordering by space. >> > > Just to clarify, here is how the function would work: > > values(myRegions)$meanCov[orderBySeqlevels(myRegions)]<- > unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), > use.names=FALSE) > > So 'orderBySeqlevels' would be a nice bridge back into the flat GRanges > world from the Listy world of IRanges. Well I'm just remembering now that we still don't have an "order" method for GRanges objects like we do for IRanges object. The "natural" order for a GRanges object would be to order first by (a) seqlevel, (b) then by strand, (c) then by start, (d) and finally by end. We already do (c) and (d) for IRanges. Also the "reduce" method for GRanges already uses this "natural" order implicitly. So we will add this "order" method and the other basic order-related methods (sort, >=, <=, ==, !=, unique, duplicated, they are all missing) for GRanges objects. That will cover Janet's use case and other user cases (I think someone asked how to find duplicated in a GRanges object a while ago on this list). Cheers, H. > > >> Michael >> >> >> >>> ### getting close to a solution - but I'd have liked to have been able to >>> unlist this and directly add to my GRanges object e.g. >>> values(myRegions)$meanCov<- unlist(viewMeans(Views( myRleList, >>> as(myRegions,"RangesList") ) ), use.names=FALSE) >>> ### but the regions are in a different order to how I started, so the >>> above command would associate the wrong scores with the wrong regions. >>> >>> sessionInfo() >>> >>> R version 2.13.1 (2011-07-08) >>> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] GenomicRanges_1.4.6 IRanges_1.10.5 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 12.7 years ago Hervé Pagès 16k

0

Entering edit mode

2011/8/16 Hervé Pagès <hpages@fhcrc.org> > Hi Michael, > > > On 11-08-15 10:28 PM, Michael Lawrence wrote: > >> On Mon, Aug 15, 2011 at 10:10 PM, Michael Lawrence<michafla@gene.com>** >> wrote: >> >> >>> >>> On Mon, Aug 15, 2011 at 5:25 PM, Janet Young<jayoung@fhcrc.org> wrote: >>> >>> Hi again, >>>> >>>> I have another question, in the "I think I need a convoluted workaround, >>>> but maybe I'm missing a simple solution" genre (seems like most of my >>>> questions are like that). >>>> >>>> I have an RleList object representing mapability for the whole human >>>> genome. I'd like to: >>>> (a) calculate viewMeans for various regions of interest that I've been >>>> representing as GRanges, and >>>> (b) I'd like the underlying code to be smart and match chromosome names >>>> up >>>> in the RleList and the GRanges object (not rely on chromosomes being >>>> ordered >>>> the same in the two objects), and >>>> (c) I'd like to return the viewMeans results in the same order as the >>>> objects in my original GRanges >>>> >>>> I don't think this is currently implemented without doing several >>>> coercions (that introduce their own complications) but I'm not sure. >>>> Some >>>> code is below that shows what I'm trying to do. It seems like it might >>>> be >>>> a common kind of way to use viewMeans - I imagine people are gradually >>>> switching to use GRanges more and more? but really I don't know. >>>> >>>> My real world question is that I've read in mapability from a UCSC >>>> bigWig >>>> file, and made that into an RleList. I have a bunch of other regions >>>> I've >>>> read in (again from UCSC) using rtracklayer as GRanges, and have >>>> annotated >>>> those with various things I'm interested in (e.g. number of RNAseq reads >>>> in >>>> the region). I want to add the average mapability for each region, so >>>> that >>>> I can later look at how mapability affects those other things I'm >>>> annotating. >>>> >>>> Should I be being more strict with myself about how my GRanges are >>>> ordered >>>> and making sure they all have the same seqlevels and seqlengths? >>>> Perhaps >>>> that would help the coercion workarounds go more smoothly. >>>> >>>> thanks, >>>> >>>> Janet >>>> >>>> >>>> library(GenomicRanges) >>>> >>>> ### make an RleList. Just for this example, I starting with a GRanges >>>> object and used coverage to get an RleList example. To get my real data >>>> I >>>> read in a UCSC bigWig file. >>>> >>>> fakeRegions<- GRanges(seqnames=c("chrA","**chrA", "chrB", "chrC"), >>>> ranges=IRanges(start=c(1,51,1,**1), end=c(60,90,20,10) ) ) >>>> seqlengths(fakeRegions)<- c(100,100,100) >>>> myRleList<- coverage(fakeRegions) >>>> rm(fakeRegions) >>>> >>>> myRleList >>>> >>>> # SimpleRleList of length 3 >>>> # $chrA >>>> # 'integer' Rle of length 100 with 4 runs >>>> # Lengths: 50 10 30 10 >>>> # Values : 1 2 1 0 >>>> # >>>> # $chrB >>>> # 'integer' Rle of length 100 with 2 runs >>>> # Lengths: 20 80 >>>> # Values : 1 0 >>>> # >>>> # $chrC >>>> # 'integer' Rle of length 100 with 2 runs >>>> # Lengths: 10 90 >>>> # Values : 1 0 >>>> >>>> ### make some regions of interest >>>> myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), >>>> ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), >>>> geneNames=c("g1","g2","g3") ) >>>> myRegions >>>> # GRanges with 3 ranges and 1 elementMetadata value >>>> # seqnames ranges strand | geneNames >>>> #<rle> <iranges> <rle> |<character> >>>> # [1] chrB [1, 20] * | g1 >>>> # [2] chrC [1, 20] * | g2 >>>> # [3] chrB [5, 10] * | g3 >>>> # >>>> # seqlengths >>>> # chrB chrC >>>> # NA NA >>>> >>>> ## can't use GRanges directly >>>> Views( myRleList, myRegions) >>>> # Error in RleViewsList(rleList = subject, rangesList = start) : >>>> # 'rangesList' must be a RangesList object >>>> >>>> ## can't use a simple coercion >>>> Views( myRleList, as(myRegions,"RangesList") ) >>>> # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = >>>> SIMPLIFY, : >>>> # all object lengths must be multiple of longest object length >>>> >>>> ### although I can use that coercion if I first give the GRanges object >>>> the same levels as the RleList to force the lists to have the same names >>>> as >>>> each other: >>>> seqlevels(myRegions)<- names(myRleList) >>>> viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) >>>> # SimpleNumericList of length 3 >>>> # [["chrA"]] numeric(0) >>>> # [["chrB"]] 1 1 >>>> # [["chrC"]] 0.5 >>>> >>>> >>>> These two lines seem pretty concise to me. It makes sense to layer a >>> RangesList on top of an RleList. Storing the result would of course be >>> easier if you had sorted the GRanges by the seqlevels. Having a utility >>> for >>> that would be nice (orderBySeqlevels?). It would also be easy with >>> RangedData, which enforces ordering by space. >>> >>> >> Just to clarify, here is how the function would work: >> >> values(myRegions)$meanCov[**orderBySeqlevels(myRegions)]<- >> unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), >> use.names=FALSE) >> >> So 'orderBySeqlevels' would be a nice bridge back into the flat GRanges >> world from the Listy world of IRanges. >> > > Well I'm just remembering now that we still don't have an "order" > method for GRanges objects like we do for IRanges object. The "natural" > order for a GRanges object would be to order first by (a) seqlevel, > (b) then by strand, (c) then by start, (d) and finally by end. > > We already do (c) and (d) for IRanges. > > Also the "reduce" method for GRanges already uses this "natural" > order implicitly. > > So we will add this "order" method and the other basic order-related > methods (sort, >=, <=, ==, !=, unique, duplicated, they are all > missing) for GRanges objects. That will cover Janet's use case and > other user cases (I think someone asked how to find duplicated in > a GRanges object a while ago on this list). > > An "order" method would be great. Note though that the coercion to RangesList only sorts by seqlevels, so we would need something that gave out that order vector. The point here is to avoid forcing the user to modify the order of the original GRanges. Michael > Cheers, > H. > > >> >> Michael >>> >>> >>> >>> ### getting close to a solution - but I'd have liked to have been able >>>> to >>>> unlist this and directly add to my GRanges object e.g. >>>> values(myRegions)$meanCov<- unlist(viewMeans(Views( myRleList, >>>> as(myRegions,"RangesList") ) ), use.names=FALSE) >>>> ### but the regions are in a different order to how I started, so the >>>> above command would associate the wrong scores with the wrong regions. >>>> >>>> sessionInfo() >>>> >>>> R version 2.13.1 (2011-07-08) >>>> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >>>> >>>> locale: >>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/**en_US.UTF-8/en_US.UTF-8 >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] GenomicRanges_1.4.6 IRanges_1.10.5 >>>> >>>> ______________________________**_________________ >>>> Bioconductor mailing list >>>> Bioconductor@r-project.org >>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat="" .ethz.ch="" mailman="" listinfo="" bioconductor=""> >>>> Search the archives: >>>> http://news.gmane.org/gmane.**science.biology.informatics.**condu ctor<http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> >>>> >>>> >>> >>> >> [[alternative HTML version deleted]] >> >> >> ______________________________**_________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.e="" thz.ch="" mailman="" listinfo="" bioconductor=""> >> Search the archives: http://news.gmane.org/gmane.** >> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]

ADD REPLY • link 12.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Michael, On 11-08-16 10:52 AM, Michael Lawrence wrote: > > > 2011/8/16 Hervé Pagès <hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">> > > Hi Michael, > > > On 11-08-15 10:28 PM, Michael Lawrence wrote: > > On Mon, Aug 15, 2011 at 10:10 PM, Michael > Lawrence<michafla at="" gene.com="" <mailto:michafla="" at="" gene.com="">>__wrote: > > > > On Mon, Aug 15, 2011 at 5:25 PM, Janet > Young<jayoung at="" fhcrc.org="" <mailto:jayoung="" at="" fhcrc.org="">> wrote: > > Hi again, > > I have another question, in the "I think I need a > convoluted workaround, > but maybe I'm missing a simple solution" genre (seems > like most of my > questions are like that). > > I have an RleList object representing mapability for the > whole human > genome. I'd like to: > (a) calculate viewMeans for various regions of interest > that I've been > representing as GRanges, and > (b) I'd like the underlying code to be smart and match > chromosome names up > in the RleList and the GRanges object (not rely on > chromosomes being ordered > the same in the two objects), and > (c) I'd like to return the viewMeans results in the same > order as the > objects in my original GRanges > > I don't think this is currently implemented without > doing several > coercions (that introduce their own complications) but > I'm not sure. Some > code is below that shows what I'm trying to do. It > seems like it might be > a common kind of way to use viewMeans - I imagine people > are gradually > switching to use GRanges more and more? but really I > don't know. > > My real world question is that I've read in mapability > from a UCSC bigWig > file, and made that into an RleList. I have a bunch of > other regions I've > read in (again from UCSC) using rtracklayer as GRanges, > and have annotated > those with various things I'm interested in (e.g. number > of RNAseq reads in > the region). I want to add the average mapability for > each region, so that > I can later look at how mapability affects those other > things I'm > annotating. > > Should I be being more strict with myself about how my > GRanges are ordered > and making sure they all have the same seqlevels and > seqlengths? Perhaps > that would help the coercion workarounds go more smoothly. > > thanks, > > Janet > > > library(GenomicRanges) > > ### make an RleList. Just for this example, I starting > with a GRanges > object and used coverage to get an RleList example. To > get my real data I > read in a UCSC bigWig file. > > fakeRegions<- GRanges(seqnames=c("chrA","__chrA", > "chrB", "chrC"), > ranges=IRanges(start=c(1,51,1,__1), end=c(60,90,20,10) ) ) > seqlengths(fakeRegions)<- c(100,100,100) > myRleList<- coverage(fakeRegions) > rm(fakeRegions) > > myRleList > > # SimpleRleList of length 3 > # $chrA > # 'integer' Rle of length 100 with 4 runs > # Lengths: 50 10 30 10 > # Values : 1 2 1 0 > # > # $chrB > # 'integer' Rle of length 100 with 2 runs > # Lengths: 20 80 > # Values : 1 0 > # > # $chrC > # 'integer' Rle of length 100 with 2 runs > # Lengths: 10 90 > # Values : 1 0 > > ### make some regions of interest > myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), > ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), > geneNames=c("g1","g2","g3") ) > myRegions > # GRanges with 3 ranges and 1 elementMetadata value > # seqnames ranges strand | geneNames > #<rle> <iranges> <rle> |<character> > # [1] chrB [1, 20] * | g1 > # [2] chrC [1, 20] * | g2 > # [3] chrB [5, 10] * | g3 > # > # seqlengths > # chrB chrC > # NA NA > > ## can't use GRanges directly > Views( myRleList, myRegions) > # Error in RleViewsList(rleList = subject, rangesList = > start) : > # 'rangesList' must be a RangesList object > > ## can't use a simple coercion > Views( myRleList, as(myRegions,"RangesList") ) > # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, > SIMPLIFY = > SIMPLIFY, : > # all object lengths must be multiple of longest > object length > > ### although I can use that coercion if I first give the > GRanges object > the same levels as the RleList to force the lists to > have the same names as > each other: > seqlevels(myRegions)<- names(myRleList) > viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) > # SimpleNumericList of length 3 > # [["chrA"]] numeric(0) > # [["chrB"]] 1 1 > # [["chrC"]] 0.5 > > > These two lines seem pretty concise to me. It makes sense to > layer a > RangesList on top of an RleList. Storing the result would of > course be > easier if you had sorted the GRanges by the seqlevels. > Having a utility for > that would be nice (orderBySeqlevels?). It would also be > easy with > RangedData, which enforces ordering by space. > > > Just to clarify, here is how the function would work: > > values(myRegions)$meanCov[__orderBySeqlevels(myRegions)]<- > unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), > use.names=FALSE) > > So 'orderBySeqlevels' would be a nice bridge back into the flat > GRanges > world from the Listy world of IRanges. > > > Well I'm just remembering now that we still don't have an "order" > method for GRanges objects like we do for IRanges object. The "natural" > order for a GRanges object would be to order first by (a) seqlevel, > (b) then by strand, (c) then by start, (d) and finally by end. > > We already do (c) and (d) for IRanges. > > Also the "reduce" method for GRanges already uses this "natural" > order implicitly. > > So we will add this "order" method and the other basic order- related > methods (sort, >=, <=, ==, !=, unique, duplicated, they are all > missing) for GRanges objects. That will cover Janet's use case and > other user cases (I think someone asked how to find duplicated in > a GRanges object a while ago on this list). > > > An "order" method would be great. Note though that the coercion to > RangesList only sorts by seqlevels, so we would need something that gave > out that order vector. The point here is to avoid forcing the user to > modify the order of the original GRanges. With order() the user will still be able to achieve this with something like: regionMeans <- function(regions, cvg) { seqlevels(regions) <- names(cvg) oo <- order(regions) regions <- regions[oo] ans <- unlist( viewMeans( Views(cvg, as(regions, "RangesList")) ), use.names=FALSE) ans[oo] <- ans # restore original order ans } values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) Tricky :-/ But maybe we could provide a "mean" method that accepts 2 args: a GRanges and an RleList/IntegerList/NumericList. Same with min, max, sum etc... they would all return a numeric vector of the same length as their first argument (the GRanges object). Another approach could be that we make the Views() constructor more flexible so Views(myRleList, myRegions) would just work and return a Views object (not a ViewsList) but that means redefining what a Views object can be (@subject can now be a named List and @ranges a GRanges object). Maybe a cleaner design would be to use a new container for this e.g. GViews (for Genomic Views)... H. > > Michael > > Cheers, > H. > > > > Michael > > > > ### getting close to a solution - but I'd have liked to > have been able to > unlist this and directly add to my GRanges object e.g. > values(myRegions)$meanCov<- unlist(viewMeans(Views( > myRleList, > as(myRegions,"RangesList") ) ), use.names=FALSE) > ### but the regions are in a different order to how I > started, so the > above command would associate the wrong scores with the > wrong regions. > > sessionInfo() > > R version 2.13.1 (2011-07-08) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/__en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] GenomicRanges_1.4.6 IRanges_1.10.5 > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > > > [[alternative HTML version deleted]] > > > _________________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > https://stat.ethz.ch/mailman/__listinfo/bioconductor > <https: stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: > http://news.gmane.org/gmane.__science.biology.informatics.__conductor > <http: news.gmane.org="" gmane.science.biology.informatics.conductor=""> > > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > Phone: (206) 667-5791 <tel:%28206%29%20667-5791> > Fax: (206) 667-1319 <tel:%28206%29%20667-1319> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 12.7 years ago Hervé Pagès 16k

0

Entering edit mode

2011/8/16 Hervé Pagès <hpages@fhcrc.org> > Michael, > > On 11-08-16 10:52 AM, Michael Lawrence wrote: > >> >> >> 2011/8/16 Hervé Pagès <hpages@fhcrc.org <mailto:hpages@fhcrc.org="">> >> >> >> Hi Michael, >> >> >> On 11-08-15 10:28 PM, Michael Lawrence wrote: >> >> On Mon, Aug 15, 2011 at 10:10 PM, Michael >> Lawrence<michafla@gene.com <mailto:michafla@gene.com="">>__**wrote: >> >> >> >> >> On Mon, Aug 15, 2011 at 5:25 PM, Janet >> Young<jayoung@fhcrc.org <mailto:jayoung@fhcrc.org="">> wrote: >> >> >> Hi again, >> >> I have another question, in the "I think I need a >> convoluted workaround, >> but maybe I'm missing a simple solution" genre (seems >> like most of my >> questions are like that). >> >> I have an RleList object representing mapability for the >> whole human >> genome. I'd like to: >> (a) calculate viewMeans for various regions of interest >> that I've been >> representing as GRanges, and >> (b) I'd like the underlying code to be smart and match >> chromosome names up >> in the RleList and the GRanges object (not rely on >> chromosomes being ordered >> the same in the two objects), and >> (c) I'd like to return the viewMeans results in the same >> order as the >> objects in my original GRanges >> >> I don't think this is currently implemented without >> doing several >> coercions (that introduce their own complications) but >> I'm not sure. Some >> code is below that shows what I'm trying to do. It >> seems like it might be >> a common kind of way to use viewMeans - I imagine people >> are gradually >> switching to use GRanges more and more? but really I >> don't know. >> >> My real world question is that I've read in mapability >> from a UCSC bigWig >> file, and made that into an RleList. I have a bunch of >> other regions I've >> read in (again from UCSC) using rtracklayer as GRanges, >> and have annotated >> those with various things I'm interested in (e.g. number >> of RNAseq reads in >> the region). I want to add the average mapability for >> each region, so that >> I can later look at how mapability affects those other >> things I'm >> annotating. >> >> Should I be being more strict with myself about how my >> GRanges are ordered >> and making sure they all have the same seqlevels and >> seqlengths? Perhaps >> that would help the coercion workarounds go more smoothly. >> >> thanks, >> >> Janet >> >> >> library(GenomicRanges) >> >> ### make an RleList. Just for this example, I starting >> with a GRanges >> object and used coverage to get an RleList example. To >> get my real data I >> read in a UCSC bigWig file. >> >> fakeRegions<- GRanges(seqnames=c("chrA","__**chrA", >> "chrB", "chrC"), >> ranges=IRanges(start=c(1,51,1,**__1), end=c(60,90,20,10) ) >> ) >> seqlengths(fakeRegions)<- c(100,100,100) >> myRleList<- coverage(fakeRegions) >> rm(fakeRegions) >> >> myRleList >> >> # SimpleRleList of length 3 >> # $chrA >> # 'integer' Rle of length 100 with 4 runs >> # Lengths: 50 10 30 10 >> # Values : 1 2 1 0 >> # >> # $chrB >> # 'integer' Rle of length 100 with 2 runs >> # Lengths: 20 80 >> # Values : 1 0 >> # >> # $chrC >> # 'integer' Rle of length 100 with 2 runs >> # Lengths: 10 90 >> # Values : 1 0 >> >> ### make some regions of interest >> myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), >> ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), >> geneNames=c("g1","g2","g3") ) >> myRegions >> # GRanges with 3 ranges and 1 elementMetadata value >> # seqnames ranges strand | geneNames >> #<rle> <iranges> <rle> |<character> >> # [1] chrB [1, 20] * | g1 >> # [2] chrC [1, 20] * | g2 >> # [3] chrB [5, 10] * | g3 >> # >> # seqlengths >> # chrB chrC >> # NA NA >> >> ## can't use GRanges directly >> Views( myRleList, myRegions) >> # Error in RleViewsList(rleList = subject, rangesList = >> start) : >> # 'rangesList' must be a RangesList object >> >> ## can't use a simple coercion >> Views( myRleList, as(myRegions,"RangesList") ) >> # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, >> SIMPLIFY = >> SIMPLIFY, : >> # all object lengths must be multiple of longest >> object length >> >> ### although I can use that coercion if I first give the >> GRanges object >> the same levels as the RleList to force the lists to >> have the same names as >> each other: >> seqlevels(myRegions)<- names(myRleList) >> viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) >> # SimpleNumericList of length 3 >> # [["chrA"]] numeric(0) >> # [["chrB"]] 1 1 >> # [["chrC"]] 0.5 >> >> >> These two lines seem pretty concise to me. It makes sense to >> layer a >> RangesList on top of an RleList. Storing the result would of >> course be >> easier if you had sorted the GRanges by the seqlevels. >> Having a utility for >> that would be nice (orderBySeqlevels?). It would also be >> easy with >> RangedData, which enforces ordering by space. >> >> >> Just to clarify, here is how the function would work: >> >> values(myRegions)$meanCov[__**orderBySeqlevels(myRegions)]<- >> unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), >> use.names=FALSE) >> >> So 'orderBySeqlevels' would be a nice bridge back into the flat >> GRanges >> world from the Listy world of IRanges. >> >> >> Well I'm just remembering now that we still don't have an "order" >> method for GRanges objects like we do for IRanges object. The "natural" >> order for a GRanges object would be to order first by (a) seqlevel, >> (b) then by strand, (c) then by start, (d) and finally by end. >> >> We already do (c) and (d) for IRanges. >> >> Also the "reduce" method for GRanges already uses this "natural" >> order implicitly. >> >> So we will add this "order" method and the other basic order- related >> methods (sort, >=, <=, ==, !=, unique, duplicated, they are all >> missing) for GRanges objects. That will cover Janet's use case and >> other user cases (I think someone asked how to find duplicated in >> a GRanges object a while ago on this list). >> >> >> An "order" method would be great. Note though that the coercion to >> RangesList only sorts by seqlevels, so we would need something that gave >> out that order vector. The point here is to avoid forcing the user to >> modify the order of the original GRanges. >> > > With order() the user will still be able to achieve this with something > like: > > regionMeans <- function(regions, cvg) > { > seqlevels(regions) <- names(cvg) > oo <- order(regions) > regions <- regions[oo] > ans <- unlist( > viewMeans( > Views(cvg, as(regions, "RangesList")) > ), use.names=FALSE) > ans[oo] <- ans # restore original order > ans > } > > values(myRegions)$meanCov <- regionMeans(myRegions, myRleList) > > Tricky :-/ > > But maybe we could provide a "mean" method that accepts 2 args: > a GRanges and an RleList/IntegerList/**NumericList. Same with > min, max, sum etc... they would all return a numeric vector of the > same length as their first argument (the GRanges object). > > Patrick and I had talked about this. We were calling it the rangeMeans function. It would go along with rangeMins, rangeMaxs, etc. Basically a way to perform a windowed computation without having to construct an intermediate Views. This is a lot easier to design and implement, as you were saying below. There are a lot of other (non-Rle) Vectors that would benefit from window-based summaries. These include the basic vector types, as well as ranges themselves. For example, I and others have encountered a need to find the nearest neighbors of a set of ranges, with the constraint that they fall within the same gene or exon or whatever. One could come up with a GRanges with the seqnames corresponding to a gene, but that's a hack and the current looping implementation of nearest,GRanges is a bit slow. Michael Another approach could be that we make the Views() constructor more > flexible so Views(myRleList, myRegions) would just work and return > a Views object (not a ViewsList) but that means redefining what a > Views object can be (@subject can now be a named List and @ranges > a GRanges object). Maybe a cleaner design would be to use a new > container for this e.g. GViews (for Genomic Views)... > > H. > > > >> Michael >> >> Cheers, >> H. >> >> >> >> Michael >> >> >> >> ### getting close to a solution - but I'd have liked to >> have been able to >> unlist this and directly add to my GRanges object e.g. >> values(myRegions)$meanCov<- unlist(viewMeans(Views( >> myRleList, >> as(myRegions,"RangesList") ) ), use.names=FALSE) >> ### but the regions are in a different order to how I >> started, so the >> above command would associate the wrong scores with the >> wrong regions. >> >> sessionInfo() >> >> R version 2.13.1 (2011-07-08) >> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/__** >> en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets >> methods base >> >> other attached packages: >> [1] GenomicRanges_1.4.6 IRanges_1.10.5 >> >> ______________________________**___________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioconduct or<https: stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> >> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconducto="" r<https:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> > >> Search the archives: >> http://news.gmane.org/gmane.__** >> science.biology.informatics.__**conductor<http: news.gmane.org="" gma="" ne.__science.biology.informatics.__conductor=""> >> <http: news.gmane.org="" gmane.**="">> science.biology.informatics.**conductor<http: news.gmane.org="" gmane="" .science.biology.informatics.conductor=""> >> > >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> ______________________________**___________________ >> Bioconductor mailing list >> Bioconductor@r-project.org <mailto:bioconductor@r-**project.org<bioconductor@r-project.org> >> > >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioconductor<https :="" stat.ethz.ch="" mailman="" __listinfo="" bioconductor=""> >> <https: stat.ethz.ch="" mailman="" **listinfo="" bioconductor<https:="" stat.ethz.ch="" mailman="" listinfo="" bioconductor=""> >> > >> Search the archives: >> http://news.gmane.org/gmane.__**science.biology.informatics.__** >> conductor<http: news.gmane.org="" gmane.__science.biology.informatics="" .__conductor=""> >> <http: news.gmane.org="" gmane.**science.biology.informatics.**="">> conductor<http: news.gmane.org="" gmane.science.biology.informatics.c="" onductor=""> >> > >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]

ADD REPLY • link 12.7 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi, Thank you both - I'll make sure I keep everything sorted for now (and perhaps later I can be less vigilant and the code will enforce sorting, as well as provide a method to help me do it - that would be nice).

ADD REPLY • link 12.7 years ago Janet Young ▴ 740

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 1 day ago

Seattle, WA, United States

Hi Janet, On 11-08-15 05:25 PM, Janet Young wrote: > Hi again, > > I have another question, in the "I think I need a convoluted workaround, but maybe I'm missing a simple solution" genre (seems like most of my questions are like that). > > I have an RleList object representing mapability for the whole human genome. I'd like to: > (a) calculate viewMeans for various regions of interest that I've been representing as GRanges, and > (b) I'd like the underlying code to be smart and match chromosome names up in the RleList and the GRanges object (not rely on chromosomes being ordered the same in the two objects), and > (c) I'd like to return the viewMeans results in the same order as the objects in my original GRanges > > I don't think this is currently implemented without doing several coercions (that introduce their own complications) but I'm not sure. Some code is below that shows what I'm trying to do. It seems like it might be a common kind of way to use viewMeans - I imagine people are gradually switching to use GRanges more and more? but really I don't know. > > My real world question is that I've read in mapability from a UCSC bigWig file, and made that into an RleList. I have a bunch of other regions I've read in (again from UCSC) using rtracklayer as GRanges, and have annotated those with various things I'm interested in (e.g. number of RNAseq reads in the region). I want to add the average mapability for each region, so that I can later look at how mapability affects those other things I'm annotating. > > Should I be being more strict with myself about how my GRanges are ordered and making sure they all have the same seqlevels and seqlengths? Perhaps that would help the coercion workarounds go more smoothly. > > thanks, > > Janet > > > library(GenomicRanges) > > ### make an RleList. Just for this example, I starting with a GRanges object and used coverage to get an RleList example. To get my real data I read in a UCSC bigWig file. > > fakeRegions<- GRanges(seqnames=c("chrA","chrA", "chrB", "chrC"), ranges=IRanges(start=c(1,51,1,1), end=c(60,90,20,10) ) ) > seqlengths(fakeRegions)<- c(100,100,100) > myRleList<- coverage(fakeRegions) > rm(fakeRegions) > > myRleList > > # SimpleRleList of length 3 > # $chrA > # 'integer' Rle of length 100 with 4 runs > # Lengths: 50 10 30 10 > # Values : 1 2 1 0 > # > # $chrB > # 'integer' Rle of length 100 with 2 runs > # Lengths: 20 80 > # Values : 1 0 > # > # $chrC > # 'integer' Rle of length 100 with 2 runs > # Lengths: 10 90 > # Values : 1 0 > > ### make some regions of interest > myRegions<- GRanges(seqnames=c("chrB", "chrC", "chrB"), ranges=IRanges(start=c(1,1,5), end=c(20,20,10) ), geneNames=c("g1","g2","g3") ) > myRegions > # GRanges with 3 ranges and 1 elementMetadata value > # seqnames ranges strand | geneNames > #<rle> <iranges> <rle> |<character> > # [1] chrB [1, 20] * | g1 > # [2] chrC [1, 20] * | g2 > # [3] chrB [5, 10] * | g3 > # > # seqlengths > # chrB chrC > # NA NA > > ## can't use GRanges directly > Views( myRleList, myRegions) > # Error in RleViewsList(rleList = subject, rangesList = start) : > # 'rangesList' must be a RangesList object > > ## can't use a simple coercion > Views( myRleList, as(myRegions,"RangesList") ) > # Error in .Method(..., FUN = FUN, MoreArgs = MoreArgs, SIMPLIFY = SIMPLIFY, : > # all object lengths must be multiple of longest object length > > ### although I can use that coercion if I first give the GRanges object the same levels as the RleList to force the lists to have the same names as each other: > seqlevels(myRegions)<- names(myRleList) > viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ) > # SimpleNumericList of length 3 > # [["chrA"]] numeric(0) > # [["chrB"]] 1 1 > # [["chrC"]] 0.5 > > ### getting close to a solution - but I'd have liked to have been able to unlist this and directly add to my GRanges object e.g. > values(myRegions)$meanCov<- unlist(viewMeans(Views( myRleList, as(myRegions,"RangesList") ) ), use.names=FALSE) > ### but the regions are in a different order to how I started, so the above command would associate the wrong scores with the wrong regions. Yep, 'myRegions' would first need to be sorted by seqlevels for the above to work properly: > oo <- order(as.factor(seqnames(myRegions))) > myRegions <- myRegions[oo] > myRegions GRanges with 3 ranges and 1 elementMetadata value seqnames ranges strand | geneNames <rle> <iranges> <rle> | <character> [1] chrB [1, 20] * | g1 [2] chrB [5, 10] * | g3 [3] chrC [1, 20] * | g2 seqlengths chrA chrB chrC NA NA NA Cheers, H. > > sessionInfo() > > R version 2.13.1 (2011-07-08) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] GenomicRanges_1.4.6 IRanges_1.10.5 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 12.7 years ago Hervé Pagès 16k

Login before adding your answer.