Question: Obtain overlap coordinates in GenomicRanges findSpliceOverlaps
Guest User wrote (3.7 years ago):
Hi,

I was wondering whether it is possible in any way to obtain the overlap coordinates when intersecting GAlignments objects (as the query) with a GRangesList object using the findSpliceOverlaps function. Specifically, I would like to obtain the transcriptomic coordinates of the GAlignments within the transcripts that they compatibly overlap.

Right now I'm obtaining this information with a two-step approach:
1. findSpliceOverlaps(GAlignments, GRangesList, ignore.strand=FALSE)
2. Keeping only the compatible hits, I intersect each GAlignment again with the ranges of its compatible transcript (a GRanges in the GRangesList) and sum the widths of the exons up to the intersection coordinate.

My problem is that the second step is extremely slow. I'd be grateful for some discussion.

-- output of sessionInfo():

R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] hash_2.2.6           data.table_1.8.10    Rsamtools_1.14.3
[4] Biostrings_2.30.1    GenomicRanges_1.14.4 XVector_0.2.0
[7] IRanges_1.20.6       BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6   stats4_3.0.2   tools_3.0.2    zlibbioc_1.8.0
Written 3.7 years ago by Guest User; modified 3.7 years ago by Michael Lawrence.
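The slow second step in the question comes down to a small piece of arithmetic per read and transcript: find the exon that contains a genomic position, add up the widths of the exons that precede it in transcript order, and add the offset within that exon. A minimal base-R sketch follows; genome_to_tx() and the exon coordinates are hypothetical, exons are assumed non-overlapping and supplied in ascending genomic order, and coordinates are assumed 1-based and inclusive as in GenomicRanges.

```r
# Hypothetical helper: convert a 1-based genomic position into a 1-based
# transcript coordinate. Exons are given as start/end vectors sorted by
# genomic start; 'strand' is "+" or "-".
genome_to_tx <- function(pos, exon_start, exon_end, strand = "+") {
  widths <- exon_end - exon_start + 1L
  hit <- which(exon_start <= pos & pos <= exon_end)
  if (length(hit) != 1L) return(NA_integer_)  # intronic or outside the transcript
  if (strand == "+") {
    # exons upstream in genomic order come first in the transcript
    as.integer(sum(widths[seq_len(hit - 1L)]) + (pos - exon_start[hit]) + 1L)
  } else {
    # on the minus strand the transcript starts at the end of the last exon
    after <- seq_along(widths) > hit
    as.integer(sum(widths[after]) + (exon_end[hit] - pos) + 1L)
  }
}

# Hypothetical transcript with three 50-bp exons
ex_start <- c(100L, 200L, 300L)
ex_end   <- c(149L, 249L, 349L)
genome_to_tx(100L, ex_start, ex_end, "+")  # 1 (first base of exon 1)
genome_to_tx(210L, ex_start, ex_end, "+")  # 61 (50 bases of exon 1, then 11th base of exon 2)
genome_to_tx(349L, ex_start, ex_end, "-")  # 1 (minus strand starts at the last exon's end)
genome_to_tx(175L, ex_start, ex_end, "+")  # NA (intronic)
```

Calling a function like this once per compatible hit from an R-level loop is essentially what makes the two-step approach slow; the replies below discuss ways to push the work into vectorized code.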
Michael Lawrence (United States) wrote (3.7 years ago):
Currently there is

m <- map(granges, grangeslist)

where 'm' is a RangesMapping indicating the "within" overlaps (Hits) and the mapped ranges. You would get the granges from the GAlignments with the granges() function. The problem is that the overlap computation uses findOverlaps(type="within") instead of findSpliceOverlaps. One idea would be for map() to take a Hits object as an optional argument. Or we could add a "pmap" method that would assume the from and to are already matched up and simply perform the mapping.

One quick fix would be to create a granges that consists of a width-1 range at the start position (and likewise at the end position) of each read and pass it to map() as above. Then filter the mappings based on the compatibility results from findSpliceOverlaps(). Not pretty, nor very efficient, but it takes care of the nasty stuff.

Michael

On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] <guest@bioconductor.org> wrote:
> I was wondering whether it is possible in any way to obtain the overlap
> coordinates when intersecting GAlignments objects as query with a
> GRangesList object, using the findSpliceOverlaps function?
Written 3.7 years ago by Michael Lawrence.
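The quick fix can be sketched without Bioconductor to show what an all-by-all mapping produces and why the compatibility filter is still required. Assumptions in this sketch: map() is imitated by an explicit loop over every read/transcript pair, only the plus strand is handled, and the transcripts, reads and compatible pairs are toy values. In real code the points would be width-1 ranges built from the read starts and ends, and the compatible pairs would be taken from the hits returned by findSpliceOverlaps().

```r
# Hypothetical plus-strand transcripts: exon starts/ends in genomic order
tx <- list(
  T1 = list(start = c(100L, 200L, 300L), end = c(149L, 249L, 349L)),
  T2 = list(start = c(100L, 300L),       end = c(149L, 349L))
)
# Hypothetical reads, reduced to their genomic start and end positions
reads <- data.frame(start = c(140L, 120L), end = c(209L, 309L))
# Hypothetical compatible (read, transcript) pairs; in practice these would
# come from the findSpliceOverlaps() hits
compatible <- data.frame(query = c(1L, 2L), subject = c(1L, 2L))

# Map one width-1 point onto one transcript (NA if it is not exonic)
point_to_tx <- function(pos, s, e) {
  hit <- which(s <= pos & pos <= e)
  if (length(hit) != 1L) return(NA_integer_)
  as.integer(sum((e - s + 1L)[seq_len(hit - 1L)]) + pos - s[hit] + 1L)
}

# All-by-all, in the spirit of map(): every read point against every transcript
pairs <- expand.grid(query = seq_len(nrow(reads)), subject = seq_along(tx))
map_all <- function(pos) {
  mapply(function(q, j) point_to_tx(pos[q], tx[[j]]$start, tx[[j]]$end),
         pairs$query, pairs$subject)
}
pairs$tx_start <- map_all(reads$start)
pairs$tx_end   <- map_all(reads$end)

# Keep pairs where both ends mapped, then keep only the compatible ones
mapped <- pairs[!is.na(pairs$tx_start) & !is.na(pairs$tx_end), ]
result <- merge(mapped, compatible, by = c("query", "subject"))
result
```

In mapped, read 2 receives coordinates 21 to 110 on T1 even though it jumps from exon 1 to exon 3 of that transcript; only the compatibility filter removes that pair, leaving read 2 at 21 to 60 on T2 (assuming, as the toy table says, that its junction matches T2).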
Michael,

+1 for pmap! I like the separation of concerns this would offer.

It seems to me that the combination of pmap and findSpliceOverlaps should afford a more general solution to the problem solved by VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).

~Malcolm
Written 3.7 years ago by Malcolm Cook.
On Fri, Mar 21, 2014 at 10:56 AM, Cook, Malcolm <mec@stowers.org> wrote:
> It seems to me that the combination of pmap and findSpliceOverlaps should
> afford a more general solution to the problem solved by
> VariantAnnotation::refLocsToLocalLocs (and should be equally performant?).

Yeah, actually both map and refLocsToLocalLocs rely on the same underlying function for speed: GenomicRanges:::.listCumsumShifted (writing that one gave me a headache).

Unfortunately I don't have the time to spend on things like pmap, but I would encourage someone in Seattle to take it on. There's already a method for Ranges,GAlignments, but that's the opposite direction from the one requested in this thread. I write these things as they come up in my work.
Written 3.7 years ago by Michael Lawrence.
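GenomicRanges:::.listCumsumShifted is an unexported helper and its exact behaviour is not shown in this thread, so the base-R sketch below is not its code. It only illustrates the idea its name suggests: for every transcript, the number of exonic bases preceding each exon, computed with a single cumsum() over all transcripts instead of a per-transcript loop. The widths are hypothetical and assumed to be listed in transcript order.

```r
# Hypothetical exon widths per transcript, in transcript order
widths <- list(T1 = c(50L, 50L, 50L), T2 = c(50L, 50L), T3 = 30L)

flat  <- unlist(widths, use.names = FALSE)  # all exon widths, concatenated
len   <- lengths(widths)                    # number of exons per transcript
total <- cumsum(flat)                       # one running sum over everything
first <- cumsum(len) - len + 1L             # index of each transcript's first exon

# Running total just before each transcript's first exon, repeated per exon,
# so that subtracting it restarts the offsets at 0 for every transcript
base   <- rep(total[first] - flat[first], len)
offset <- total - flat - base               # exonic bases preceding each exon

offsets <- split(offset, factor(rep(names(widths), len), levels = names(widths)))
offsets
```

With such offsets, a plus-strand position inside exon k of a transcript maps to offset[k] + (pos - exon_start[k]) + 1, so no per-read summation over exons is needed.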
Thanks for the help.

Correct me if I'm wrong, but it seems that I should first intersect the GAlignments with the GRangesList using the findSpliceOverlaps function, and then run the map function, where granges holds the compatible GAlignments and grangeslist holds the corresponding GRanges from the GRangesList. Makes sense?
Written 3.7 years ago by rubi.
On Fri, Mar 21, 2014 at 11:29 AM, nimrod.rubinstein <nimrod.rubinstein@gmail.com> wrote:
> Correct me if I'm wrong but it seems that I first intersect the
> GAlignments with the GRangesList using the findSpliceOverlaps function, and
> then run the map function where the granges are of the compatible GAlignments
> and grangeslist is the corresponding list of GRanges from GRangesList.
>
> Makes sense?

That will not quite work; you will always have to filter the results from the map() call, because it may try to map things that are not compatible.
Written 3.7 years ago by Michael Lawrence.
I guess I thought that map only maps ranges[x] to grangeslist[x] for every x. Do I understand you correctly that it instead maps all ranges against all of grangeslist?
Written 3.7 years ago by rubi.
Yeah, pmap() would map ranges[x] to grangeslist[x], but pmap() does not exist yet. map() is all-by-all; that's the downside of it.
There's already a method >>>> for Ranges,GAlignments but that's the opposite direction as requested in >>>> this thread. I write these things as they come up in my work. >>>> >>>> ~Malcolm >>>>> >>>>> >-----Original Message----- >>>>> >From: bioconductor-bounces@r-project.org [mailto: >>>>> bioconductor-bounces@r-project.org] On Behalf Of Michael Lawrence >>>>> >Sent: Friday, March 21, 2014 12:17 PM >>>>> >To: rubi [guest] >>>>> >Cc: GenomicRanges Maintainer; bioconductor@r-project.org; >>>>> nimrod.rubinstein@gmail.com >>>>> >Subject: Re: [BioC] Obtain overlap coordinates in GenomicRanges >>>>> findSpliceOverlaps >>>>> > >>>>> >Currently there is >>>>> > >>>>> >m <- map(granges, grangeslist) >>>>> > >>>>> >Where 'm' is a RangesMapping indicating the within overlaps (Hits) >>>>> and the >>>>> >mapped ranges. You would get the granges from the GAlignments with >>>>> the >>>>> >granges() function. The problem is that the overlap computation uses >>>>> >findOverlaps(type="within") instead of findSpliceOverlaps. One idea >>>>> would >>>>> >be to take a Hits object as an optional argument. Or, we could add a >>>>> "pmap" >>>>> >method that would assume the from and to are matched up already and >>>>> simply >>>>> >perform the mapping. >>>>> > >>>>> >One quick fix would be to create a granges that consists a width-1 >>>>> range at >>>>> >the start position (and likewise the end position) for each read and >>>>> pass >>>>> >it to map() as above. Then filter the mappings based on the >>>>> compatibility >>>>> >results from findSpliceOverlaps(). Not that pretty nor very >>>>> efficient but >>>>> >it takes care of the nasty stuff. 
>>>>> > >>>>> >Michael >>>>> > >>>>> > >>>>> > >>>>> >On Fri, Mar 21, 2014 at 9:44 AM, rubi [guest] < >>>>> guest@bioconductor.org>wrote: >>>>> > >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> I was wondering whether it is possible in anyway to obtain the >>>>> overlap >>>>> >> coordinates when intersecting GAlignments objects as query with a >>>>> >> GRangesList object, using the findSpliceOverlaps function? >>>>> >> >>>>> >> Specifically, I would like to obtain the transcriptomic >>>>> coordinates of the >>>>> >> GAlignments in the transcripts that they compatibly intersect with. >>>>> >> >>>>> >> Right now I'm obtaining this information in a 2 step approach: >>>>> >> 1. findSpliceOverlaps(GAlignments, GRangesList, >>>>> ignore.strand=FALSE) >>>>> >> 2. Keeping only the hits that are compatible, I then intersect >>>>> again each >>>>> >> GAlignment and the ranges of the compatible GRange transcript and >>>>> sum the >>>>> >> widths of the exons up to the intersection coordinate. >>>>> >> >>>>> >> My problem is that the second step is extremely slow. 
>>>>> >> >>>>> >> I'd be grateful for some discussion >>>>> >> >>>>> >> -- output of sessionInfo(): >>>>> >> >>>>> >> R version 3.0.2 (2013-09-25) >>>>> >> Platform: x86_64-unknown-linux-gnu (64-bit) >>>>> >> >>>>> >> locale: >>>>> >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >>>>> >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >>>>> >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >>>>> >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >>>>> >> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >>>>> >> >>>>> >> attached base packages: >>>>> >> [1] parallel stats graphics grDevices utils datasets >>>>> methods >>>>> >> [8] base >>>>> >> >>>>> >> other attached packages: >>>>> >> [1] hash_2.2.6 data.table_1.8.10 Rsamtools_1.14.3 >>>>> >> [4] Biostrings_2.30.1 GenomicRanges_1.14.4 XVector_0.2.0 >>>>> >> [7] IRanges_1.20.6 BiocGenerics_0.8.0 >>>>> >> >>>>> >> loaded via a namespace (and not attached): >>>>> >> [1] bitops_1.0-6 stats4_3.0.2 tools_3.0.2 zlibbioc_1.8.0 >>>>> >> >>>>> >> >>>>> >> -- >>>>> >> Sent via the guest posting facility at bioconductor.org. >>>>> >> >>>>> >> _______________________________________________ >>>>> >> Bioconductor mailing list >>>>> >> Bioconductor@r-project.org >>>>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> >> Search the archives: >>>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >> >>>>> > >>>>> > [[alternative HTML version deleted]] >>>>> > >>>>> >_______________________________________________ >>>>> >Bioconductor mailing list >>>>> >Bioconductor@r-project.org >>>>> >https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> >Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>> >> > [[alternative HTML version deleted]]
Written 3.7 years ago by Michael Lawrence
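To make the difference between a parallel ("pmap") mapping and an all-by-all ("map") mapping concrete, here is a minimal base-R sketch that uses no Bioconductor classes. It assumes a single chromosome, plus-strand transcripts, and exons stored in a data frame with columns tx, start and end, sorted by start within each transcript. All function names are hypothetical: list_cumsum_shifted() only illustrates the idea suggested by the name of GenomicRanges:::.listCumsumShifted (a per-transcript cumulative sum of exon widths, shifted by one exon) and is not its actual code, and the compatible data frame is a stand-in for the compatible hits you would take from findSpliceOverlaps().

```r
# Hypothetical sketch: per-transcript exclusive ("shifted") cumulative sum of
# exon widths, computed for all transcripts at once on the flattened vectors.
# Assumes each transcript's exons are contiguous in `tx_id`.
list_cumsum_shifted <- function(widths, tx_id) {
  before <- cumsum(widths) - widths          # running total before each exon
  first <- !duplicated(tx_id)                # first exon of each transcript
  before - before[first][cumsum(first)]      # subtract each transcript's base
}

# Parallel ("pmap"-style): genomic position pos[i] is mapped onto transcript
# tx[i] only. Returns NA when the position is not inside an exon.
pmap_pos_hyp <- function(pos, tx, exons) {
  stopifnot(length(pos) == length(tx))
  exons$offset <- list_cumsum_shifted(exons$end - exons$start + 1L, exons$tx)
  vapply(seq_along(pos), function(i) {
    hit <- exons[exons$tx == tx[i] & exons$start <= pos[i] & exons$end >= pos[i], ]
    if (nrow(hit) == 0L) return(NA_integer_)
    as.integer(hit$offset[1L] + pos[i] - hit$start[1L] + 1L)
  }, integer(1))
}

# All-by-all ("map"-style): every position against every transcript, keeping
# only the pairs where the position falls inside an exon.
map_all_hyp <- function(pos, exons) {
  pairs <- expand.grid(query = seq_along(pos), tx = unique(exons$tx),
                       stringsAsFactors = FALSE)
  pairs$tx_pos <- pmap_pos_hyp(pos[pairs$query], pairs$tx, exons)
  pairs[!is.na(pairs$tx_pos), ]
}

# Toy data: transcript 1 has exons 100-109 and 200-219; transcript 2 has 150-179.
exons <- data.frame(tx = c(1L, 1L, 2L),
                    start = c(100L, 200L, 150L),
                    end = c(109L, 219L, 179L))
pos <- c(105L, 205L, 160L)

# Parallel: each position has already been paired with its transcript.
paired <- pmap_pos_hyp(pos, c(1L, 1L, 2L), exons)   # 6, 16, 11

# All-by-all, then filtered against the stand-in compatible hits.
compatible <- data.frame(query = c(1L, 3L), tx = c(1L, 2L))
all_hits <- map_all_hyp(pos, exons)
kept <- merge(all_hits, compatible)                 # tx_pos 6 and 11
```

The all-by-all version tests every position against every transcript and still needs the merge against the compatible hits, which is the filtering step described above; the parallel version skips both because the pairing is supplied up front. Mapping each read's start and end positions separately gives its span on the transcript.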
I see. So I assume the p in pmap stands for "paired"? Any ballpark as to when this implementation will be added?
Written 3.7 years ago by rubi
The p would stand for "parallel", by analogy with pintersect, punion, and psetdiff.
Written 3.7 years ago by Malcolm Cook
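The "parallel" convention can be illustrated without any Bioconductor classes: as with base R's pmin() and pmax(), element i of the result depends only on element i of each input. The function below is a hypothetical base-R stand-in for a pintersect()-style operation on closed integer ranges, not the IRanges implementation; empty intersections are returned as NA.

```r
# Hypothetical parallel intersection of two equal-length sets of closed
# integer ranges [start1[i], end1[i]] and [start2[i], end2[i]].
pintersect_hyp <- function(start1, end1, start2, end2) {
  stopifnot(length(start1) == length(end1),
            length(start1) == length(start2),
            length(start2) == length(end2))   # parallel ops need matched inputs
  s <- pmax(start1, start2)                   # elementwise, like pmin()/pmax()
  e <- pmin(end1, end2)
  empty <- s > e                              # no overlap for this pair
  s[empty] <- NA
  e[empty] <- NA
  data.frame(start = s, end = e)
}

res <- pintersect_hyp(start1 = c(1L, 10L, 50L), end1 = c(20L, 30L, 60L),
                      start2 = c(15L, 40L, 55L), end2 = c(25L, 45L, 70L))
# Pair 1 -> 15-20, pair 2 -> empty (NA), pair 3 -> 55-60
```

An all-by-all counterpart would compare every range in the first set with every range in the second, which is the behaviour map() has for its inputs.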
Nimrod,

I think I have a possible workaround for you involving the use of:

- pintersect: to figure out the regions of compatible overlap
- restrict: to find the left and right regions in your transcript models that lie outside the overlapping region

But can you send a test case?

~Malcolm
Written 3.7 years ago by Malcolm Cook
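One possible reading of the pintersect/restrict idea, sketched in base R for a single read and a single transcript. It assumes the read has already been found compatible with the transcript by findSpliceOverlaps() (so the exonic bases inside the read's genomic span are exactly its aligned bases), exons sorted by start, and one chromosome. restrict_hyp(), width_sum() and read_to_tx_hyp() are hypothetical helpers, and restrict_hyp() only loosely mimics the clipping behaviour of IRanges::restrict() on plain vectors. The transcript start is the total exon width in the 5' flank (left of the read on the + strand, right of it on the - strand) plus one.

```r
# Hypothetical helper: clip exon ranges to the window [lo, hi] and drop the
# pieces that fall entirely outside it (loosely like IRanges::restrict()).
restrict_hyp <- function(starts, ends, lo, hi) {
  s <- pmax(starts, lo)
  e <- pmin(ends, hi)
  keep <- s <= e
  list(start = s[keep], end = e[keep])
}

# Total number of bases covered by a set of clipped ranges.
width_sum <- function(r) sum(r$end - r$start + 1)

# Transcript coordinates of one compatible read on one transcript.
read_to_tx_hyp <- function(read_start, read_end, ex_starts, ex_ends, strand = "+") {
  stopifnot(strand %in% c("+", "-"))
  inside <- restrict_hyp(ex_starts, ex_ends, read_start, read_end)   # overlap
  flank <- if (strand == "+") {
    restrict_hyp(ex_starts, ex_ends, -Inf, read_start - 1)           # 5' flank on +
  } else {
    restrict_hyp(ex_starts, ex_ends, read_end + 1, Inf)              # 5' flank on -
  }
  tx_start <- width_sum(flank) + 1
  c(start = tx_start, end = tx_start + width_sum(inside) - 1)
}

# Transcript with exons 100-109 and 200-219; spliced read spanning 105-204.
plus <- read_to_tx_hyp(105, 204, c(100, 200), c(109, 219), strand = "+")
minus <- read_to_tx_hyp(105, 204, c(100, 200), c(109, 219), strand = "-")
# plus  -> start 6,  end 15
# minus -> start 16, end 25
```

This is the body of a per-read computation, meant to show the arithmetic rather than to be fast; the compatibility check itself is left to findSpliceOverlaps().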