Mapping genomic coordinates to transcript coordinates?
wrighth ▴ 260
@wrighth-3452
Hi, all; is there an easy way/function to map genomic coordinates into coordinates in a given transcript? We've got a number of potential polymorphisms we've mapped into UCSC coordinates and we are trying to figure out their positions in associated transcripts so we can figure out the reading frame and extract the changes (if any) to coding, but I haven't found an easy way to do so in Bioconductor. This seems like the sort of thing someone would have a package for but I haven't been able to find it. Any thoughts?

Hollis Wright, PhD
Oregon Clinical and Translational Research Institute
@michael-lawrence-3846
In reply to Hollis Wright (Mon, Nov 29, 2010, 1:01 PM):

Sounds like this would be relatively straightforward: call IRanges::findOverlaps to find the transcript(s) for each polymorphism and then subtract the transcription start from the position (and add 1), remembering to flip the operation for the negative strand.

I could see the "shift" function being used for this, where the 'shift' argument is a Ranges, using the starts. A GenomicRanges method could account for the strand, but "shift" does not yet dispatch on its second argument. But maybe there needs to be a new generic with a better name.
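To make the arithmetic in this reply concrete, here is a minimal sketch in plain Python. It is not Bioconductor code: the function name `genomic_to_transcript` and its arguments are hypothetical, coordinates are assumed to be 1-based and inclusive, and the transcript is treated as unspliced.

```python
def genomic_to_transcript(pos, tx_start, tx_end, strand):
    """Map a 1-based genomic position to a 1-based position in an
    unspliced transcript spanning [tx_start, tx_end] (inclusive).

    Returns None when the position lies outside the transcript.
    (Hypothetical helper for illustration only.)
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not (tx_start <= pos <= tx_end):
        return None
    if strand == "+":
        # Subtract the transcription start and add 1.
        return pos - tx_start + 1
    # Minus strand: transcription begins at the genomic end,
    # so the operation is flipped.
    return tx_end - pos + 1


print(genomic_to_transcript(1050, 1000, 2000, "+"))  # 51
print(genomic_to_transcript(1050, 1000, 2000, "-"))  # 951
```

On the minus strand the "transcription start" is the larger genomic coordinate, which is why the subtraction runs the other way.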
In reply to Michael Lawrence (Mon, Nov 29, 2010, 1:41 PM):

Something similar to this is what I initially tried, yes. We'd like to do it for the spliced transcript, however, and that's where we are having the difficulty since the splice can move the reading frame.

Hollis Wright, PhD
Oregon Clinical and Translational Research Institute
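For the spliced case, the position has to be counted along the exons only, skipping introns. The following plain-Python sketch illustrates one way to do that under stated assumptions: exons are given as 1-based inclusive, non-overlapping genomic intervals, and `genomic_to_spliced`, `codon_position`, and the `cds_start_in_tx` argument are hypothetical names, not part of any Bioconductor package.

```python
def genomic_to_spliced(pos, exons, strand):
    """Map a 1-based genomic position to a 1-based position in the
    spliced transcript. `exons` is a list of (start, end) genomic
    intervals, 1-based inclusive and non-overlapping.

    Returns None for intronic positions or positions outside the transcript.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk the exons in transcription order (reversed on the minus strand).
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0  # spliced bases contributed by the exons already passed
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return offset + within + 1
        offset += end - start + 1
    return None


def codon_position(tx_pos, cds_start_in_tx):
    """Return (codon number, phase) for a spliced-transcript position,
    where codon number is 1-based and phase is 0, 1, or 2.
    `cds_start_in_tx` is the 1-based transcript position of the first
    coding base. Returns None upstream of the coding start.
    """
    cds_offset = tx_pos - cds_start_in_tx
    if cds_offset < 0:
        return None
    return cds_offset // 3 + 1, cds_offset % 3


exons = [(100, 199), (300, 399), (500, 599)]
print(genomic_to_spliced(305, exons, "+"))  # 106
print(genomic_to_spliced(305, exons, "-"))  # 195
print(genomic_to_spliced(250, exons, "+"))  # None (intronic)
print(codon_position(106, 101))             # (2, 2)
```

Because the offset accumulates exon widths only, an intron whose length is not a multiple of three does not disturb the frame computed from the spliced position. The sketch does not check that the position falls before the end of the coding region.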
In reply to Hollis Wright (11/29/2010, 02:07 PM):

Hi Hollis,

Have you looked at the GenomicFeatures package? It should allow you to get annotations for transcripts from biomaRt or UCSC and then pull out GRangesList objects that represent the different transcripts for each gene mapped into genomic coordinates. You could then compare that to your data using findOverlaps().

Marc
In reply to Marc Carlson (Mon, Nov 29, 2010, 3:11 PM):

There probably needs to be a feature in GenomicRanges that does this for a GRangesList, considering splicing, etc. It wouldn't be that hard to implement.

Any takers?
In reply to Michael Lawrence (Mon, Nov 29, 2010, 7:12 PM):

You might want to look at BioPerl's module for performing such transformations for inspiration: Bio::Coordinate::GeneMapper

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Coordinate/GeneMapper.html

-Aaron
In reply to Aaron Mackey (Tue, Nov 30, 2010, 5:54 AM):

Thanks. That is fancier than what I had in mind: just a function that finds overlaps in an arbitrary Ranges, RangesList, GRanges, GRangesList, etc. and subtracts the start of the overlapping region, with some special semantics for GRangesList. It looks like Bio::Coordinate::GeneMapper maps on a per-gene basis, while we will want something vectorized over large sets of genes.

The complication that arises is that genes, and especially transcripts, can overlap. Many positions will land in multiple transcripts. Maybe the process could be more incremental. First, call findOverlaps() to get a RangesMatching, so the user is aware of the matching. Then a function could take the matching and the input ranges and perform the transformation, returning a result for each match. Maybe there could also be a convenience function that does both, returning an IntegerList without the matchings, if it proves useful.

Michael
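The incremental design described above (match first, then transform each match, with an optional convenience wrapper) can be sketched in plain Python. This is only an illustration of the idea, not the GenomicRanges implementation: every name here (`find_overlaps`, `map_hits`, `map_to_transcripts`, the dictionary layout) is hypothetical, the overlap search is a naive nested loop rather than anything vectorized, and coordinates are assumed 1-based inclusive.

```python
def spliced_position(pos, exons, strand):
    """1-based position of a genomic coordinate within the spliced
    transcript, or None if the coordinate is not inside an exon."""
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            return offset + within + 1
        offset += end - start + 1
    return None


def find_overlaps(positions, transcripts):
    """Step 1: return (query index, transcript id) pairs for every
    position that falls inside an exon of a transcript."""
    hits = []
    for qi, (chrom, pos) in enumerate(positions):
        for tx_id, tx in transcripts.items():
            if tx["chrom"] == chrom and any(s <= pos <= e for s, e in tx["exons"]):
                hits.append((qi, tx_id))
    return hits


def map_hits(positions, transcripts, hits):
    """Step 2: transform each match into a transcript coordinate,
    keeping the match so the caller can see which transcript it was."""
    out = []
    for qi, tx_id in hits:
        tx = transcripts[tx_id]
        pos = positions[qi][1]
        out.append((qi, tx_id, spliced_position(pos, tx["exons"], tx["strand"])))
    return out


def map_to_transcripts(positions, transcripts):
    """Convenience wrapper: one list of transcript positions per query,
    without the matching information."""
    per_query = [[] for _ in positions]
    hits = find_overlaps(positions, transcripts)
    for qi, _tx_id, tx_pos in map_hits(positions, transcripts, hits):
        per_query[qi].append(tx_pos)
    return per_query


# Two overlapping transcripts on opposite strands (made-up coordinates).
transcripts = {
    "txA": {"chrom": "chr1", "strand": "+", "exons": [(100, 199), (300, 399)]},
    "txB": {"chrom": "chr1", "strand": "-", "exons": [(150, 349)]},
}
positions = [("chr1", 160), ("chr1", 250), ("chr1", 320), ("chr2", 160)]

print(find_overlaps(positions, transcripts))
print(map_to_transcripts(positions, transcripts))  # [[61, 190], [100], [121, 30], []]
```

Keeping the matching as a separate step means a position that lands in several transcripts yields one result per transcript, and a position in an intron of one transcript can still map through another transcript that covers it.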