Search
Question: How to find the amino acid codons corresponding to a subset of a range of genomic positions
0
gravatar for madsheilskov
19 months ago by
Denmark
madsheilskov0 wrote:

 

Recently I asked how one - if given some genomic positions as a GRanges object - could find the amino acid codons that corresponded to these positions (if in a coding exon): C: How to find the amino acid codon corresponding to a translated genomic position . Got a nice solution from the site.

I now find myself in a similar but somewhat extended (complicated ?) situation. I have a GRanges with ~1000 > width > 330  bases. Parts of these ranges overlap with coding exons and I wish to know what are the corresponding amino acid sequences in these overlaps. The catch is that part of each range - since they are genomic - may also overlap with intron or untranslated parts of exons before they 'continue' into some translated part. tried to fiddle around the solution I got in the answer to the above question/link, but couldnt figure it out. How can I find the amino acids "overlapping" these ranges ?

ADD COMMENTlink modified 19 months ago by Michael Lawrence9.3k • written 19 months ago by madsheilskov0
1
gravatar for Michael Lawrence
19 months ago by
United States
Michael Lawrence9.3k wrote:

A first step would probably be to match the ranges to overlapping transcripts (cds) with findOverlaps(). Then, subscript with the hits to line up the ranges with the transcripts and find the portion of each range within the matched transcript with pintersect(). Use pmapToTranscript() to map coordinates, then proceed as before.

ADD COMMENTlink written 19 months ago by Michael Lawrence9.3k

I get the idea, but is stucked after pintersect(). This returns a GRangesList object with the coordinates of the overlaps with transcritps exons and includes a boolean hit mcols. How do I subset this object's TRUE values in order to get the genomic positions (of cds transcripts) with overlap to my regions. I gues this is my next step. The object is like:

GRangesList object of length 249:
$62638 
GRanges object with 7 ranges and 4 metadata columns:
      seqnames             ranges strand |    cds_id    cds_name exon_rank       hit
         <Rle>          <IRanges>  <Rle> | <integer> <character> <integer> <logical>
  [1]    chr17 [7578371, 7578370]      - |    183302        <NA>         1     FALSE
  [2]    chr17 [7578177, 7578176]      - |    183301        <NA>         2     FALSE
  [3]    chr17 [7577499, 7577498]      - |    183298        <NA>         3     FALSE
  [4]    chr17 [7577019, 7577018]      - |    183297        <NA>         4     FALSE
  [5]    chr17 [7576853, 7576852]      - |    183296        <NA>         5     FALSE
  [6]    chr17 [7573927, 7574033]      - |    183293        <NA>         6      TRUE
  [7]    chr17 [7572927, 7572926]      - |    183292        <NA>         7     FALSE

 

ADD REPLYlink written 19 months ago by madsheilskov0

Pass drop.nohit.ranges=TRUE to pintersect().

ADD REPLYlink written 19 months ago by Michael Lawrence9.3k

Smart, and unlist() afterwards gets you a nice GRanges object. Anyway, do you extract/access individual elements of a GRangeList object; would it be something along lapply perhaps ?  

ADD REPLYlink written 19 months ago by madsheilskov0

I don't think you want to unlist, at least not until after the pmapToTranscript() call. While lapply() does of course work (GRangesList is essentially a list), it is typically not the most convenient, nor the most performant, way to things.

ADD REPLYlink written 19 months ago by Michael Lawrence9.3k

This will not work as pmapToTranscripts() can not take two GrangesLists, namely the one from my pintersect() and the other being cds transcripts from cdsBy().

Perhaps I should not use cdsBy() here (that would intuitively make sense though) ?

ADD REPLYlink modified 19 months ago • written 19 months ago by madsheilskov0

Yet, maybe I should not have used unlist() as I now more ranges than the initial 249 list elements since some of my regions ranges overlaps multiple exons. Hence I loose the connection to my query (the ranges) and cannot use  pmapToTranscript().  

So in the end, I seem to be stucked anyway ? 

I use pmapToTranscripts() with the intersect coordinates and the GrangesList of cds transcripts from cdsBy() which does not work as both are List objects.

ADD REPLYlink modified 19 months ago • written 19 months ago by madsheilskov0
1

Basic strategy is

grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])

gr <- unlist(reduce(relist(grl_tx, grl)))

But I will add a method to pmapToTranscripts that does that.

ADD REPLYlink written 19 months ago by Michael Lawrence9.3k

Do you mean (following the ideas above):

grl <- pintersect(tr_hits, tx_hits, drop.nohit.ranges=TRUE )

Here tr_hits and tx_hits are output from findOverlaps() as suggested. Then, 

grl_tx <- pmapToTranscripts(unlist(grl), cds[togroup(grl)])

does not work since cds[togroup(grl)] only subsets the first few transcripts of cds (togroup(grl) gives a vector 1:length(grl) ) and cds is the collection of hg19 cds transcripts. I must have misunderstood something here. 

ADD REPLYlink modified 19 months ago • written 19 months ago by madsheilskov0

I was just writing the rough algorithm. In that example, you want to replace cds with tx_hits.

ADD REPLYlink written 19 months ago by Michael Lawrence9.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 149 users visited in the last hour