mapCoords for converting genomic coordinates to cDNA coordinates?
1
1
Entering edit mode
amias ▴ 10
@amias-6932
Last seen 5.4 years ago
United States

Hi,

Is it possible to convert all genomic coordinates in a gene annotation file (gff3) to cDNA coordinates with the mapCoords function in GenomicRanges package in Bioconductor? I have a list of miRNA target sites with cDNA/transcript-based coordinates, and I would like to check if these sites overlap UTR/CDS regions in the transcript. Based on the GenomicRanges manual, I tired to follow the example usage codes for mapCoords to do this (please see below), but the results are not what I wanted (all exon positions based on cDNA coordinate) and I cannot figure out the correct way. Can anyone advise? Thanks in advance!

I am using the TAIR10 Arabidopsis gene-exon annotation file from Phytozome v10. Here is what I did:

>txdb <- makeTranscriptDbFromGFF(file="Athaliana_167_TAIR10.gene_exons.gff3",
                                format="gff3")

>transcripts<-exonsBy(txdb,by="tx",use.names=TRUE)
>transcripts

GRangesList object of length 35386:
$AT1G01010.1.TAIR10

GRanges object with 6 ranges and 3 metadata columns:
      seqnames       ranges strand |   exon_id                 exon_name exon_rank
         <Rle>    <IRanges>  <Rle> | <integer>               <character> <integer>
  [1]     Chr1 [3631, 3913]      + |         1 AT1G01010.1.TAIR10.exon.1         1
  [2]     Chr1 [3996, 4276]      + |         2 AT1G01010.1.TAIR10.exon.2         2
  [3]     Chr1 [4486, 4605]      + |         3 AT1G01010.1.TAIR10.exon.3         3
  [4]     Chr1 [4706, 5095]      + |         4 AT1G01010.1.TAIR10.exon.4         4
  [5]     Chr1 [5174, 5326]      + |         5 AT1G01010.1.TAIR10.exon.5         5
  [6]     Chr1 [5439, 5899]      + |         6 AT1G01010.1.TAIR10.exon.6         6

...
<35385 more elements>
-------
seqinfo: 7 sequences (1 circular) from an unspecified genome; no seqlengths

>GR <- transcripts(txdb) #extract transcript coordinates

>map<-mapCoords(GR,transcripts,elt.hits=TRUE, elt.loc=TRUE)
> map
GRanges object with 6483 ranges and 4 metadata columns:
         seqnames     ranges strand   | queryHits subjectHits     eltLoc   eltHits
            <Rle>  <IRanges>  <Rle>   | <integer>   <integer>  <IRanges> <integer>
     [1]     Chr1 [ 1,  111]      +   |         4           4 [ 1,  111]        47
     [2]     Chr1 [ 1,  117]      +   |         7           7 [ 1,  117]        56
     [3]     Chr1 [ 1, 1176]      +   |        10          10 [ 1, 1176]        68
     [4]     Chr1 [ 1, 1941]      +   |        15          15 [ 1, 1941]        85
     [5]     Chr1 [29, 1941]      +   |        16          15 [29, 1941]        85
     ...      ...        ...    ... ...       ...         ...        ...       ...
  [6479]     ChrM  [1,  876]      -   |     35381       35381  [1,  876]    207458
  [6480]     ChrM  [1,  384]      -   |     35383       35383  [1,  384]    207462
  [6481]     ChrM  [1, 1584]      -   |     35384       35384  [1, 1584]    207463
  [6482]     ChrM  [1,  336]      -   |     35385       35385  [1,  336]    207464
  [6483]     ChrM  [1,  615]      -   |     35386       35386  [1,  615]    207465
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

 

> sessionInfo() 
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Biostrings_2.34.0      XVector_0.6.0          BiocInstaller_1.16.0   GenomicFeatures_1.18.1 AnnotationDbi_1.28.0  
 [6] Biobase_2.26.0         GenomicRanges_1.18.1   GenomeInfoDb_1.2.2     IRanges_2.0.0          S4Vectors_0.4.0       
[11] BiocGenerics_0.12.0   

loaded via a namespace (and not attached):
 [1] base64enc_0.1-2         BatchJobs_1.4           BBmisc_1.7              BiocParallel_1.0.0      biomaRt_2.22.0         
 [6] bitops_1.0-6            brew_1.0-6              checkmate_1.5.0         codetools_0.2-9         colorspace_1.2-4       
[11] DBI_0.3.1               digest_0.6.4            fail_1.2                foreach_1.4.2           GenomicAlignments_1.2.0
[16] ggplot2_1.0.0           grid_3.1.1              gtable_0.1.2            iterators_1.0.7         MASS_7.3-35            
[21] munsell_0.4.2           plyr_1.8.1              proto_0.3-10            Rcpp_0.11.3             RCurl_1.95-4.3         
[26] reshape2_1.4            Rsamtools_1.18.1        RSQLite_1.0.0           rtracklayer_1.26.1      scales_0.2.4           
[31] sendmailR_1.2-1         stringr_0.6.2           tools_3.1.1             XML_3.98-1.1            zlibbioc_1.12.0

genomicranges • 3.4k views
ADD COMMENT
2
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi,
 
The 'x' gets mapped to the 'to'. To map gene ranges (instead of exons) 'x' should be a GRanges of genes.
 
genes <- genes(txdb)
 
Valerie
ADD COMMENT
0
Entering edit mode

I may have misunderstood what you want. In your example 'x' is transcripts and 'to' is the TAIR exons by transcript.

To map gene ranges to the TAIR exons 'x' should be a GRanges of genes and 'to' the TAIR exons. If instead you want to map transcripts to TAIR gene ranges then 'x' should be the transcripts ('GR' object above) and 'to' should be a GRangesList of genes.

ADD REPLY
0
Entering edit mode

Hi Valerie, thanks for your response. I checked the manual again and I think understand what you mean now--thanks again! However I see the local coordinates also includes the length for introns -- since the second exon start position does not continue after the first exon (i.e 284), but incorporates the intron length. I need the transcript local coordinates to be continuous without accounting for introns. I am also losing the metadata for transcript name and feature attribute. How can I achieve those things? here is what I tried:

>exon_ranges<-exons(txdb) # a GRanges object

>allgenes<-genes(txdb)# a GRanges object

>allgenes<-GRangesList(allgenes)

> mapCoords(exon_ranges,mygenes,elt.hits=TRUE, elt.loc=TRUE)

GRanges object with 16 ranges and 4 metadata columns:
       seqnames       ranges strand   | queryHits subjectHits       eltLoc   eltHits
          <Rle>    <IRanges>  <Rle>   | <integer>   <integer>    <IRanges> <integer>
   [1]     Chr1 [   1,  283]      +   |         1           1 [   1,  283]         1
   [2]     Chr1 [ 366,  646]      +   |         2           1 [ 366,  646]         1
   [3]     Chr1 [ 856,  975]      +   |         3           1 [ 856,  975]         1
   [4]     Chr1 [1076, 1465]      +   |         4           1 [1076, 1465]         1
   [5]     Chr1 [1544, 1696]      +   |         5           1 [1544, 1696]         1
   ...      ...          ...    ... ...       ...         ...          ...       ...
  [12]     Chr1 [3172, 3245]      -   |        12           1   [903, 976]         2
  [13]     Chr1 [3020, 3065]      -   |        13           1   [751, 796]         2
  [14]     Chr1 [2682, 2771]      -   |        14           1   [413, 502]         2
  [15]     Chr1 [2543, 2590]      -   |        15           1   [274, 321]         2
  [16]     Chr1 [2270, 2436]      -   |        16           1   [  1, 167]         2
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

ADD REPLY
0
Entering edit mode

update: I solved the intron interval issue (mapCoords(exon_ranges,transcripts)-- now the issue is only adding metadata to the results?

ADD REPLY
0
Entering edit mode

Hi,

merge() isn't supported on GRanges objects. Conceptually GRanges and GAlignments are vector-like objects of 1 dimension so only c() works on them. rbind(), cbind() and merge() typically only work on 2-D objects.

You can merge() on a DataFrame which is what the GRanges metadata is held in. Extract the metadata with mcols(GRanges) then you can merge, add columns with '$' or remove with '['. 

Valerie

ADD REPLY
0
Entering edit mode

Thanks so much for your help and patience--I got the metadata now!

ADD REPLY
0
Entering edit mode

Hi Valerie, I have another question--- I want to use pmapCoords instead of mapCoords to get one-to-one results for my comparison (exon_ranges,exonsBy(txdb,"tx"), however I can't make it to work even though my GRanges has the same number of ranges as those stored in the GRangesList. Could you suggest how can I get this?

>pmapCoords(exon_ranges,transcripts)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘pmapCoords’ for signature ‘"GRanges", "GRangesList"’

ADD REPLY
1
Entering edit mode

This has been added to GenomicRanges 1.19.5. Should be available via biocLite() Friday around noon PST or immediately in svn.

Valerie

ADD REPLY
0
Entering edit mode

A method for GRanges, GRangesList hasn't been implemented yet. Currently we only have

> library(GenomicAlignments)
> showMethods("pmapCoords")
Function: pmapCoords (package IRanges)
x="Ranges", to="GAlignments"

I'll put this on my TODO but can't get to it until next week. I'll post back here when it's done.

Valerie

ADD REPLY
0
Entering edit mode

Thank you, that would be great!

ADD REPLY

Login before adding your answer.

Traffic: 697 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6