Question

mapFromTranscripts strand problem

0

Entering edit mode

lorenzo.calviello • 0

@lorenzocalviello-10122

Last seen 7.8 years ago

Hi Everybody,
I am using the mapFromTranscripts function to have genomic coordinate of transcript elements, which is absolutely great !
However, I am having some trouble with one specific transcript (same coordinates and ID of an annotated transcript in Gencode23):

library("GenomicFeatures")
tx_pos<-GRanges(seqnames="ENST00000417659.1",ranges=IRanges(start=278,end=286),strand="+")
tx_pos
GRanges object with 1 range and 0 metadata columns:
               seqnames     ranges strand
                  <Rle>  <IRanges>  <Rle>
  [1] ENST00000417659.1 [278, 286]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

As you see, my position of interest starts at 278 and ends at 286, on the plus strand (of the transcript).

Here I define the exons of this transcript (which map on the minus strand):

exon1<-GRanges(seqnames=1,ranges=IRanges(start=764723,end=764925),strand="-",exon_rank=1)
exon2<-GRanges(seqnames=1,ranges=IRanges(start=759032,end=759123),strand="-",exon_rank=2)
exons<-GRangesList(c(exon1,exon2))
names(exons)<-seqnames(tx_pos)

exons
GRangesList object of length 1:
$ENST00000417659.1  
GRanges object with 2 ranges and 1 metadata column:
      seqnames           ranges strand | exon_rank
         <Rle>        <IRanges>  <Rle> | <numeric>
  [1]        1 [764723, 764925]      - |         1
  [2]        1 [759032, 759123]      - |         2

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

Now I would like to have the genomic coordinates of my tx_pos element:

As expected, when we take the strand into account with ignore.strand=F in the mapFromTranscripts function, we do not see any result (gene on the minus strand and element on plus strand of the transcript).

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     xHits transcriptsHits
      <Rle> <IRanges>  <Rle> | <integer>       <integer>
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

When we put ignore.strand=T now we do have an output:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

but unfortunately is not what I expected, as this position actually does not map to any of my exons.
My transcript element lies in the second exon (as the first one is 203 nt long), thus what I expect as starting genomic position is 759123-(278-203-1) = 759049

If I put the strand of my tx_pos element to be "*" and ignore.strand=F I the function outputs the correct coordinates.

strand(tx_pos)<-"*"
mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [759041, 759049]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

while the ignore.strand=T still gives me the wrong coordinates:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Am I doing something very wrong which I can't see ? In the help page it was not clear to me how we should set the strand of the transcript element (so I guess it's set to "*").

Should we always put the strand coordinates on the transcript to "*" and ignore.strand=F to have the correct genomic coordinates of a transcript element?

If that's the case I hope other confused researchers can look at this and clear their minds a bit : )

Thank you so much for the great work, it is really making my life so much easier !

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.22.13 AnnotationDbi_1.32.3    Biobase_2.30.0          GenomicRanges_1.22.4   
[5] GenomeInfoDb_1.6.3      IRanges_2.4.8           S4Vectors_0.8.11        BiocGenerics_0.16.1    

loaded via a namespace (and not attached):
 [1] XVector_0.10.0             zlibbioc_1.16.0            GenomicAlignments_1.6.3   
 [4] BiocParallel_1.4.3         tools_3.2.2                SummarizedExperiment_1.0.2
 [7] DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1       
[10] rtracklayer_1.30.4         futile.options_1.0.0       bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.26.1             RSQLite_1.0.0             
[16] Biostrings_2.38.4          Rsamtools_1.22.0           XML_3.98-1.4

mapFromTranscripts genomicfeatures • 1.4k views

ADD COMMENT • link updated 8.0 years ago by Valerie Obenchain ★ 6.8k • written 8.0 years ago by lorenzo.calviello • 0

0

Entering edit mode

Thanks for the report. I'll have a look.

Valerie

ADD REPLY • link 8.0 years ago Valerie Obenchain ★ 6.8k

score 0 · Answer 1 · 2016-04-19

Hi,

I think you've got your exons reversed, i.e., it's the exon with width 92 that comes first when ignore.strand=TRUE.

When ignore.strand=TRUE the results are the same as if both 'x' and 'transcripts' were on the positive strand. mapFromTranscripts() concatenates the ranges in 'transcripts' as if they were touching so a visual would be something like this:

759032 --------> 759123 764723 --------> 764925

To find the location of tx_pos we need to move 278 from the transcription start; on the positive strand this is on the left at the lowest number 759032. Because the width of the first exon is only 92 we must go into the second exon:

start(second exon) + (start(tx_pos) - width(first exon) -1)

> 764723 + (278 - 92 - 1) [1] 764908

Your example where the strand of tx_pos is '*', exons are '-' and ignore.strand=FALSE is returning the expected coordinates for a negative strand match (transcription starts at the right with the largest value). This is the same result you'd get if both tx_pos and exons were '-'. Note the strand of the result will always match the strand of 'exons'. When ignore.strand=TRUE the strand of the result is '*'.

Valerie