Question: mapFromTranscripts strand problem
0
gravatar for lorenzo.calviello
3.5 years ago by
lorenzo.calviello0 wrote:

Hi Everybody,
I am using the mapFromTranscripts function to have genomic coordinate of transcript elements, which is absolutely great !
However, I am having some trouble with one specific transcript (same coordinates and ID of an annotated transcript in Gencode23):

library("GenomicFeatures")
tx_pos<-GRanges(seqnames="ENST00000417659.1",ranges=IRanges(start=278,end=286),strand="+")
tx_pos
GRanges object with 1 range and 0 metadata columns:
               seqnames     ranges strand
                  <Rle>  <IRanges>  <Rle>
  [1] ENST00000417659.1 [278, 286]      +
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

As you see, my position of interest starts at 278 and ends at 286, on the plus strand (of the transcript).

Here I define the exons of this transcript (which map on the minus strand):

exon1<-GRanges(seqnames=1,ranges=IRanges(start=764723,end=764925),strand="-",exon_rank=1)
exon2<-GRanges(seqnames=1,ranges=IRanges(start=759032,end=759123),strand="-",exon_rank=2)
exons<-GRangesList(c(exon1,exon2))
names(exons)<-seqnames(tx_pos)

exons
GRangesList object of length 1:
$ENST00000417659.1  
GRanges object with 2 ranges and 1 metadata column:
      seqnames           ranges strand | exon_rank
         <Rle>        <IRanges>  <Rle> | <numeric>
  [1]        1 [764723, 764925]      - |         1
  [2]        1 [759032, 759123]      - |         2

-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

Now I would like to have the genomic coordinates of my tx_pos element:

As expected, when we take the strand into account with ignore.strand=F in the mapFromTranscripts function, we do not see any result (gene on the minus strand and element on plus strand of the transcript).

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     xHits transcriptsHits
      <Rle> <IRanges>  <Rle> | <integer>       <integer>
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

When we put ignore.strand=T now we do have an output:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

but unfortunately is not what I expected, as this position actually does not map to any of my exons.
My transcript element lies in the second exon (as the first one is 203 nt long), thus what I expect as starting genomic position is 759123-(278-203-1) = 759049

If I put the strand of my tx_pos element to be "*" and ignore.strand=F I the function outputs the correct coordinates.

strand(tx_pos)<-"*"
mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=F)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [759041, 759049]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

while the ignore.strand=T still gives me the wrong coordinates:

mapFromTranscripts(x = tx_pos,transcripts = exons,ignore.strand=T)
GRanges object with 1 range and 2 metadata columns:
      seqnames           ranges strand |     xHits transcriptsHits
         <Rle>        <IRanges>  <Rle> | <integer>       <integer>
  [1]        1 [764908, 764916]      * |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

Am I doing something very wrong which I can't see ? In the help page it was not clear to me how we should set the strand of the transcript element (so I guess it's set to "*").

Should we always put the strand coordinates on the transcript to "*" and  ignore.strand=F to have the correct genomic coordinates of a transcript element?

If that's the case I hope other confused researchers can look at this and clear their minds a bit : )

Thank you so much for the great work, it is really making my life so much easier !

sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicFeatures_1.22.13 AnnotationDbi_1.32.3    Biobase_2.30.0          GenomicRanges_1.22.4   
[5] GenomeInfoDb_1.6.3      IRanges_2.4.8           S4Vectors_0.8.11        BiocGenerics_0.16.1    

loaded via a namespace (and not attached):
 [1] XVector_0.10.0             zlibbioc_1.16.0            GenomicAlignments_1.6.3   
 [4] BiocParallel_1.4.3         tools_3.2.2                SummarizedExperiment_1.0.2
 [7] DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1       
[10] rtracklayer_1.30.4         futile.options_1.0.0       bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.26.1             RSQLite_1.0.0             
[16] Biostrings_2.38.4          Rsamtools_1.22.0           XML_3.98-1.4   

 

ADD COMMENTlink modified 3.5 years ago by Valerie Obenchain6.7k • written 3.5 years ago by lorenzo.calviello0

Thanks for the report. I'll have a look.

Valerie

ADD REPLYlink written 3.5 years ago by Valerie Obenchain6.7k
Answer: mapFromTranscripts strand problem
0
gravatar for Valerie Obenchain
3.5 years ago by
United States
Valerie Obenchain6.7k wrote:

Hi,

I think you've got your exons reversed, i.e., it's the exon with width 92 that comes first when ignore.strand=TRUE.

When ignore.strand=TRUE the results are the same as if both 'x' and 'transcripts' were on the positive strand. mapFromTranscripts() concatenates the ranges in 'transcripts' as if they were touching so a visual would be something like this:

759032 --------> 759123 764723 --------> 764925

To find the location of tx_pos we need to move 278 from the transcription start; on the positive strand this is on the left at the lowest number 759032. Because the width of the first exon is only 92 we must go into the second exon:

start(second exon) + (start(tx_pos) - width(first exon) -1)

> 764723 + (278 - 92 - 1)
[1] 764908

Your example where the strand of tx_pos is '*', exons are '-' and ignore.strand=FALSE is returning the expected coordinates for a negative strand match (transcription starts at the right with the largest value). This is the same result you'd get if both tx_pos and exons were '-'. Note the strand of the result will always match the strand of 'exons'. When ignore.strand=TRUE the strand of the result is '*'.

Valerie

ADD COMMENTlink written 3.5 years ago by Valerie Obenchain6.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 294 users visited in the last hour