Search
Question: mapToTranscripts stranded behavior
0
gravatar for Michael Love
3.5 years ago by
Michael Love20k
United States
Michael Love20k wrote:

I've been playing with mapToTranscripts(), which is a great addition BTW. It's already useful for me.

It would be nice to give a little more description in the help file on the behavior regarding the positions in the returned object and the strand of 'transcripts'. I scanned the man page and found:

"Transcriptome-based coordinates start counting at 1 at the
beginning of the ‘transcripts’ range and return positions
where ‘x’ was aligned."

But from this it was not totally obvious that 1 starts from the left regardless of the strand. E.g.:

transcripts <- GRangesList(foo=GRanges("1", IRanges(c(101,201),width=50), strand="-"))
x <- GRanges("1", IRanges(248,width=1))
mapToTranscripts(x, transcripts)
 GRanges object with 1 range and 2 metadata columns:
       seqnames    ranges strand |     xHits transcriptsHits
          <Rle> <IRanges>  <Rle> | <integer>       <integer>
   [1]      foo  [98, 98]      - |         1               1
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

I believe this is not the same behavior as getSeq() for example, which would consider the rightmost position of a negative strand GRanges(List) to be position 1 in the returned DNAStringSet(List).

just to be clear, I'm not suggesting a change in behavior, just a bit more description in the man page.

 

R version 3.2.0 (2015-04-16)
 Platform: x86_64-apple-darwin13.4.0 (64-bit)
 Running under: OS X 10.10.3 (Yosemite)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
  [1] splines   stats4    parallel  stats     graphics  grDevices datasets  utils     methods
 [10] base

 other attached packages:
  [1] alpine_0.1                              BiocParallel_1.2.1
  [3] speedglm_0.2-1.0                        Matrix_1.2-0
  [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.1.2 GenomicFeatures_1.20.1
  [7] AnnotationDbi_1.30.1                    Biobase_2.28.0
  [9] BSgenome.Hsapiens.UCSC.hg19_1.4.0       BSgenome_1.36.0
 [11] rtracklayer_1.28.2                      GenomicAlignments_1.4.1
 [13] Rsamtools_1.20.1                        Biostrings_2.36.1
 [15] XVector_0.8.0                           GenomicRanges_1.20.3
 [17] GenomeInfoDb_1.4.0                      IRanges_2.2.1
 [19] S4Vectors_0.6.0                         BiocGenerics_0.14.0
 [21] testthat_0.9.1                          devtools_1.8.0
 [23] knitr_1.10.5                            BiocInstaller_1.18.2

 loaded via a namespace (and not attached):
  [1] Rcpp_0.11.6          compiler_3.2.0       git2r_0.10.1         futile.logger_1.4.1
  [5] bitops_1.0-6         futile.options_1.0.0 tools_3.2.0          zlibbioc_1.14.0
  [9] biomaRt_2.24.0       digest_0.6.8         memoise_0.2.1        RSQLite_1.0.0
 [13] lattice_0.20-31      DBI_0.3.1            stringr_1.0.0        roxygen2_4.1.1
 [17] rversions_1.0.0      grid_3.2.0           XML_3.98-1.1         magrittr_1.5
 [21] lambda.r_1.1.7       stringi_0.4-1        RCurl_1.95-4.6
ADD COMMENTlink modified 3.5 years ago by Valerie Obenchain ♦♦ 6.6k • written 3.5 years ago by Michael Love20k
2
gravatar for Valerie Obenchain
3.5 years ago by
Valerie Obenchain ♦♦ 6.6k
United States
Valerie Obenchain ♦♦ 6.6k wrote:

Hi Mike,

'ignore.strand' is TRUE by default in these mapping functions. They do report position wrt strand the same as getSeq().

> mapToTranscripts(x, transcripts)
GRanges object with 1 range and 2 metadata columns:
      seqnames    ranges strand |     xHits transcriptsHits
         <Rle> <IRanges>  <Rle> | <integer>       <integer>
  [1]      foo  [98, 98]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> mapToTranscripts(x, transcripts, ignore.strand=FALSE)
GRanges object with 1 range and 2 metadata columns:
      seqnames    ranges strand |     xHits transcriptsHits
         <Rle> <IRanges>  <Rle> | <integer>       <integer>
  [1]      foo    [3, 3]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

We chose the default as FALSE because strand appeared to be less important (common) in RNA Seq analysis. I've added an example emphasizing the strand and a 'Note' to the arg section. Hopefully this is more clear.

Valerie

ADD COMMENTlink written 3.5 years ago by Valerie Obenchain ♦♦ 6.6k

thanks!

bringing up getSeq, I just meant that getSeq,BSgenome on a GRanges(List) does not ignore strand by default.

 

ADD REPLYlink written 3.5 years ago by Michael Love20k

Hi again,

After talking with Herve, we thought the default for 'ignore.strand' should be changed to FALSE for the coordinate mapping functions. This would make them consistent with the (many) other functions in GenomicRanges and GenomicAlignments. While there are reasons for having the default be TRUE, at this point in the game it is probably just confusing. I've made the change in GenomicFeatures 1.21.9.

We also thought it's probably more useful to report position starting from the TSS regardless of strand.

x <- GRanges("chr3", IRanges(9, 9), strand="+")
transcripts <- 
    GRangesList(tx1=GRanges("chr3", IRanges(3, 10), strand="-"))

Returns an empty GRanges object.

>   mapToTranscripts(x, transcripts, ignore.strand=FALSE)
GRanges object with 0 ranges and 2 metadata columns:
   seqnames    ranges strand |     xHits transcriptsHits
      <Rle> <IRanges>  <Rle> | <integer>       <integer>
  -------
  seqinfo: no sequences


Currently mapToTranscripts returns 1 range at position 7 because it counts from TES instead of TSS. The proposed change would return 1 range at position 2.

>   mapToTranscripts(x, transcripts, ignore.strand=TRUE) 
GRanges object with 1 range and 2 metadata columns:
      seqnames    ranges strand |     xHits transcriptsHits
         <Rle> <IRanges>  <Rle> | <integer>       <integer>
  [1]      tx1    [7, 7]      - |         1               1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

Let me know if you have a preference for counting from TSS vs TES.

Thanks.

Valerie

 

ADD REPLYlink written 3.5 years ago by Valerie Obenchain ♦♦ 6.6k

This default (ignore.strand=FALSE) does make more sense to me. I typically want to know where am I in the transcript with respect to the TSS. And this way, mapped ranges will correspond to the same positions in the DNAStringSet.

thanks,

Mike

ADD REPLYlink written 3.5 years ago by Michael Love20k

Changes are in GenomicFeatures 1.21.11 and apply to all coordinate mapping functions (mapToTranscripts(), mapFromTranscripts(), pmapToTranscripts() and pmapFromTranscripts()).

Summary of changes:

  • 'ignore.strand' default is FALSE
  • mapped position is computed from TSS, always
  • 'ignore.strand' is used in overlap operations only and does not affect where counting (of mapped position) starts from


Thanks for bringing these issues up. Fixing them has made the code cleaner and more consistent.

Valerie

ADD REPLYlink written 3.5 years ago by Valerie Obenchain ♦♦ 6.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 396 users visited in the last hour