Unexpceted behaviour of RiboProfiling aroundPromoter() function when processing multi-exon CDSs
0
0
Entering edit mode
d r ▴ 150
@d-r-5459
Last seen 6.7 years ago
Israel

Hello

I have been experimenting with the script in RiboProfiling pacakge vigenette and have noticed an unexpected beahviour of aroundPromoter().

In line 84 of the script we have the call:

oneBinRanges <- aroundPromoter(txdb, alnGRanges, percBestExpressed=0.001)

This call returns a GRanges covering  2 CDSs  (note that percBestExpressed=0.001 is a very stringent threshold compared to the default of 0.03, hence the small number of CDSs) , and the ranges covered are indeed around the translation start site.

However, when I re-ran this line after changing percBestExpressed to 0.01, I got among others the following ranges:

oneBinRanges[42:82]
GRanges object with 41 ranges and 2 metadata columns:
       seqnames             ranges strand |    values     idSeq
          <Rle>          <IRanges>  <Rle> | <numeric> <integer>
  2023     chr1 [8926540, 8926540]      - |         0     12858
  2023     chr1 [8926541, 8926541]      - |         0     12858
  2023     chr1 [8926542, 8926542]      - |         0     12858
  2023     chr1 [8926543, 8926543]      - |         0     12858
  2023     chr1 [8926544, 8926544]      - |         0     12858
   ...      ...                ...    ... .       ...       ...
  2023     chr1 [8926576, 8926576]      - |         0     12858
  2023     chr1 [8926577, 8926577]      - |         0     12858
  2023     chr1 [8926578, 8926578]      - |         0     12858
  2023     chr1 [8926579, 8926579]      - |         0     12858
  2023     chr1 [8926580, 8926580]      - |         0     12858
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


This region fall in the boundary of exon 4 of UCSC gene uc001api.2 which is not a translation start site.

Furthermore, when I call readStartCov() I get these reuslts:

https://imgur.com/CNoa4FJ

Note that for length 29 there are three peaks, one of them downstream the translation start site.

Note that I run the script with the original TxDb and bam file indicate in the vignette. The only change I introduced was in the value of percBestExpressed.

I also got similar  results when running this code on another bam file and another TxDb. When I used the same bam and gff with plastid I get offsets that are more similar to what I expect (all peaks upstream the translation start site, mostly around -12 or -13).

Having looked at the source code of aroundPromoter() I think the problem is the lines:

#choose the longest transcript per gene (very rapid)

maxWidthTranscBestExpGene <- max(width(cdsByBestExpressed))

cdsByBestExprLongTransc <-

cdsByBestExpressed[width(cdsByBestExpressed) == maxWidthTranscBestExpGene, ]

 

From what I see, for multi-exon CDSs, running this code chooses the longest exon, not longest transcript.

Am I missing something?

 

Thanks in advance

Dolev Rahat

The Alexander Silberman Institute of Life Sciences,

Hebrew University of Jerusalem Israel

 sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=he_IL.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=he_IL.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=he_IL.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=he_IL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] ggplot2_2.2.1              GenomicAlignments_1.12.2   SummarizedExperiment_1.6.5
 [4] DelayedArray_0.2.7         matrixStats_0.52.2         Rsamtools_1.28.0          
 [7] GenomicFeatures_1.28.5     AnnotationDbi_1.38.2       Biobase_2.36.2            
[10] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3        RiboProfiling_1.6.0       
[13] Biostrings_2.44.2          XVector_0.16.0             IRanges_2.10.5            
[16] S4Vectors_0.14.7           BiocGenerics_0.22.1       

loaded via a namespace (and not attached):
 [1] httr_1.3.1                    AnnotationHub_2.8.2          
 [3] bit64_0.9-7                   splines_3.4.2                
 [5] gsubfn_0.6-6                  shiny_1.0.5                  
 [7] Formula_1.2-2                 interactiveDisplayBase_1.14.0
 [9] latticeExtra_0.6-28           RBGL_1.52.0                  
[11] blob_1.1.0                    BSgenome_1.44.2              
[13] GenomeInfoDbData_0.99.0       yaml_2.1.14                  
[15] RSQLite_2.0                   backports_1.1.1              
[17] lattice_0.20-35               biovizBase_1.24.0            
[19] chron_2.3-51                  digest_0.6.12                
[21] RColorBrewer_1.1-2            checkmate_1.8.4              
[23] colorspace_1.3-2              ggbio_1.24.1                 
[25] httpuv_1.3.5                  htmltools_0.3.6              
[27] Matrix_1.2-11                 plyr_1.8.4                   
[29] OrganismDbi_1.18.1            XML_3.98-1.9                 
[31] biomaRt_2.32.1                zlibbioc_1.22.0              
[33] xtable_1.8-2                  scales_0.5.0                 
[35] BiocParallel_1.10.1           htmlTable_1.9                
[37] tibble_1.3.4                  AnnotationFilter_1.0.0       
[39] sqldf_0.4-11                  nnet_7.3-12                  
[41] lazyeval_0.2.0                proto_1.0.0                  
[43] mime_0.5                      survival_2.41-3              
[45] magrittr_1.5                  memoise_1.1.0                
[47] GGally_1.3.2                  foreign_0.8-69               
[49] graph_1.54.0                  BiocInstaller_1.26.1         
[51] tools_3.4.2                   data.table_1.10.4-2          
[53] stringr_1.2.0                 munsell_0.4.3                
[55] cluster_2.0.6                 ensembldb_2.0.4              
[57] compiler_3.4.2                rlang_0.1.2                  
[59] grid_3.4.2                    RCurl_1.95-4.8               
[61] dichromat_2.0-0               VariantAnnotation_1.22.3     
[63] htmlwidgets_0.9               bitops_1.0-6                 
[65] base64enc_0.1-3               gtable_0.2.0                 
[67] curl_3.0                      DBI_0.7                      
[69] reshape_0.8.7                 R6_2.2.2                     
[71] reshape2_1.4.2                gridExtra_2.3                
[73] knitr_1.17                    rtracklayer_1.36.6           
[75] bit_1.1-12                    Hmisc_4.0-3                  
[77] ProtGenerics_1.8.0            stringi_1.1.5                
[79] Rcpp_0.12.13                  rpart_4.1-11                 
[81] acepack_1.4.1            

 

 

 

 

RiboProfiling • 928 views
ADD COMMENT

Login before adding your answer.

Traffic: 922 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6