How to build a GRangesList where each GRanges element is a CDS coordinate of gene transcripts?
Pratik Mehta ▴ 10

Hello Bioconductor Community,

I originally posted this on Biostars, but removed it from there and am posting it here instead.

How do I build a GRangesList in which each GRanges element holds the CDS coordinates of one gene transcript? Basically, I am trying to overlap CDS coordinates from a TxDb object with CpG loci from a GRanges object, while keeping the CDS coordinates grouped by transcript.

The reproducible data come from the sesameData package, which is used by the sesame package.


I am trying to create a txns GRangesList similar to the one below (txns.reference):

library(sesameData)
genomeInfo.mm10 <- sesameData::sesameDataGet('genomeInfo.mm10')
txns.reference <- genomeInfo.mm10$txns

I am trying to do this for the mm39 assembly, but for the sake of providing a reproducible example, I'll only include an mm10 working example.

This is how far I have gotten:

library(GenomicFeatures)  # provides makeTxDbFromEnsembl() and cdsByOverlaps()
MM285.mm10.manifest <- sesameData::sesameDataGet('MM285.mm10.manifest')
mm10.txdb <- GenomicFeatures::makeTxDbFromEnsembl(organism = "Mus musculus", release = 102)
seqlevelsStyle(mm10.txdb) <- "UCSC"
txns.reproducible.example <- cdsByOverlaps(x = mm10.txdb, ranges = MM285.mm10.manifest, columns = c("CDSSTART", "CDSEND"))

The resulting txns.reproducible.example is a GRanges object, not a GRangesList, and it does not carry the transcript names that txns.reference has as its names. I have tried many approaches without success so far.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics 
[5] grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] GenomicFeatures_1.44.2
 [2] AnnotationDbi_1.54.1  
 [3] Biobase_2.52.0        
 [4] sesameData_1.11.3     
 [5] rmarkdown_2.10        
 [6] ExperimentHub_2.0.0   
 [7] AnnotationHub_3.0.1   
 [8] BiocFileCache_2.0.0   
 [9] dbplyr_2.1.1          
[10] GenomicRanges_1.44.0  
[11] GenomeInfoDb_1.28.4   
[12] IRanges_2.26.0        
[13] S4Vectors_0.30.0      
[14] BiocGenerics_0.38.0   

loaded via a namespace (and not attached):
  [1] matrixStats_0.60.1           
  [2] bitops_1.0-7                 
  [3] bit64_4.0.5                  
  [4] filelock_1.0.2               
  [5] webshot_0.5.2                
  [6] RColorBrewer_1.1-2           
  [7] progress_1.2.2               
  [8] httr_1.4.2                   
  [9] tools_4.1.1                  
 [10] utf8_1.2.2                   
 [11] R6_2.5.1                     
 [12] DBI_1.1.1                    
 [13] lazyeval_0.2.2               
 [14] colorspace_2.0-2             
 [15] tidyselect_1.1.1             
 [16] gridExtra_2.3                
 [17] prettyunits_1.1.1            
 [18] bit_4.0.4                    
 [19] curl_4.3.2                   
 [20] compiler_4.1.1               
 [21] TSP_1.1-10                   
 [22] xml2_1.3.2                   
 [23] DelayedArray_0.18.0          
 [24] plotly_4.9.4.1               
 [25] rtracklayer_1.52.1           
 [26] scales_1.1.1                 
 [27] rappdirs_0.3.3               
 [28] stringr_1.4.0                
 [29] digest_0.6.27                
 [30] Rsamtools_2.8.0              
 [31] XVector_0.32.0               
 [32] pkgconfig_2.0.3              
 [33] htmltools_0.5.2              
 [34] MatrixGenerics_1.4.3         
 [35] fastmap_1.1.0                
 [36] htmlwidgets_1.5.4            
 [37] rlang_0.4.11                 
 [38] rstudioapi_0.13              
 [39] RSQLite_2.2.8                
 [40] shiny_1.6.0                  
 [41] BiocIO_1.2.0                 
 [42] generics_0.1.0               
 [43] jsonlite_1.7.2               
 [44] BiocParallel_1.26.2          
 [45] dendextend_1.15.1            
 [46] dplyr_1.0.7                  
 [47] RCurl_1.98-1.4               
 [48] magrittr_2.0.1               
 [49] GenomeInfoDbData_1.2.6       
 [50] patchwork_1.1.1              
 [51] Matrix_1.3-4                 
 [52] Rcpp_1.0.7                   
 [53] munsell_0.5.0                
 [54] fansi_0.5.0                  
 [55] viridis_0.6.1                
 [56] lifecycle_1.0.0              
 [57] stringi_1.7.4                
 [58] yaml_2.2.1                   
 [59] SummarizedExperiment_1.22.0  
 [60] zlibbioc_1.38.0              
 [61] grid_4.1.1                   
 [62] blob_1.2.2                   
 [63] promises_1.2.0.1             
 [64] crayon_1.4.1                 
 [65] lattice_0.20-44              
 [66] Biostrings_2.60.2            
 [67] hms_1.1.0                    
 [68] KEGGREST_1.32.0              
 [69] knitr_1.34                   
 [70] pillar_1.6.2                 
 [71] rjson_0.2.20                 
 [72] codetools_0.2-18             
 [73] biomaRt_2.48.3               
 [74] BiocVersion_3.13.1           
 [75] XML_3.99-0.7                 
 [76] glue_1.4.2                   
 [77] evaluate_0.14                
 [78] data.table_1.14.0            
 [79] BiocManager_1.30.16          
 [80] httpuv_1.6.3                 
 [81] png_0.1-7                    
 [82] vctrs_0.3.8                  
 [83] foreach_1.5.1                
 [84] gtable_0.3.0                 
 [85] purrr_0.3.4                  
 [86] tidyr_1.1.3                  
 [87] heatmaply_1.2.1              
 [88] assertthat_0.2.1             
 [89] cachem_1.0.6                 
 [90] ggplot2_3.3.5                
 [91] xfun_0.25                    
 [92] mime_0.11                    
 [93] xtable_1.8-4                 
 [94] restfulr_0.0.13              
 [95] later_1.3.0                  
 [96] viridisLite_0.4.0            
 [97] seriation_1.3.0              
 [98] tibble_3.1.4                 
 [99] iterators_1.0.13             
[100] GenomicAlignments_1.28.0     
[101] registry_0.5-1               
[102] memoise_2.0.0                
[103] interactiveDisplayBase_1.30.0
[104] ellipsis_0.3.2               

I would appreciate help from anyone. Thank you in advance!

-Pratik

@vincent-j-carey-jr-4

Is this moving in the direction of what you'd like?

seqlevelsStyle(mm10.txdb) = "UCSC"
txns.reproducible.example <- cdsByOverlaps(x = mm10.txdb, 
   ranges =  MM285.mm10.manifest, columns = c("CDSSTART","CDSEND", "TXNAME", "CDSNAME"))
zz = split(txns.reproducible.example, unlist( txns.reproducible.example$TXNAME))
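
To illustrate what the split() step does, here is a minimal sketch on toy data rather than the real TxDb. The object names (cds.toy, txns.toy) and coordinates are made up for illustration. It also assumes a case that may or may not occur in your data: a CDS shared by several transcripts, so that TXNAME has more than one entry for that range. In that situation the flattened names no longer line up one-to-one with the ranges, so the sketch first repeats each range once per transcript.

```r
library(GenomicRanges)

# Toy CDS ranges (hypothetical coordinates, not taken from the mm10 TxDb).
cds.toy <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 300, 500), end = c(200, 400, 600)),
  strand   = "+"
)

# TXNAME is list-like: the second CDS is assumed to be shared by txA and txB.
cds.toy$TXNAME <- CharacterList("txA", c("txA", "txB"), "txB")

# Repeat each range once per transcript it belongs to, so ranges and
# flattened transcript names are parallel vectors of equal length.
n.tx     <- elementNROWS(cds.toy$TXNAME)
expanded <- cds.toy[rep(seq_along(cds.toy), n.tx)]
tx.names <- unlist(cds.toy$TXNAME, use.names = FALSE)

# Splitting a GRanges by a character vector yields a GRangesList
# whose names are the transcript names.
txns.toy <- split(expanded, tx.names)
```

If every CDS maps to exactly one transcript, the expansion is a no-op and this reduces to the one-line split() above. GenomicFeatures also offers cdsBy(txdb, by = "tx", use.names = TRUE), which returns CDS grouped by transcript as a GRangesList directly; whether it matches the layout of txns.reference is something you would need to check.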

Thank you very much! This is perfect, specifically this line here:

zz = split(txns.reproducible.example, unlist( txns.reproducible.example$TXNAME))

I also realized that txns.reference was most likely created with cds() rather than cdsByOverlaps(). Regardless, you brought me all the way. :) Thank you.

For reference, in case anyone needs this in the future, the following accomplished what I needed:

MM285.mm10.manifest <- sesameData::sesameDataGet('MM285.mm10.manifest')
mm10.txdb <- GenomicFeatures::makeTxDbFromEnsembl(organism = "Mus musculus", release = 102)
seqlevelsStyle(mm10.txdb) = "UCSC"
txns.reproducible.example <- GenomicFeatures::cds(x = mm10.txdb,  columns = c("CDSSTART","CDSEND", "TXNAME"))
txns = split(txns.reproducible.example, unlist(txns.reproducible.example$TXNAME))

mcols(txns, level="within")[, "cdsStart"] <- mcols(txns, level="within")[, "CDSSTART"]
mcols(txns, level="within")[, "cdsEnd"] <- mcols(txns, level="within")[, "CDSEND"]
txns <- txns[, c("cdsStart", "cdsEnd")]
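
As a possible simplification of the renaming step, the metadata columns can be renamed on the flat GRanges before splitting, which avoids the level="within" assignments. This is only a sketch on toy data: cds.demo, tx.demo and txns.demo are hypothetical names, and the columns are plain integers here, which may differ from what cds() returns for a real TxDb.

```r
library(GenomicRanges)

# Toy CDS ranges with CDSSTART/CDSEND metadata columns (illustrative values).
cds.demo <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 300, 500), end = c(200, 400, 600)),
  strand   = "+"
)
cds.demo$CDSSTART <- start(cds.demo)
cds.demo$CDSEND   <- end(cds.demo)

# One transcript name per range (assumed one-to-one for this sketch).
tx.demo <- c("txA", "txA", "txB")

# Keep only the two columns of interest and rename them in one step.
cds.cols <- mcols(cds.demo)[, c("CDSSTART", "CDSEND")]
colnames(cds.cols) <- c("cdsStart", "cdsEnd")
mcols(cds.demo) <- cds.cols

# Split afterwards; each element inherits the renamed columns.
txns.demo <- split(cds.demo, tx.demo)
```

Both routes should end with the same two inner columns; renaming first just keeps the GRangesList manipulation to a single split() call.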