Hello Bioconductor Community,
I originally posted this on Biostars, but I have removed it there and am posting it here instead.
How do I build a GRangesList in which each GRanges element holds the CDS coordinates of one gene transcript? Basically, I am trying to overlap CDS coordinates from a TxDb object with CpG loci from a GRanges object, while keeping those CDS coordinates grouped by transcript.
The reproducible data is in the sesameData package, which is used by the sesame package.
I am trying to create a txns GRangesList similar to the one below (txns.reference):
library(sesameData)
genomeInfo.mm10 <- sesameData::sesameDataGet('genomeInfo.mm10')
txns.reference <- genomeInfo.mm10$txns
I am trying to do this for the mm39 assembly, but for the sake of providing a reproducible example, I'll only include an mm10 working example.
This is how far I have gotten:
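library(GenomicFeatures)  # makeTxDbFromEnsembl(), cdsByOverlaps(); also attaches GenomicRanges/GenomeInfoDb
MM285.mm10.manifest <- sesameData::sesameDataGet('MM285.mm10.manifest')
# Ensembl release 102 is the last Ensembl release based on GRCm38 (mm10)
mm10.txdb <- GenomicFeatures::makeTxDbFromEnsembl(organism = "Mus musculus", release = 102)
# Ensembl uses "1", "2", ... as chromosome names; switch to UCSC-style "chr1", "chr2", ...
seqlevelsStyle(mm10.txdb) <- "UCSC"
txns.reproducible.example <- cdsByOverlaps(x = mm10.txdb, ranges = MM285.mm10.manifest, columns = c("CDSSTART","CDSEND"))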
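One possible alternative, sketched here under assumptions: rather than starting from cdsByOverlaps(), which returns a flat GRanges of CDS ranges, group the CDS by transcript first with cdsBy(txdb, by = "tx", use.names = TRUE) and then keep the transcripts that overlap the probes with subsetByOverlaps(). The toy TxDb built with makeTxDb() and the probe positions below are hypothetical stand-ins for mm10.txdb and MM285.mm10.manifest, not real annotation.

```r
suppressPackageStartupMessages({
  library(GenomicFeatures)  # makeTxDb(), cdsBy()
  library(GenomicRanges)    # GRanges(), subsetByOverlaps()
})

# Hypothetical toy annotation: two transcripts on chr1, + strand.
transcripts <- data.frame(
  tx_id     = 1:2,
  tx_name   = c("txA", "txB"),
  tx_chrom  = "chr1",
  tx_strand = "+",
  tx_start  = c(100L, 100L),
  tx_end    = c(900L, 600L)
)
# Exons of each transcript; cds_start/cds_end mark the coding part of each exon.
splicings <- data.frame(
  tx_id      = c(1L, 1L, 2L, 2L),
  exon_rank  = c(1L, 2L, 1L, 2L),
  exon_start = c(100L, 700L, 100L, 500L),
  exon_end   = c(300L, 900L, 300L, 600L),
  cds_start  = c(150L, 700L, 150L, 500L),
  cds_end    = c(300L, 800L, 300L, 550L)
)
txdb <- makeTxDb(transcripts, splicings)

# Group CDS by transcript; use.names = TRUE puts tx_name into names().
cds_by_tx <- cdsBy(txdb, by = "tx", use.names = TRUE)

# Hypothetical CpG probe positions (stand-in for the manifest GRanges).
probes <- GRanges("chr1", IRanges(c(750L, 1000L), width = 2L))

# Keep only transcripts with at least one CDS overlapping a probe.
# The result is still a GRangesList named by transcript.
hits_by_tx <- subsetByOverlaps(cds_by_tx, probes)
names(hits_by_tx)
```

Note that subsetByOverlaps() on a GRangesList keeps whole transcripts, so txA keeps both of its CDS ranges even though only one of them overlaps a probe.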
The resulting txns.reproducible.example is a GRanges object, not a GRangesList, and it does not carry the transcript names (NAMES) that txns.reference does. I have tried many ways, but have had no success yet.
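For illustration, the difference between the two shapes: a GRangesList is essentially a flat GRanges partitioned into named groups. This minimal sketch uses toy ranges and a hypothetical plain-character tx_name column (one transcript per row) to show split() going from flat to grouped and unlist() going back.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical flat GRanges: one row per CDS, with a plain character column
# naming the single transcript each CDS belongs to.
flat <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(150L, 700L, 500L), end = c(300L, 800L, 550L)),
  strand   = "+",
  tx_name  = c("txA", "txA", "txB")
)

# split() groups the rows into a GRangesList named by tx_name.
grl <- split(flat, flat$tx_name)
names(grl)          # "txA" "txB"
elementNROWS(grl)   # 2 ranges in txA, 1 in txB

# unlist() flattens back to a GRanges; the list names become row names.
back <- unlist(grl)
names(back)         # "txA" "txA" "txB"
```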
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics
[5] grDevices utils datasets methods
[9] base
other attached packages:
[1] GenomicFeatures_1.44.2
[2] AnnotationDbi_1.54.1
[3] Biobase_2.52.0
[4] sesameData_1.11.3
[5] rmarkdown_2.10
[6] ExperimentHub_2.0.0
[7] AnnotationHub_3.0.1
[8] BiocFileCache_2.0.0
[9] dbplyr_2.1.1
[10] GenomicRanges_1.44.0
[11] GenomeInfoDb_1.28.4
[12] IRanges_2.26.0
[13] S4Vectors_0.30.0
[14] BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] matrixStats_0.60.1
[2] bitops_1.0-7
[3] bit64_4.0.5
[4] filelock_1.0.2
[5] webshot_0.5.2
[6] RColorBrewer_1.1-2
[7] progress_1.2.2
[8] httr_1.4.2
[9] tools_4.1.1
[10] utf8_1.2.2
[11] R6_2.5.1
[12] DBI_1.1.1
[13] lazyeval_0.2.2
[14] colorspace_2.0-2
[15] tidyselect_1.1.1
[16] gridExtra_2.3
[17] prettyunits_1.1.1
[18] bit_4.0.4
[19] curl_4.3.2
[20] compiler_4.1.1
[21] TSP_1.1-10
[22] xml2_1.3.2
[23] DelayedArray_0.18.0
[24] plotly_4.9.4.1
[25] rtracklayer_1.52.1
[26] scales_1.1.1
[27] rappdirs_0.3.3
[28] stringr_1.4.0
[29] digest_0.6.27
[30] Rsamtools_2.8.0
[31] XVector_0.32.0
[32] pkgconfig_2.0.3
[33] htmltools_0.5.2
[34] MatrixGenerics_1.4.3
[35] fastmap_1.1.0
[36] htmlwidgets_1.5.4
[37] rlang_0.4.11
[38] rstudioapi_0.13
[39] RSQLite_2.2.8
[40] shiny_1.6.0
[41] BiocIO_1.2.0
[42] generics_0.1.0
[43] jsonlite_1.7.2
[44] BiocParallel_1.26.2
[45] dendextend_1.15.1
[46] dplyr_1.0.7
[47] RCurl_1.98-1.4
[48] magrittr_2.0.1
[49] GenomeInfoDbData_1.2.6
[50] patchwork_1.1.1
[51] Matrix_1.3-4
[52] Rcpp_1.0.7
[53] munsell_0.5.0
[54] fansi_0.5.0
[55] viridis_0.6.1
[56] lifecycle_1.0.0
[57] stringi_1.7.4
[58] yaml_2.2.1
[59] SummarizedExperiment_1.22.0
[60] zlibbioc_1.38.0
[61] grid_4.1.1
[62] blob_1.2.2
[63] promises_1.2.0.1
[64] crayon_1.4.1
[65] lattice_0.20-44
[66] Biostrings_2.60.2
[67] hms_1.1.0
[68] KEGGREST_1.32.0
[69] knitr_1.34
[70] pillar_1.6.2
[71] rjson_0.2.20
[72] codetools_0.2-18
[73] biomaRt_2.48.3
[74] BiocVersion_3.13.1
[75] XML_3.99-0.7
[76] glue_1.4.2
[77] evaluate_0.14
[78] data.table_1.14.0
[79] BiocManager_1.30.16
[80] httpuv_1.6.3
[81] png_0.1-7
[82] vctrs_0.3.8
[83] foreach_1.5.1
[84] gtable_0.3.0
[85] purrr_0.3.4
[86] tidyr_1.1.3
[87] heatmaply_1.2.1
[88] assertthat_0.2.1
[89] cachem_1.0.6
[90] ggplot2_3.3.5
[91] xfun_0.25
[92] mime_0.11
[93] xtable_1.8-4
[94] restfulr_0.0.13
[95] later_1.3.0
[96] viridisLite_0.4.0
[97] seriation_1.3.0
[98] tibble_3.1.4
[99] iterators_1.0.13
[100] GenomicAlignments_1.28.0
[101] registry_0.5-1
[102] memoise_2.0.0
[103] interactiveDisplayBase_1.30.0
[104] ellipsis_0.3.2
I would appreciate help from anyone. Thank you in advance!
-Pratik
Thank you very much! This is perfect, specifically this line here:
zz = split(txns.reproducible.example, unlist(txns.reproducible.example$TXNAME))
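One caveat with that split() line, offered as a hedged sketch: when TXNAME is a CharacterList and some CDS belong to more than one transcript (common for shared exons), unlist(txns.reproducible.example$TXNAME) is longer than the GRanges itself, so the two no longer line up. Repeating each range once per transcript before splitting keeps them aligned. The toy cds object below is a hypothetical stand-in for the cdsByOverlaps() result.

```r
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(IRanges)  # CharacterList()
})

# Hypothetical stand-in for a cdsByOverlaps(..., columns = "TXNAME") result:
# the first CDS is shared by two transcripts, so TXNAME is a CharacterList.
cds <- GRanges("chr1", IRanges(c(150L, 700L, 500L), c(300L, 800L, 550L)), strand = "+")
cds$TXNAME <- CharacterList(c("txA", "txB"), "txA", "txB")

# unlist(cds$TXNAME) has 4 entries while cds has 3 ranges.
# Repeat each range once per transcript it belongs to, then split.
idx <- rep(seq_along(cds), lengths(cds$TXNAME))
grl <- split(cds[idx], unlist(cds$TXNAME, use.names = FALSE))
```

When every CDS maps to exactly one transcript, idx is simply seq_along(cds) and this reduces to the original line.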
So I realized that txns.reference was most likely created by cds() rather than cdsByOverlaps(). Regardless, you got me all the way there. :) Thank you! For reference, if anyone needs this in the future, this accomplished what I needed to do: