António Miguel de Jesus Domingues
I am trying to use annotatr with a TxDb generated from an Ensembl GFF, because this particular annotation (Rn5, ensgene) does not exist in Bioconductor. The only approach I have found is to save the individual feature files (introns, exons, etc.) and load them with read_annotations. Is there another way?
Here is how I am preparing the annotations:
txdb <- makeTxDbFromGFF("/mnt/fileserver/genomics/references/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf")
introns <- intronicParts(txdb, linked.to.single.gene.only = TRUE)
exons <- exonicParts(txdb, linked.to.single.gene.only = TRUE)
fiveUTR <- unlist(fiveUTRsByTranscript(txdb))
threeUTR <- unlist(threeUTRsByTranscript(txdb))
intergenicRegions <- gaps(unlist(range(exonsBy(txdb, "gene"))))
Passing the names of these objects to build_annotations leads to an error:
annots <- c(
  'introns',
  'exons',
  'fiveUTR',
  'threeUTR',
  'intergenicRegions'
)
# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)
Error: ‘introns’ not in annotatr_cache
And when I try to set the cache manually, the mcols do not match:
annotatr_cache$set(
  sprintf(
    "%s_custom_%s", "rn5", "introns"
  ),
  introns
)
annotatr_cache$set(
  sprintf(
    "%s_custom_%s", "rn5", "exons"
  ),
  exons
)
annots <- c(
  'rn5_custom_introns',
  'rn5_custom_exons'
)
# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)
dm_annotated = annotate_regions(
  regions = regions,
  annotations = annotations,
  ignore.strand = TRUE,
  quiet = FALSE
)
print(dm_annotated)
dm_annsum = summarize_annotations(
  annotated_regions = dm_annotated,
  quiet = TRUE)
print(dm_annsum)
GRanges object with 956 ranges and 4 metadata columns:
seqnames ranges strand | name score
<Rle> <IRanges> <Rle> | <character> <numeric>
[1] X 55737246-55737271 - | ENSRNOG00000029663_1.. 1000.000
[2] 18 31745729-31745750 - | ENSRNOG00000013920_1.. 614.745
[3] 19 62927445-62927466 - | ENSRNOG00000015173_1.. 380.954
[4] 20 5493221-5493243 - | ENSRNOG00000000816_2.. 310.303
[5] 9 80969164-80969469 - | ENSRNOG00000014182_3.. 279.199
... ... ... ... . ... ...
[952] 5 170222940-170223039 + | ENSRNOG00000016398_3.. 3.34775
[953] 1 267685135-267685234 - | ENSRNOG00000013967_4.. 3.34606
[954] 18 25057278-25057448 - | ENSRNOG00000029939_1.. 3.34577
[955] 16 81105363-81105772 + | ENSRNOG00000019504_1.. 3.34391
[956] 5 125986496-125986511 + | ENSRNOG00000005905_2.. 3.34352
thick annot
<IRanges> <GRanges>
[1] 55737251 X:55687310-55946671:-
[2] 31745741 18:31744045-31749035:-
[3] 62927458 19:62925808-62928287:-
[4] 5493225 20:5493099-5494097:-
[5] 80969296 9:80968047-80970915:-
... ... ...
[952] 170223021 5:170217421-170228154:+
[953] 267685153 1:267677296-267697763:-
[954] 25057381 18:25032734-25060073:-
[955] 81105667 16:81104945-81106112:+
[956] 125986503 5:125985966-125991044:+
-------
seqinfo: 21 sequences from an unspecified genome; no seqlengths
dm_annsum = summarize_annotations(
  annotated_regions = dm_annotated,
  quiet = TRUE)
Error: `distinct()` must use existing variables.
✖ `annot.type` not found in `.data`.
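To make the mcols mismatch concrete: as far as I can tell, annotatr's built-in annotations carry five metadata columns (id, tx_id, gene_id, symbol, type), and the `annot.type` named in the error would be the `type` column of the `annot` GRanges once it is flattened to a data frame. The sketch below only illustrates giving a GRanges that column layout. The helper `as_annotatr_annotation` and the toy ranges are hypothetical and not part of annotatr, and I have not verified that this makes summarize_annotations work in annotatr 1.16.0.

```r
library(GenomicRanges)

# Hypothetical helper (NOT an annotatr function): replace the metadata columns
# of `gr` with the five columns assumed to be present on annotatr's built-in
# annotations. Existing mcols on `gr` are dropped.
as_annotatr_annotation <- function(gr, type) {
  n <- length(gr)
  mcols(gr) <- DataFrame(
    id      = sprintf("%s:%d", type, seq_len(n)),  # unique id per range
    tx_id   = rep(NA_character_, n),               # unknown here, left as NA
    gene_id = rep(NA_character_, n),
    symbol  = rep(NA_character_, n),
    type    = rep(type, n)                         # annotation type label
  )
  gr
}

# Toy ranges standing in for the output of intronicParts(txdb, ...)
toy_introns <- GRanges(
  seqnames = c("1", "1", "X"),
  ranges   = IRanges(start = c(100, 500, 900), width = 50),
  strand   = c("+", "-", "+")
)

custom_introns <- as_annotatr_annotation(toy_introns, "rn5_custom_introns")
print(colnames(mcols(custom_introns)))
```

With the real objects, the idea would be to pass `introns` and `exons` through such a helper before storing them with annotatr_cache$set, filling gene_id from the existing metadata rather than NA where it is available.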
sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] AnnotationHub_2.22.1 BiocFileCache_1.14.0 dbplyr_2.1.0
[4] GenomicFeatures_1.42.3 AnnotationDbi_1.52.0 Biobase_2.50.0
[7] GenomicRanges_1.42.0 GenomeInfoDb_1.26.2 IRanges_2.24.1
[10] S4Vectors_0.28.1 BiocGenerics_0.36.0 annotatr_1.16.0
loaded via a namespace (and not attached):
[1] MatrixGenerics_1.2.1 httr_1.4.2
[3] regioneR_1.22.0 bit64_4.0.5
[5] shiny_1.6.0 assertthat_0.2.1
[7] interactiveDisplayBase_1.28.0 askpass_1.1
[9] BiocManager_1.30.10 blob_1.2.1
[11] BSgenome_1.58.0 GenomeInfoDbData_1.2.4
[13] Rsamtools_2.6.0 yaml_2.2.1
[15] progress_1.2.2 BiocVersion_3.12.0
[17] lattice_0.20-41 pillar_1.5.1
[19] RSQLite_2.2.3 glue_1.4.2
[21] digest_0.6.27 promises_1.2.0.1
[23] XVector_0.30.0 colorspace_2.0-0
[25] plyr_1.8.6 htmltools_0.5.1.1
[27] httpuv_1.5.5 Matrix_1.3-2
[29] XML_3.99-0.5 pkgconfig_2.0.3
[31] biomaRt_2.46.3 zlibbioc_1.36.0
[33] purrr_0.3.4 xtable_1.8-4
[35] scales_1.1.1 later_1.1.0.1
[37] BiocParallel_1.24.1 tibble_3.1.0
[39] openssl_1.4.3 ggplot2_3.3.3
[41] generics_0.1.0 ellipsis_0.3.1
[43] withr_2.4.1 cachem_1.0.4
[45] SummarizedExperiment_1.20.0 cli_2.3.1
[47] magrittr_2.0.1 crayon_1.4.1
[49] mime_0.10 memoise_2.0.0
[51] fansi_0.4.2 xml2_1.3.2
[53] tools_4.0.5 prettyunits_1.1.1
[55] hms_1.0.0 lifecycle_1.0.0
[57] matrixStats_0.58.0 stringr_1.4.0
[59] munsell_0.5.0 DelayedArray_0.16.2
[61] Biostrings_2.58.0 compiler_4.0.5
[63] rlang_0.4.10 grid_4.0.5
[65] RCurl_1.98-1.2 rstudioapi_0.13
[67] rappdirs_0.3.3 bitops_1.0-6
[69] gtable_0.3.0 DBI_1.1.1
[71] curl_4.3 reshape2_1.4.4
[73] R6_2.5.0 GenomicAlignments_1.26.0
[75] dplyr_1.0.5 rtracklayer_1.50.0
[77] fastmap_1.1.0 bit_4.0.4
[79] utf8_1.1.4 readr_1.4.0
[81] stringi_1.5.3 Rcpp_1.0.6
[83] vctrs_0.3.6 tidyselect_1.1.0