annotatr with custom TxDb

I am trying to use annotatr with a TxDb generated from an Ensembl GFF. The reason is that this particular annotation (Rn5, ensgene) does not exist in Bioconductor. The problem is that I can't find a way to do it other than saving the individual feature files (introns, exons, etc.) and loading them with read_annotations. Is there another way?

Here is how I am preparing the annotations:

library(GenomicFeatures)
library(annotatr)

# Build the TxDb from the Ensembl Rnor_5.0 GTF
txdb <- makeTxDbFromGFF("/mnt/fileserver/genomics/references/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf")

introns <- intronicParts(txdb, linked.to.single.gene.only = TRUE)
exons <- exonicParts(txdb, linked.to.single.gene.only = TRUE)
fiveUTR <- unlist(fiveUTRsByTranscript(txdb)) 
threeUTR <- unlist(threeUTRsByTranscript(txdb))
intergenicRegions <- gaps(unlist(range(exonsBy(txdb, "gene"))))

This leads to an error:

annots <- c(

# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)
Error: ‘introns’ not in annotatr_cache

And when I try to set the cache manually, the mcols do not match:

      "%s_custom_%s", "rn5", "introns"

      "%s_custom_%s", "rn5", "exons"

annots <- c(

# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)

dm_annotated = annotate_regions(
    regions = regions,
    annotations = annotations,
    ignore.strand = TRUE,
    quiet = FALSE)


GRanges object with 956 ranges and 4 metadata columns:
        seqnames              ranges strand |                   name     score
           <Rle>           <IRanges>  <Rle> |            <character> <numeric>
    [1]        X   55737246-55737271      - | ENSRNOG00000029663_1..  1000.000
    [2]       18   31745729-31745750      - | ENSRNOG00000013920_1..   614.745
    [3]       19   62927445-62927466      - | ENSRNOG00000015173_1..   380.954
    [4]       20     5493221-5493243      - | ENSRNOG00000000816_2..   310.303
    [5]        9   80969164-80969469      - | ENSRNOG00000014182_3..   279.199
    ...      ...                 ...    ... .                    ...       ...
  [952]        5 170222940-170223039      + | ENSRNOG00000016398_3..   3.34775
  [953]        1 267685135-267685234      - | ENSRNOG00000013967_4..   3.34606
  [954]       18   25057278-25057448      - | ENSRNOG00000029939_1..   3.34577
  [955]       16   81105363-81105772      + | ENSRNOG00000019504_1..   3.34391
  [956]        5 125986496-125986511      + | ENSRNOG00000005905_2..   3.34352
            thick                   annot
        <IRanges>               <GRanges>
    [1]  55737251   X:55687310-55946671:-
    [2]  31745741  18:31744045-31749035:-
    [3]  62927458  19:62925808-62928287:-
    [4]   5493225    20:5493099-5494097:-
    [5]  80969296   9:80968047-80970915:-
    ...       ...                     ...
  [952] 170223021 5:170217421-170228154:+
  [953] 267685153 1:267677296-267697763:-
  [954]  25057381  18:25032734-25060073:-
  [955]  81105667  16:81104945-81106112:+
  [956] 125986503 5:125985966-125991044:+
  seqinfo: 21 sequences from an unspecified genome; no seqlengths
dm_annsum = summarize_annotations(
    annotated_regions = dm_annotated,
    quiet = TRUE)

Error: `distinct()` must use existing variables.
✖ `annot.type` not found in `.data`.
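
The missing `annot.type` suggests that the cached annotations have no `type` metadata column, since annotate_regions stores the matched annotation under `annot` and its columns are flattened with that prefix. The following is a minimal sketch rather than a confirmed fix. It assumes that annotatr's own annotations carry the metadata columns id, tx_id, gene_id, symbol and type, and that annotatr_cache$set() accepts a key of the form genome_custom_name. The helper as_annotatr_annotation and the toy coordinates are hypothetical and stand in for the TxDb-derived introns.

```r
library(GenomicRanges)
library(annotatr)

# Toy stand-in for the TxDb-derived introns (hypothetical coordinates)
introns <- GRanges(
    seqnames = c("1", "1"),
    ranges = IRanges(start = c(100, 500), end = c(200, 650)),
    strand = c("+", "-"))

# Hypothetical helper: replace the metadata of a GRanges with the five
# columns that annotatr's annotations are assumed to use
as_annotatr_annotation <- function(gr, genome, name) {
    n <- length(gr)
    mcols(gr) <- DataFrame(
        id = paste0(name, ":", seq_len(n)),
        tx_id = rep(NA_character_, n),
        gene_id = rep(NA_character_, n),
        symbol = rep(NA_character_, n),
        type = rep(sprintf("%s_custom_%s", genome, name), n))
    gr
}

introns_annot <- as_annotatr_annotation(introns, genome = "rn5", name = "introns")

# Store the object under the key that matches its type column
annotatr_cache$set("rn5_custom_introns", introns_annot)
```

If the assumption about the column layout holds, the key "rn5_custom_introns" could then be listed in annots for build_annotations, and real gene or transcript identifiers from the TxDb could replace the NA placeholders.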
sessionInfo()
R version 4.0.5 (2021-03-31)                                                                
Platform: x86_64-pc-linux-gnu (64-bit)                                                      
Running under: Ubuntu 20.04.2 LTS                                                           

Matrix products: default                                                                    
BLAS:   /usr/lib/x86_64-linux-gnu/blas/          
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/     

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C                    
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8          
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8          
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                       
 [9] LC_ADDRESS=C               LC_TELEPHONE=C                  

attached base packages:                                                                     
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base                                                                          

other attached packages:                                                                    
 [1] AnnotationHub_2.22.1   BiocFileCache_1.14.0   dbplyr_2.1.0          
 [4] GenomicFeatures_1.42.3 AnnotationDbi_1.52.0   Biobase_2.50.0                                                                                                                       
 [7] GenomicRanges_1.42.0   GenomeInfoDb_1.26.2    IRanges_2.24.1        
[10] S4Vectors_0.28.1       BiocGenerics_0.36.0    annotatr_1.16.0       

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.2.1          httr_1.4.2                    
 [3] regioneR_1.22.0               bit64_4.0.5                   
 [5] shiny_1.6.0                   assertthat_0.2.1             
 [7] interactiveDisplayBase_1.28.0 askpass_1.1                   
 [9] BiocManager_1.30.10           blob_1.2.1                    
[11] BSgenome_1.58.0               GenomeInfoDbData_1.2.4       
[13] Rsamtools_2.6.0               yaml_2.2.1                    
[15] progress_1.2.2                BiocVersion_3.12.0           
[17] lattice_0.20-41               pillar_1.5.1                 
[19] RSQLite_2.2.3                 glue_1.4.2                    
[21] digest_0.6.27                 promises_1.2.0.1             
[23] XVector_0.30.0                colorspace_2.0-0             
[25] plyr_1.8.6                    htmltools_0.5.1.1            
[27] httpuv_1.5.5                  Matrix_1.3-2                 
[29] XML_3.99-0.5                  pkgconfig_2.0.3              
[31] biomaRt_2.46.3                zlibbioc_1.36.0              
[33] purrr_0.3.4                   xtable_1.8-4                 
[35] scales_1.1.1                  later_1.1.0.1                
[37] BiocParallel_1.24.1           tibble_3.1.0                 
[39] openssl_1.4.3                 ggplot2_3.3.3      
[41] generics_0.1.0                ellipsis_0.3.1               
[43] withr_2.4.1                   cachem_1.0.4                 
[45] SummarizedExperiment_1.20.0   cli_2.3.1                     
[47] magrittr_2.0.1                crayon_1.4.1                 
[49] mime_0.10                     memoise_2.0.0                
[51] fansi_0.4.2                   xml2_1.3.2                    
[53] tools_4.0.5                   prettyunits_1.1.1            
[55] hms_1.0.0                     lifecycle_1.0.0              
[57] matrixStats_0.58.0            stringr_1.4.0                
[59] munsell_0.5.0                 DelayedArray_0.16.2          
[61] Biostrings_2.58.0             compiler_4.0.5               
[63] rlang_0.4.10                  grid_4.0.5                    
[65] RCurl_1.98-1.2                rstudioapi_0.13              
[67] rappdirs_0.3.3                bitops_1.0-6                 
[69] gtable_0.3.0                  DBI_1.1.1                     
[71] curl_4.3                      reshape2_1.4.4               
[73] R6_2.5.0                      GenomicAlignments_1.26.0     
[75] dplyr_1.0.5                   rtracklayer_1.50.0           
[77] fastmap_1.1.0                 bit_4.0.4                     
[79] utf8_1.1.4                    readr_1.4.0                   
[81] stringi_1.5.3                 Rcpp_1.0.6                    
[83] vctrs_0.3.6                   tidyselect_1.1.0
