annotatr with custom TxDb
@antonio-miguel-de-jesus-domingues-5182
Germany

I am trying to use annotatr with a TxDb generated from an Ensembl GFF file, because this particular annotation (Rn5, ensgene) does not exist in Bioconductor. The only way I have found so far is to save the individual feature sets (introns, exons, etc.) to files and load them with read_annotations. Is there another way?

Here is how I am preparing the annotations:

txdb <- makeTxDbFromGFF("/mnt/fileserver/genomics/references/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf")

introns <- intronicParts(txdb, linked.to.single.gene.only = TRUE)
exons <- exonicParts(txdb, linked.to.single.gene.only = TRUE)
fiveUTR <- unlist(fiveUTRsByTranscript(txdb)) 
threeUTR <- unlist(threeUTRsByTranscript(txdb))
intergenicRegions <- gaps(unlist(range(exonsBy(txdb, "gene"))))

Passing the names of these objects to build_annotations leads to an error:

annots <- c(
   'introns',
   'exons',
   'fiveUTR',
   'threeUTR',
   'intergenicRegions'
)

# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)
Error: ‘introns’ not in annotatr_cache

When I set the cache manually instead, the annotations are built and the regions are annotated, but the mcols do not match what summarize_annotations expects:

annotatr_cache$set(
   sprintf(
      "%s_custom_%s", "rn5", "introns"
      ), 
   introns
)

annotatr_cache$set(
   sprintf(
      "%s_custom_%s", "rn5", "exons"
      ), 
   exons
)

annots <- c(
   'rn5_custom_introns',
   'rn5_custom_exons'
)

# Build the annotations (a single GRanges object)
annotations <- build_annotations(genome = 'Rnor_5.0', annotations = annots)

dm_annotated = annotate_regions(
    regions = regions,
    annotations = annotations,
    ignore.strand = TRUE,
    quiet = FALSE
    )
print(dm_annotated)

dm_annsum = summarize_annotations(
    annotated_regions = dm_annotated,
    quiet = TRUE)
print(dm_annsum)

GRanges object with 956 ranges and 4 metadata columns:
        seqnames              ranges strand |                   name     score
           <Rle>           <IRanges>  <Rle> |            <character> <numeric>
    [1]        X   55737246-55737271      - | ENSRNOG00000029663_1..  1000.000
    [2]       18   31745729-31745750      - | ENSRNOG00000013920_1..   614.745
    [3]       19   62927445-62927466      - | ENSRNOG00000015173_1..   380.954
    [4]       20     5493221-5493243      - | ENSRNOG00000000816_2..   310.303
    [5]        9   80969164-80969469      - | ENSRNOG00000014182_3..   279.199
    ...      ...                 ...    ... .                    ...       ...
  [952]        5 170222940-170223039      + | ENSRNOG00000016398_3..   3.34775
  [953]        1 267685135-267685234      - | ENSRNOG00000013967_4..   3.34606
  [954]       18   25057278-25057448      - | ENSRNOG00000029939_1..   3.34577
  [955]       16   81105363-81105772      + | ENSRNOG00000019504_1..   3.34391
  [956]        5 125986496-125986511      + | ENSRNOG00000005905_2..   3.34352
            thick                   annot
        <IRanges>               <GRanges>
    [1]  55737251   X:55687310-55946671:-
    [2]  31745741  18:31744045-31749035:-
    [3]  62927458  19:62925808-62928287:-
    [4]   5493225    20:5493099-5494097:-
    [5]  80969296   9:80968047-80970915:-
    ...       ...                     ...
  [952] 170223021 5:170217421-170228154:+
  [953] 267685153 1:267677296-267697763:-
  [954]  25057381  18:25032734-25060073:-
  [955]  81105667  16:81104945-81106112:+
  [956] 125986503 5:125985966-125991044:+
  -------
  seqinfo: 21 sequences from an unspecified genome; no seqlengths
dm_annsum = summarize_annotations(
    annotated_regions = dm_annotated,
    quiet = TRUE)

Error: `distinct()` must use existing variables.
✖ `annot.type` not found in `.data`.
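
A possible explanation, offered as an assumption rather than confirmed behaviour: summarize_annotations looks for an `annot.type` column, which would come from a `type` metadata column on the annotation GRanges, and the objects returned by intronicParts and exonicParts do not carry one. The sketch below adds the five metadata columns that annotatr's own annotations appear to use (`id`, `tx_id`, `gene_id`, `symbol`, `type`) to a toy GRanges standing in for `introns`. The helper `add_annotatr_mcols` is hypothetical and not part of annotatr, and filling the identifier columns with NA is only a placeholder.

```r
library(GenomicRanges)

# Hypothetical helper (not an annotatr function): replace the metadata
# columns of `gr` with the five columns assumed to be required by annotatr.
add_annotatr_mcols <- function(gr, type) {
    n <- length(gr)
    mcols(gr) <- DataFrame(
        id      = paste0(type, ":", seq_len(n)),  # unique id per range
        tx_id   = rep(NA_character_, n),          # placeholder, unknown here
        gene_id = rep(NA_character_, n),          # placeholder, unknown here
        symbol  = rep(NA_character_, n),          # placeholder, unknown here
        type    = rep(type, n)                    # assumed to match the cache key
    )
    gr
}

# Toy ranges standing in for the real `introns` object
toy_introns <- GRanges(
    seqnames = "1",
    ranges   = IRanges(start = c(100, 500), end = c(200, 600)),
    strand   = "+"
)
toy_introns <- add_annotatr_mcols(toy_introns, "rn5_custom_introns")

# With annotatr attached, the object could then be cached and built as above:
# annotatr_cache$set("rn5_custom_introns", toy_introns)
# annotations <- build_annotations(genome = "rn5", annotations = "rn5_custom_introns")
```

If this assumption holds, the real `introns` and `exons` objects would go through the same helper before annotatr_cache$set, with `gene_id` taken from the TxDb-derived metadata instead of NA where it is available.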
sessionInfo( )
R version 4.0.5 (2021-03-31)                                                                
Platform: x86_64-pc-linux-gnu (64-bit)                                                      
Running under: Ubuntu 20.04.2 LTS                                                           

Matrix products: default                                                                    
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0          
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0     

locale:                                                                                     
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C                    
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8          
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8          
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                       
 [9] LC_ADDRESS=C               LC_TELEPHONE=C                  
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C              

attached base packages:                                                                     
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base                                                                          

other attached packages:                                                                    
 [1] AnnotationHub_2.22.1   BiocFileCache_1.14.0   dbplyr_2.1.0          
 [4] GenomicFeatures_1.42.3 AnnotationDbi_1.52.0   Biobase_2.50.0                                                                                                                       
 [7] GenomicRanges_1.42.0   GenomeInfoDb_1.26.2    IRanges_2.24.1        
[10] S4Vectors_0.28.1       BiocGenerics_0.36.0    annotatr_1.16.0       

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.2.1          httr_1.4.2                    
 [3] regioneR_1.22.0               bit64_4.0.5                   
 [5] shiny_1.6.0                   assertthat_0.2.1             
 [7] interactiveDisplayBase_1.28.0 askpass_1.1                   
 [9] BiocManager_1.30.10           blob_1.2.1                    
[11] BSgenome_1.58.0               GenomeInfoDbData_1.2.4       
[13] Rsamtools_2.6.0               yaml_2.2.1                    
[15] progress_1.2.2                BiocVersion_3.12.0           
[17] lattice_0.20-41               pillar_1.5.1                 
[19] RSQLite_2.2.3                 glue_1.4.2                    
[21] digest_0.6.27                 promises_1.2.0.1             
[23] XVector_0.30.0                colorspace_2.0-0             
[25] plyr_1.8.6                    htmltools_0.5.1.1            
[27] httpuv_1.5.5                  Matrix_1.3-2                 
[29] XML_3.99-0.5                  pkgconfig_2.0.3              
[31] biomaRt_2.46.3                zlibbioc_1.36.0              
[33] purrr_0.3.4                   xtable_1.8-4                 
[35] scales_1.1.1                  later_1.1.0.1                
[37] BiocParallel_1.24.1           tibble_3.1.0                 
[39] openssl_1.4.3                 ggplot2_3.3.3      
[41] generics_0.1.0                ellipsis_0.3.1               
[43] withr_2.4.1                   cachem_1.0.4                 
[45] SummarizedExperiment_1.20.0   cli_2.3.1                     
[47] magrittr_2.0.1                crayon_1.4.1                 
[49] mime_0.10                     memoise_2.0.0                
[51] fansi_0.4.2                   xml2_1.3.2                    
[53] tools_4.0.5                   prettyunits_1.1.1            
[55] hms_1.0.0                     lifecycle_1.0.0              
[57] matrixStats_0.58.0            stringr_1.4.0                
[59] munsell_0.5.0                 DelayedArray_0.16.2          
[61] Biostrings_2.58.0             compiler_4.0.5               
[63] rlang_0.4.10                  grid_4.0.5                    
[65] RCurl_1.98-1.2                rstudioapi_0.13              
[67] rappdirs_0.3.3                bitops_1.0-6                 
[69] gtable_0.3.0                  DBI_1.1.1                     
[71] curl_4.3                      reshape2_1.4.4               
[73] R6_2.5.0                      GenomicAlignments_1.26.0     
[75] dplyr_1.0.5                   rtracklayer_1.50.0           
[77] fastmap_1.1.0                 bit_4.0.4                     
[79] utf8_1.1.4                    readr_1.4.0                   
[81] stringi_1.5.3                 Rcpp_1.0.6                    
[83] vctrs_0.3.6                   tidyselect_1.1.0
GenomicFeatures annotatr genomicFe