branchpointer: trouble reading GTF files
@d560df69

Dear all,

I have trouble reading GTF files with branchpointer::gtfToExons. The supplied example file (gencode.v26.annotation.small.gtf) can be read, but my own GTF files, or any change to the example file, lead to "Error: subscript contains invalid names". For example, keeping only the gene_id and transcript_id attributes from the example file renders it unreadable. I suspect that gtfToExons relies on specific attributes in the group/attribute field, but I cannot pinpoint which ones. I work with non-model organisms and can only provide transcript-exon information with non-public identifiers. GFF3 files cannot be read either.

An example of a minimal GTF file that cannot be read:

chr1    gmap    transcript      1       1000    .       +       .       transcript_id "tx1";
chr1    gmap    exon    100     900     .       +       0       transcript_id "tx1";

Any hints on how to construct my GTF files?

library(branchpointer)
exons <- gtfToExons("minimal.gtf")

Error: subscript contains invalid names

sessionInfo():
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] branchpointer_1.18.0 caret_6.0-88         ggplot2_3.3.5       
[4] lattice_0.20-44     

loaded via a namespace (and not attached):
  [1] nlme_3.1-153                      matrixStats_0.60.1               
  [3] bitops_1.0-7                      lubridate_1.7.10                 
  [5] bit64_4.0.5                       filelock_1.0.2                   
  [7] progress_1.2.2                    httr_1.4.2                       
  [9] GenomeInfoDb_1.28.4               tools_4.1.0                      
 [11] utf8_1.2.2                        R6_2.5.1                         
 [13] rpart_4.1-15                      DBI_1.1.1                        
 [15] BiocGenerics_0.38.0               colorspace_2.0-2                 
 [17] nnet_7.3-16                       withr_2.4.2                      
 [19] gbm_2.1.8                         tidyselect_1.1.1                 
 [21] prettyunits_1.1.1                 bit_4.0.4                        
 [23] curl_4.3.2                        compiler_4.1.0                   
 [25] Biobase_2.52.0                    xml2_1.3.2                       
 [27] DelayedArray_0.18.0               rtracklayer_1.52.1               
 [29] scales_1.1.1                      rappdirs_0.3.3                   
 [31] Rsamtools_2.8.0                   stringr_1.4.0                    
 [33] digest_0.6.27                     XVector_0.32.0                   
 [35] pkgconfig_2.0.3                   parallelly_1.28.1                
 [37] MatrixGenerics_1.4.3              BSgenome_1.60.0                  
 [39] dbplyr_2.1.1                      fastmap_1.1.0                    
 [41] rlang_0.4.11                      rstudioapi_0.13                  
 [43] RSQLite_2.2.8                     BiocIO_1.2.0                     
 [45] generics_0.1.0                    BiocParallel_1.26.2              
 [47] dplyr_1.0.7                       ModelMetrics_1.2.2.2             
 [49] RCurl_1.98-1.5                    magrittr_2.0.1                   
 [51] GenomeInfoDbData_1.2.6            Matrix_1.3-4                     
 [53] Rcpp_1.0.7                        munsell_0.5.0                    
 [55] S4Vectors_0.30.0                  fansi_0.5.0                      
 [57] lifecycle_1.0.0                   yaml_2.2.1                       
 [59] stringi_1.7.4                     pROC_1.18.0                      
 [61] SummarizedExperiment_1.22.0       MASS_7.3-54                      
 [63] zlibbioc_1.38.0                   plyr_1.8.6                       
 [65] recipes_0.1.16                    BiocFileCache_2.0.0              
 [67] grid_4.1.0                        blob_1.2.2                       
 [69] parallel_4.1.0                    listenv_0.8.0                    
 [71] crayon_1.4.1                      cowplot_1.1.1                    
 [73] Biostrings_2.60.2                 splines_4.1.0                    
 [75] hms_1.1.0                         KEGGREST_1.32.0                  
 [77] BSgenome.Hsapiens.UCSC.hg38_1.4.3 pillar_1.6.2                     
 [79] GenomicRanges_1.44.0              rjson_0.2.20                     
 [81] future.apply_1.8.1                reshape2_1.4.4                   
 [83] codetools_0.2-18                  biomaRt_2.48.3                   
 [85] stats4_4.1.0                      XML_3.99-0.8                     
 [87] glue_1.4.2                        data.table_1.14.0                
 [89] png_0.1-7                         vctrs_0.3.8                      
 [91] foreach_1.5.1                     gtable_0.3.0                     
 [93] purrr_0.3.4                       kernlab_0.9-29                   
 [95] future_1.22.1                     assertthat_0.2.1                 
 [97] cachem_1.0.6                      gower_0.2.2                      
 [99] prodlim_2019.11.13                restfulr_0.0.13                  
[101] class_7.3-19                      survival_3.2-13                  
[103] timeDate_3043.102                 tibble_3.1.4                     
[105] iterators_1.0.13                  GenomicAlignments_1.28.0         
[107] AnnotationDbi_1.54.1              memoise_2.0.0                    
[109] IRanges_2.26.0                    lava_1.6.10                      
[111] globals_0.14.0                    ellipsis_0.3.2                   
[113] ipred_0.9-12

Hi Frank,

Your example GTF is missing a gene_id attribute. The old code also required a transcript_type (or transcript_biotype) and a gene_type (or gene_biotype) attribute. The code on GitHub (betsig/branchpointer) has been updated so that these are no longer required.
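
For illustration, here is a minimal sketch of how the two-line example from the question could be rewritten with the attributes named above. The identifiers ("gene1", "tx1") and the "protein_coding" values are placeholders, not requirements, and the file name is made up. This has not been checked against branchpointer 1.18.0, and the released version may expect further attributes, so the call is wrapped in tryCatch to show any remaining error message.

```r
# Attribute column shared by both records.
# "gene1", "tx1" and "protein_coding" are hypothetical placeholder values.
attrs <- paste(
  'gene_id "gene1";',
  'gene_type "protein_coding";',
  'transcript_id "tx1";',
  'transcript_type "protein_coding";'
)

# Same coordinates as the minimal example in the question; only the
# attribute column differs. GTF columns must be tab-separated.
gtf_lines <- c(
  paste("chr1", "gmap", "transcript", "1", "1000", ".", "+", ".", attrs, sep = "\t"),
  paste("chr1", "gmap", "exon", "100", "900", ".", "+", "0", attrs, sep = "\t")
)

# Hypothetical file name, written to the session's temporary directory.
gtf_path <- file.path(tempdir(), "minimal_with_gene_id.gtf")
writeLines(gtf_lines, gtf_path)

# Try to read the file only if branchpointer is installed; report the
# error message instead of stopping if the file is still rejected.
if (requireNamespace("branchpointer", quietly = TRUE)) {
  exons <- tryCatch(
    branchpointer::gtfToExons(gtf_path),
    error = function(e) {
      message("gtfToExons failed: ", conditionMessage(e))
      NULL
    }
  )
  print(exons)
}
```

If the released version still rejects the file, installing the updated code from GitHub, for example with remotes::install_github("betsig/branchpointer"), should relax the requirement for the type attributes, as described above.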
