Question: ParseMetaFromGtfFile() from SCAN.UPC package fails to produce annotation file
lhuang7 (United States) wrote 11 weeks ago:

Hi,

I tried to create an annotation file using the ParseMetaFromGtfFile() function from the SCAN.UPC package, but I got a warning message and no output file was generated.

After searching the archive, I found a 3-year-old post about the same issue (ParseMetaFromGtfFile is.na() error).

The following is the code snippet I used:

library(SCAN.UPC)

ParseMetaFromGtfFile(gtfFilePath = "gencode.v25.annotation.gtf", 
                     fastaFilePattern = "GRCh38.primary_assembly.genome.fa", 
                     outFilePath = "GRCh38_Annotation.txt",  
                     featureTypes = "protein_coding", 
                     attributeType = "gene_id")

# Saving GTF data to temporary files
# Done parsing 10000 lines from gencode.v25.annotation.gtf
# Done parsing 20000 lines from gencode.v25.annotation.gtf
# ...
# Done parsing 2570000 lines from gencode.v25.annotation.gtf
# Warning message:
# In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

sessionInfo()
# R version 3.4.1 (2017-06-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Sierra 10.12.6
# 
# Matrix products: default
# BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
# [8] methods   base     
# 
# other attached packages:
#  [1] SCAN.UPC_2.18.0     sva_3.25.4          BiocParallel_1.11.6
#  [4] genefilter_1.59.0   mgcv_1.8-19         nlme_3.1-131       
#  [7] foreach_1.4.3       affyio_1.47.0       affy_1.55.0        
# [10] GEOquery_2.43.0     oligo_1.41.1        Biostrings_2.45.3  
# [13] XVector_0.17.0      IRanges_2.11.12     S4Vectors_0.15.5   
# [16] oligoClasses_1.39.1 Biobase_2.37.2      BiocGenerics_0.23.0
# 
# loaded via a namespace (and not attached):
#  [1] SummarizedExperiment_1.7.5 splines_3.4.1             
#  [3] lattice_0.20-35            colorspace_1.3-3          
#  [5] yaml_2.1.14                blob_1.1.0                
#  [7] XML_3.98-1.9               survival_2.41-3           
#  [9] rlang_0.1.2                DBI_0.7                   
# [11] bit64_0.9-7                matrixStats_0.52.2        
# [13] GenomeInfoDbData_0.99.1    stringr_1.2.0             
# [15] zlibbioc_1.23.0            codetools_0.2-15          
# [17] memoise_1.1.0              ff_2.2-13                 
# [19] GenomeInfoDb_1.13.4        BiocInstaller_1.26.1      
# [21] AnnotationDbi_1.39.2       preprocessCore_1.39.0     
# [23] Rcpp_0.12.12               xtable_1.8-2              
# [25] limma_3.33.7               DelayedArray_0.3.19       
# [27] annotate_1.55.0            affxparser_1.49.0         
# [29] bit_1.1-12                 digest_0.6.12             
# [31] stringi_1.1.5              GenomicRanges_1.29.12     
# [33] grid_3.4.1                 tools_3.4.1               
# [35] bitops_1.0-6               magrittr_1.5              
# [37] RCurl_1.95-4.8             RSQLite_2.0               
# [39] tibble_1.3.4               MASS_7.3-47               
# [41] autoinst_0.0.0.9000        Matrix_1.2-11             
# [43] lubridate_1.6.0            httr_1.3.1                
# [45] iterators_1.0.8            R6_2.2.2                  
# [47] compiler_3.4.1

Did I do anything wrong? Can anyone kindly guide me to fix this problem?

Thanks,

Lei


I'll look into this and get back to you.

Reply written 11 weeks ago by Stephen Piccolo

Thanks Stephen!

Reply written 11 weeks ago by lhuang7
Stephen Piccolo (United States) wrote 11 weeks ago:

Thanks for letting me know about this. In your GTF file, some of the information that is usually stored in the second column is stored in a different location. I believe I have fixed the problem, and I'll post the fix to the devel server as soon as I can. In the meantime, send me an email and I'll send you the fix.
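
A minimal base-R sketch of this kind of mismatch follows. It assumes, without confirmation from the thread or the package source, that the information in question is the biotype (e.g., protein_coding): older Ensembl-style GTF files carried it in the second (source) column, whereas GENCODE files put the annotation source (e.g., HAVANA) there and store the biotype in the gene_type attribute of the ninth column. The two records and the helper get_attribute() are hypothetical and are not part of SCAN.UPC.

```r
# Two made-up GTF records (tab-separated, 9 columns each).
# Record 1: older Ensembl-style layout, biotype in column 2.
# Record 2: GENCODE-style layout, biotype in the gene_type attribute.
gtf_lines <- c(
  paste("chr1", "protein_coding", "exon", "100", "200", ".", "+", ".",
        'gene_id "GENE_A"; transcript_id "TX_A";', sep = "\t"),
  paste("chr1", "HAVANA", "exon", "100", "200", ".", "+", ".",
        'gene_id "GENE_B"; gene_type "protein_coding"; transcript_id "TX_B";',
        sep = "\t")
)

gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"))

# Hypothetical helper: pull the value of one key out of the attributes column.
# Returns NA where the key is absent.
get_attribute <- function(attributes, key) {
  pattern <- paste0('(^|;[[:space:]]*)', key, ' "([^"]*)"')
  matches <- regmatches(attributes, regexec(pattern, attributes))
  vapply(matches,
         function(x) if (length(x) >= 3) x[3] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

gtf$gene_type <- get_attribute(gtf$attributes, "gene_type")

# A filter that only looks at column 2 misses the GENCODE-style record;
# a filter on the gene_type attribute misses the older-style record.
by_source    <- gtf$source == "protein_coding"
by_attribute <- !is.na(gtf$gene_type) & gtf$gene_type == "protein_coding"

print(data.frame(gene_id = get_attribute(gtf$attributes, "gene_id"),
                 by_source, by_attribute))
```

Under that assumption, a parser that matches featureTypes only against the second column would select nothing from a GENCODE file, which would be consistent with no output file being written.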

As an aside, I implemented this parser before other GTF parsers were ubiquitous. For more advanced GTF parsing, it would be best to use one of those (e.g., GenomicFeatures: https://bioconductor.org/packages/devel/bioc/manuals/GenomicFeatures/man/GenomicFeatures.pdf).
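
As a rough sketch of what a general-purpose parser looks like in practice, the example below uses import() from rtracklayer, which reads a GTF file into a GRanges object with the attributes exposed as metadata columns. Note that rtracklayer is a different Bioconductor package from the GenomicFeatures manual linked above; it is used here only as an illustration. The sample records are made up, and filtering on gene_type assumes a GENCODE-style file.

```r
# Assumes the Bioconductor package rtracklayer is installed.
library(rtracklayer)

# Made-up GENCODE-style records, written to a temporary file so the
# sketch is self-contained. Replace gtf_path with a real GTF file.
gtf_path <- tempfile(fileext = ".gtf")
writeLines(c(
  paste("chr1", "HAVANA", "gene", "100", "900", ".", "+", ".",
        'gene_id "GENE_A"; gene_type "protein_coding"; gene_name "A";',
        sep = "\t"),
  paste("chr1", "HAVANA", "gene", "2000", "2500", ".", "-", ".",
        'gene_id "GENE_B"; gene_type "lincRNA"; gene_name "B";',
        sep = "\t")
), gtf_path)

# Each GTF attribute (gene_id, gene_type, ...) becomes a metadata column.
gr <- import(gtf_path, format = "gtf")

# Keep gene-level records annotated as protein coding.
genes <- gr[gr$type == "gene" & gr$gene_type == "protein_coding"]

# Flatten to a plain table that could be written out with write.table().
annotation <- data.frame(
  gene_id = genes$gene_id,
  chrom   = as.character(seqnames(genes)),
  start   = start(genes),
  end     = end(genes),
  strand  = as.character(strand(genes)),
  stringsAsFactors = FALSE
)
print(annotation)
```

This produces a generic gene table, not the annotation format that SCAN.UPC expects, so it is a starting point for custom parsing, not a drop-in replacement for ParseMetaFromGtfFile().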


Thanks for the quick fix!

Reply written 11 weeks ago by lhuang7