Question: ParseMetaFromGtfFile() from SCAN.UPC package fails to produce annotation file
12 months ago, lhuang7 (United States) wrote:

Hi,

I am trying to create an annotation file using the ParseMetaFromGtfFile() function from the SCAN.UPC package, but I get a warning message and no output file is generated.

After searching the archive, I found a 3-year-old post reporting the same issue (ParseMetaFromGtfFile is.na() error).

The following is the code snippet I used:

library(SCAN.UPC)

ParseMetaFromGtfFile(gtfFilePath = "gencode.v25.annotation.gtf", 
                     fastaFilePattern = "GRCh38.primary_assembly.genome.fa", 
                     outFilePath = "GRCh38_Annotation.txt",  
                     featureTypes = "protein_coding", 
                     attributeType = "gene_id")

# Saving GTF data to temporary files
# Done parsing 10000 lines from gencode.v25.annotation.gtf
# Done parsing 20000 lines from gencode.v25.annotation.gtf
# ...
# Done parsing 2570000 lines from gencode.v25.annotation.gtf
# Warning message:
# In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
sessionInfo()
# R version 3.4.1 (2017-06-30)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Sierra 10.12.6
# 
# Matrix products: default
# BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
# [8] methods   base     
# 
# other attached packages:
#  [1] SCAN.UPC_2.18.0     sva_3.25.4          BiocParallel_1.11.6
#  [4] genefilter_1.59.0   mgcv_1.8-19         nlme_3.1-131       
#  [7] foreach_1.4.3       affyio_1.47.0       affy_1.55.0        
# [10] GEOquery_2.43.0     oligo_1.41.1        Biostrings_2.45.3  
# [13] XVector_0.17.0      IRanges_2.11.12     S4Vectors_0.15.5   
# [16] oligoClasses_1.39.1 Biobase_2.37.2      BiocGenerics_0.23.0
# 
# loaded via a namespace (and not attached):
#  [1] SummarizedExperiment_1.7.5 splines_3.4.1             
#  [3] lattice_0.20-35            colorspace_1.3-3          
#  [5] yaml_2.1.14                blob_1.1.0                
#  [7] XML_3.98-1.9               survival_2.41-3           
#  [9] rlang_0.1.2                DBI_0.7                   
# [11] bit64_0.9-7                matrixStats_0.52.2        
# [13] GenomeInfoDbData_0.99.1    stringr_1.2.0             
# [15] zlibbioc_1.23.0            codetools_0.2-15          
# [17] memoise_1.1.0              ff_2.2-13                 
# [19] GenomeInfoDb_1.13.4        BiocInstaller_1.26.1      
# [21] AnnotationDbi_1.39.2       preprocessCore_1.39.0     
# [23] Rcpp_0.12.12               xtable_1.8-2              
# [25] limma_3.33.7               DelayedArray_0.3.19       
# [27] annotate_1.55.0            affxparser_1.49.0         
# [29] bit_1.1-12                 digest_0.6.12             
# [31] stringi_1.1.5              GenomicRanges_1.29.12     
# [33] grid_3.4.1                 tools_3.4.1               
# [35] bitops_1.0-6               magrittr_1.5              
# [37] RCurl_1.95-4.8             RSQLite_2.0               
# [39] tibble_1.3.4               MASS_7.3-47               
# [41] autoinst_0.0.0.9000        Matrix_1.2-11             
# [43] lubridate_1.6.0            httr_1.3.1                
# [45] iterators_1.0.8            R6_2.2.2                  
# [47] compiler_3.4.1

Did I do something wrong? Could anyone help me fix this problem?

Thanks,

Lei

modified 12 months ago by Stephen Piccolo • written 12 months ago by lhuang7

I'll look into this and get back to you.

written 12 months ago by Stephen Piccolo

Thanks Stephen!

written 12 months ago by lhuang7
12 months ago, Stephen Piccolo (United States) wrote:

Thanks for letting me know about this. Some of the information that is often stored in the second column was stored in a different location within the file. I believe I have fixed the problem. I'll post it as soon as I can to the devel server. But for now, send me an email, and I'll send you the fix.

As an aside, I implemented this parser before other GTF parsers were ubiquitous. For more advanced GTF parsing, it would be best to use one of those (e.g., GenomicFeatures: https://bioconductor.org/packages/devel/bioc/manuals/GenomicFeatures/man/GenomicFeatures.pdf).
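
For readers who hit the same warning, the mismatch described in this answer can be checked by hand before running the parser. The following is only a rough base R sketch, not the fix that was made to SCAN.UPC. It assumes that the GTF follows the GENCODE convention, where the second column holds the annotation source (for example HAVANA or ENSEMBL) and the biotype is stored as a gene_type attribute in the ninth column. The two records, their IDs, and the helper get_attribute() are made up for illustration.

```r
# Two illustrative GENCODE-style GTF records (made-up IDs and coordinates).
gtf_lines <- c(
  'chr1\tHAVANA\tgene\t100\t900\t.\t+\t.\tgene_id "ENSG_EXAMPLE_1"; gene_type "protein_coding"; gene_name "GENE1";',
  'chr1\tENSEMBL\tgene\t2000\t2500\t.\t-\t.\tgene_id "ENSG_EXAMPLE_2"; gene_type "lincRNA"; gene_name "GENE2";'
)

# To look at a real file instead, read a few records and drop header lines:
# gtf_lines <- grep("^#", readLines("gencode.v25.annotation.gtf", n = 1000),
#                   value = TRUE, invert = TRUE)

# Split each record into its nine tab-separated columns.
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
source_col <- vapply(fields, `[`, character(1), 2)  # column 2: source
attr_col   <- vapply(fields, `[`, character(1), 9)  # column 9: attributes

# Hypothetical helper: extract the quoted value of one attribute key,
# returning NA when the key is absent from a record.
get_attribute <- function(attributes, key) {
  pattern <- paste0('.*', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attributes),
         sub(pattern, "\\1", attributes),
         NA_character_)
}

gene_type <- get_attribute(attr_col, "gene_type")
gene_id   <- get_attribute(attr_col, "gene_id")

# If "protein_coding" shows up only in the second table, a filter that
# looks for it in column 2 will match nothing.
print(table(source_col))
print(table(gene_type))

# Gene IDs whose biotype attribute is protein_coding.
protein_coding_ids <- gene_id[!is.na(gene_type) & gene_type == "protein_coding"]
print(protein_coding_ids)
```

If the first table lists only sources such as HAVANA and ENSEMBL, then featureTypes = "protein_coding" has nothing to match in the second column, which would be consistent with the empty result reported in the question.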

written 12 months ago by Stephen Piccolo

Thanks for the quick fix!

written 12 months ago by lhuang7