I can't import my GFF file with `importTranscripts()` when working with SGSeq
@2315440517-20246

I ran into trouble while working with SGSeq. The species we study has no transcript annotation available as a TxDb, so I need to import my own GFF file with the function importTranscripts(). However, the call fails with the error "Error: subscript contains invalid names". I wonder whether the fault lies in my options or whether something is wrong with my GFF file.

This is my code:

fname<-file.choose("Triplophysarosa.evm.gff")
fname
## "D:\\bioinformation\\Triplophysarosa\\05.Annotation\\02.Gene_Prediction\\Triplophysarosa.evm.gff"
file.exists(fname)
##[1] TRUE
importTranscripts("D:\\bioinformation\\Triplophysarosa\\05.Annotation\\02.Gene_Prediction\\Triplophysarosa.evm.gff",tag_tx = "contig1",tag_gene = "evm.TU.contig365.6")
Error: subscript contains invalid names
10. stop(wmsg(...), call. = FALSE)
9. .subscript_error("subscript contains invalid ", what)
8. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
7. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
6. normalizeSingleBracketSubscript(j, xstub)
5. mcols(exons)[c(tag_tx, tag_gene)]
4. mcols(exons)[c(tag_tx, tag_gene)]
3. data.frame(mcols(exons)[c(tag_tx, tag_gene)])
2. unique(data.frame(mcols(exons)[c(tag_tx, tag_gene)]))
1. importTranscripts(fname, tag_tx = "contig1", tag_gene = "evm.TU.contig365.6")
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7600)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices
[6] utils     datasets  methods   base     

other attached packages:
 [1] SGSeq_1.16.2               
 [2] SummarizedExperiment_1.12.0
 [3] DelayedArray_0.8.0         
 [4] BiocParallel_1.16.6        
 [5] matrixStats_0.54.0         
 [6] Biobase_2.42.0             
 [7] Rsamtools_1.34.1           
 [8] Biostrings_2.50.2          
 [9] XVector_0.22.0             
[10] GenomicRanges_1.34.0       
[11] GenomeInfoDb_1.18.2        
[12] IRanges_2.16.0             
[13] S4Vectors_0.20.1           
[14] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] magrittr_1.5             GenomicFeatures_1.34.4  
 [3] gtable_0.2.0             zlibbioc_1.28.0         
 [5] memoise_1.1.0            hms_0.4.2               
 [7] RCurl_1.95-4.12          pillar_1.3.1            
 [9] progress_1.2.0           stringr_1.4.0           
[11] lattice_0.20-38          rtracklayer_1.42.2      
[13] bit_1.1-14               plyr_1.8.4              
[15] knitr_1.22               GenomicAlignments_1.18.1
[17] igraph_1.2.4             pkgconfig_2.0.2         
[19] Matrix_1.2-15            R6_2.4.0                
[21] GenomeInfoDbData_1.2.0   digest_0.6.18           
[23] xfun_0.5                 colorspace_1.4-0        
[25] AnnotationDbi_1.44.0     stringi_1.3.1           
[27] lazyeval_0.2.1           yaml_2.2.0              
[29] RSQLite_2.1.1            tibble_2.0.1            
[31] httr_1.4.0               compiler_3.5.2          
[33] bit64_0.9-7              munsell_0.5.0           
[35] DBI_1.0.0                Rcpp_1.0.0              
[37] biomaRt_2.38.0           XML_3.98-1.19           
[39] RUnit_0.4.32             assertthat_0.2.0        
[41] blob_1.1.1               ggplot2_3.1.0           
[43] prettyunits_1.0.2        tools_3.5.2             
[45] bitops_1.0-6             scales_1.0.0            
[47] crayon_1.3.4             rlang_0.3.1             
[49] grid_3.5.2  
Tags: software error annotation
@leonard-goldstein-6845

When posting a question about a software package, please always tag your post with the package name. This triggers an automatic email from the system to the package maintainer.

Regarding your question: the tag_tx and tag_gene arguments specify the names of the relevant GFF attribute tags, not individual contig or gene identifiers. Please try

importTranscripts("D:\\bioinformation\\Triplophysarosa\\05.Annotation\\02.Gene_Prediction\\Triplophysarosa.evm.gff",tag_tx = "ID",tag_gene = "Name")
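
If you are unsure which tag names your file uses, it can help to list them before calling importTranscripts(). The traceback above shows that both names are looked up among the columns of the imported exon records (`mcols(exons)[c(tag_tx, tag_gene)]`), so a name that is not an attribute tag in the file triggers the "invalid names" error. The following is a minimal base-R sketch, not part of SGSeq: `list_gff_tags()` is a hypothetical helper, and it assumes a GFF3-style ninth column of the form `key=value;key=value`.

```r
# Hypothetical helper (not an SGSeq function): list the attribute tags found
# in column 9 of a GFF3 file, grouped by feature type (gene, mRNA, exon, ...).
# Assumes tab-separated GFF3 with "key=value;key=value" attributes.
list_gff_tags <- function(path, n_max = 10000) {
  gff <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    nrows = n_max, stringsAsFactors = FALSE,
                    col.names = c("seqid", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  # Split each attribute string on ";" and keep only the part before "="
  keys <- lapply(strsplit(gff$attributes, ";", fixed = TRUE),
                 function(kv) unique(trimws(sub("=.*$", "", kv))))
  # Collect the distinct tag names seen for each feature type
  lapply(split(keys, gff$type), function(k) sort(unique(unlist(k))))
}
```

Calling `list_gff_tags(fname)` returns a named list with one character vector of tag names per feature type; only the first `n_max` records are read, which is usually enough to see the tags in use.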

Thank you very much, my problem has been solved.

Dear professor, I have another question: can I use the SGSeq package to identify all splice events across the whole genome in a single run, or can I only get the splice events for one particular gene per run?

Both are possible. Please see vignette section 6 for an example on how to analyze a particular region of the genome using the which argument. If no region is specified the analysis is genome-wide. Depending on your data set, genome-wide predictions can be computationally intensive. In case you run into problems, please see vignette section 13 for considerations about parallelization and memory requirements.
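
As a rough sketch of the two modes, assuming SGSeq, GenomicRanges and IRanges are installed: `run_sgseq()` is a hypothetical wrapper, `si` stands for a sample information data frame you have prepared yourself (for example with getBamInfo()), and the contig name and coordinates in the comments are made up for illustration.

```r
# Hypothetical wrapper around SGSeq::analyzeFeatures().
# si     : sample information data frame (placeholder, prepared by you)
# region : NULL for a genome-wide run, or a GRanges restricting the analysis
# cores  : number of cores used for parallelization
run_sgseq <- function(si, region = NULL, cores = 1) {
  SGSeq::analyzeFeatures(si, which = region, cores = cores)
}

# Restricted to one region (made-up coordinates):
# gr <- GenomicRanges::GRanges("contig365", IRanges::IRanges(1, 50000))
# sgfc_region <- run_sgseq(si, region = gr)

# Genome-wide (no region specified), using several cores:
# sgfc_all <- run_sgseq(si, cores = 4)
```

Trying the restricted call on a single region first is a cheap way to check that the inputs are set up correctly before starting a genome-wide run.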
