I ran into trouble while working with SGSeq. The species we study has no transcript annotation available as a TxDb, so I need to import my own GFF file with the function importTranscripts(), but it fails with "Error: subscript contains invalid names". I wonder whether the fault lies in my options or in my GFF file.
This is my code:
fname<-file.choose("Triplophysarosa.evm.gff")
fname
## "D:\\bioinformation\\Triplophysarosa\\05.Annotation\\02.Gene_Prediction\\Triplophysarosa.evm.gff"
file.exists(fname)
##[1] TRUE
importTranscripts("D:\\bioinformation\\Triplophysarosa\\05.Annotation\\02.Gene_Prediction\\Triplophysarosa.evm.gff",tag_tx = "contig1",tag_gene = "evm.TU.contig365.6")
Error: subscript contains invalid names
10. stop(wmsg(...), call. = FALSE)
9. .subscript_error("subscript contains invalid ", what)
8. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
7. NSBS(i, x, exact = exact, strict.upper.bound = !allow.append, allow.NAs = allow.NAs)
6. normalizeSingleBracketSubscript(j, xstub)
5. mcols(exons)[c(tag_tx, tag_gene)]
4. mcols(exons)[c(tag_tx, tag_gene)]
3. data.frame(mcols(exons)[c(tag_tx, tag_gene)])
2. unique(data.frame(mcols(exons)[c(tag_tx, tag_gene)]))
1. importTranscripts(fname, tag_tx = "contig1", tag_gene = "evm.TU.contig365.6")
sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7600)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats4 parallel stats graphics grDevices
[6] utils datasets methods base
other attached packages:
[1] SGSeq_1.16.2
[2] SummarizedExperiment_1.12.0
[3] DelayedArray_0.8.0
[4] BiocParallel_1.16.6
[5] matrixStats_0.54.0
[6] Biobase_2.42.0
[7] Rsamtools_1.34.1
[8] Biostrings_2.50.2
[9] XVector_0.22.0
[10] GenomicRanges_1.34.0
[11] GenomeInfoDb_1.18.2
[12] IRanges_2.16.0
[13] S4Vectors_0.20.1
[14] BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] magrittr_1.5 GenomicFeatures_1.34.4
[3] gtable_0.2.0 zlibbioc_1.28.0
[5] memoise_1.1.0 hms_0.4.2
[7] RCurl_1.95-4.12 pillar_1.3.1
[9] progress_1.2.0 stringr_1.4.0
[11] lattice_0.20-38 rtracklayer_1.42.2
[13] bit_1.1-14 plyr_1.8.4
[15] knitr_1.22 GenomicAlignments_1.18.1
[17] igraph_1.2.4 pkgconfig_2.0.2
[19] Matrix_1.2-15 R6_2.4.0
[21] GenomeInfoDbData_1.2.0 digest_0.6.18
[23] xfun_0.5 colorspace_1.4-0
[25] AnnotationDbi_1.44.0 stringi_1.3.1
[27] lazyeval_0.2.1 yaml_2.2.0
[29] RSQLite_2.1.1 tibble_2.0.1
[31] httr_1.4.0 compiler_3.5.2
[33] bit64_0.9-7 munsell_0.5.0
[35] DBI_1.0.0 Rcpp_1.0.0
[37] biomaRt_2.38.0 XML_3.98-1.19
[39] RUnit_0.4.32 assertthat_0.2.0
[41] blob_1.1.1 ggplot2_3.1.0
[43] prettyunits_1.0.2 tools_3.5.2
[45] bitops_1.0-6 scales_1.0.0
[47] crayon_1.3.4 rlang_0.3.1
[49] grid_3.5.2
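A note on the traceback above: frames 2 to 5 show that tag_tx and tag_gene are used to select metadata columns of the imported exons, so they presumably need to be attribute names present on the exon rows of the file, not sequence or gene identifiers such as "contig1". The following is only a minimal base-R sketch for listing which attribute names the exon rows carry. The two GFF lines are a made-up excerpt (the identifiers tx1 and gene1 are hypothetical), and the real file may use different attributes.

```r
# Hypothetical two-line GFF3 excerpt; the attributes in the real file may differ.
gff_lines <- c(
  "contig1\tEVM\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
  "contig1\tEVM\texon\t100\t300\t.\t+\t.\tID=tx1.exon1;Parent=tx1"
)
gff_file <- tempfile(fileext = ".gff")
writeLines(gff_lines, gff_file)

# Read the nine tab-separated GFF columns, skipping "#" comment lines.
gff <- read.delim(gff_file, header = FALSE, comment.char = "#",
                  stringsAsFactors = FALSE, quote = "")

# Keep the attribute column (column 9) of rows whose type (column 3) is "exon".
exon_attrs <- gff[gff[[3]] == "exon", 9]

# Collect the attribute keys (the part before "=") found on exon rows.
pairs <- unlist(strsplit(exon_attrs, ";", fixed = TRUE))
exon_tags <- unique(sub("=.*$", "", trimws(pairs)))
print(exon_tags)
```

Replacing gff_file with the path to the real annotation shows the candidate names. If no single pair of attributes on the exon rows identifies both the transcript and the gene, the file may need to be converted or annotated before it can be imported this way.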
Thank you very much, my problem has been solved.
Dear professor, I have another question: can I use the SGSeq package to identify all splice events across the whole genome in a single run, or can I only get the splice events for one particular gene per run?

Both are possible. Please see vignette section 6 for an example of how to analyze a particular region of the genome using the `which` argument. If no region is specified, the analysis is genome-wide. Depending on your data set, genome-wide predictions can be computationally intensive. In case you run into problems, please see vignette section 13 for considerations about parallelization and memory requirements.
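For illustration, a region-restricted run might look like the sketch below. It assumes the example objects `si` (sample information) and `gr` (a genomic region) and the BAM files that ship with the SGSeq package, following the pattern used in the vignette. For real data, substitute your own sample table and a GRanges of interest.

```r
library(SGSeq)

# `si` and `gr` are example objects shipped with SGSeq (assumption: the
# installed version still provides them, as in the vignette).
path <- system.file("extdata", package = "SGSeq")
si$file_bam <- file.path(path, "bams", si$file_bam)

# Restrict feature prediction and quantification to one region via `which`.
sgfc_region <- analyzeFeatures(si, which = gr)

# Genome-wide analysis: omit `which`. This can be slow and memory-hungry,
# so it is left commented out here.
# sgfc_all <- analyzeFeatures(si)
```

The result is an SGFeatureCounts object in both cases; only the extent of the genome that is scanned differs.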