Hi all,
So I previously posted a question, which wasn't a minimal working example. I stripped it down now and hope I'm posting correctly now.
In short, I'm trying to get VariantAnnotation working with a non-reference genome where I provide my own GFF and FASTA. I'm getting an error with particular lines in my VCF file (i.e., some records in the VCF work and some do not). I've reduced my example to just two SNPs, neither of which work. (I'm not showing any samples that do work.)
The variable vcf
has been saved at this link as an RDS file.
With that set, then a record of my execution is as follows, with the error message that is stumping me at the very end:
> library (VariantAnnotation)
> library (GenomicFeatures)
> txdb <- makeTxDbFromGFF (file = "http://sgd-archive.yeastgenome.org/sequence/strains/BY4742/BY4742_Toronto_2012/BY4742_Toronto_2012.gff.gz", dataSource="BY4742", organism="Saccharomyces cerevisiae")
Import genomic features from the file as a GRanges object ... trying URL 'http://sgd-archive.yeastgenome.org/sequence/strains/BY4742/BY4742_Toronto_2012/BY4742_Toronto_2012.gff.gz'
Content type 'binary/octet-stream' length 4540589 bytes (4.3 MB)
==================================================
downloaded 4.3 MB
OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In makeTxDbFromGRanges(gr, metadata = metadata) :
The following transcripts were dropped because their exon ranks could
not be inferred (either because the exons are not on the same
chromosome/strand or because they are not separated by introns):
YBR084C-A_mRNA
> coding <- predictCoding (vcf, txdb, seqSource="http://sgd-archive.yeastgenome.org/sequence/strains/BY4742/BY4742_Toronto_2012/BY4742_Toronto_2012.fsa.gz")
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'seqlevels': unable to find an inherited method for function ‘seqinfo’ for signature ‘"character"’
The warning message from makeTxDbFromGFF
is "ok". I've checked the transcript in question and two exons are on different strands; there's probably something wrong with the description of its features.
Neither SNP has gone passed the end of the chromosome in question (808695 bp). Beyond that, I can't see anything obviously wrong with my input data.
My session information is:
> sessionInfo ()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS/LAPACK: /home/rwan/.conda/envs/vcf2gene/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicFeatures_1.46.1 AnnotationDbi_1.56.1
[3] VariantAnnotation_1.40.0 Rsamtools_2.10.0
[5] Biostrings_2.62.0 XVector_0.34.0
[7] SummarizedExperiment_1.24.0 Biobase_2.54.0
[9] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0
[11] IRanges_2.28.0 S4Vectors_0.32.3
[13] MatrixGenerics_1.6.0 matrixStats_0.62.0
[15] BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 prettyunits_1.1.1
[4] png_0.1-7 assertthat_0.2.1 digest_0.6.29
[7] utf8_1.2.2 BiocFileCache_2.2.0 R6_2.5.1
[10] RSQLite_2.2.8 httr_1.4.4 pillar_1.8.1
[13] zlibbioc_1.40.0 rlang_1.0.5 progress_1.2.2
[16] curl_4.3.2 blob_1.2.3 Matrix_1.4-1
[19] BiocParallel_1.28.3 stringr_1.4.1 RCurl_1.98-1.8
[22] bit_4.0.4 biomaRt_2.50.0 DelayedArray_0.20.0
[25] rtracklayer_1.54.0 compiler_4.1.3 pkgconfig_2.0.3
[28] tidyselect_1.1.2 KEGGREST_1.34.0 tibble_3.1.8
[31] GenomeInfoDbData_1.2.7 XML_3.99-0.10 fansi_1.0.3
[34] crayon_1.5.1 dplyr_1.0.10 dbplyr_2.2.1
[37] GenomicAlignments_1.30.0 bitops_1.0-7 rappdirs_0.3.3
[40] grid_4.1.3 lifecycle_1.0.2 DBI_1.1.3
[43] magrittr_2.0.3 cli_3.4.0 stringi_1.7.8
[46] cachem_1.0.6 xml2_1.3.3 ellipsis_0.3.2
[49] filelock_1.0.2 vctrs_0.4.1 generics_0.1.3
[52] rjson_0.2.21 restfulr_0.0.15 tools_4.1.3
[55] bit64_4.0.5 BSgenome_1.62.0 glue_1.6.2
[58] purrr_0.3.4 hms_1.1.2 yaml_2.3.5
[61] parallel_4.1.3 fastmap_1.1.0 memoise_2.0.1
[64] BiocIO_1.4.0
Any help would be appreciated! Thank you!
Ray
That's not an example that anybody can use to test, as
dput
isn't meant to be used on such a complicated object. For example, something like thisCannot be simply copy/pasted into R, because
<environment>
is actually anenvironment
on your machine, and obviously passing that in as a character string won't work. If you want to provide an example, you should output that object as an RDS and put somewhere that people can download.Hi James,
Thanks a lot for the explanation! I've never used
dput
before, but found some information on how to create a "minimal working example" for R and the person who posted that message suggesteddput
. Clearly, s/he was incorrect -- thank you for pointing that out!I've edited my post with an RDS now. I hope it is easier for someone to understand my problem.
Thank you!
Ray