Hi,
I'm using the NxtIRFcore pipeline to analyze my data, following the vignette, and I get an error when building the reference. The error message says "Error in .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : in 'x[[2307]]': not a base at pos 21". Do you have a solution for this?
Thank you.
```r
library(NxtIRFcore)

# ref_path is the output directory for the reference (set further below)
FTP = "ftp://ftp.ensembl.org/pub/release-104/"
BuildReference(
    reference_path = ref_path,
    fasta = paste0(FTP, "fasta/rattus_norvegicus/dna/",
        "Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz"),
    gtf = paste0(FTP, "gtf/rattus_norvegicus/",
        "Rattus_norvegicus.Rnor_6.0.104.gtf.gz")
)
```
```
Splice Annotations Filtered
Translating Alternate Splice Peptides...
Error in .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet], : in 'x[[2307]]': not a base at pos 21
Traceback:
1. BuildReference(reference_path = ref_path, fasta = paste0(FTP,
 .     "fasta/rattus_norvegicus/dna/", "Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz"),
 .     gtf = paste0(FTP, "gtf/rattus_norvegicus/", "Rattus_norvegicus.Rnor_6.0.104.gtf.gz"))
2. .gen_splice_proteins(reference_path, reference_data$genome)
3. .gen_splice_proteins_translate(AS_Table.Extended)
4. Biostrings::translate(as(DNAseq, "DNAStringSet"))
5. Biostrings::translate(as(DNAseq, "DNAStringSet"))
6. .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet],
 .     lkup, init_lkup, if.non.ambig, if.ambig, PACKAGE = "Biostrings")
```
The error persists even after downloading the FASTA and GTF files and running BuildReference() on the local copies:
```r
ref_path = '../tmp/IRFinder_ref'
fasta_path = '../../../../data/rna_seq/Rattus_norvegicus/Ensembl/Rnor_6.0/Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa'
gtf_path = '../../../../data/rna_seq/Rattus_norvegicus/Ensembl/Rnor_6.0/Rattus_norvegicus.Rnor_6.0.103.gtf'
BuildReference(
    reference_path = ref_path,
    fasta = fasta_path,
    gtf = gtf_path
)
```
```
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /home/aaaaa/miniconda3/envs/r_bio/lib/libopenblasp-r0.3.20.so
locale:
[1] LC_CTYPE=ja_JP.UTF-8 LC_NUMERIC=C
[3] LC_TIME=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
[5] LC_MONETARY=ja_JP.UTF-8 LC_MESSAGES=ja_JP.UTF-8
[7] LC_PAPER=ja_JP.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=ja_JP.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9 purrr_0.3.4
[5] readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6
[9] tidyverse_1.3.1 NxtIRFcore_1.0.0 NxtIRFdata_1.0.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 rjson_0.2.21
[3] ellipsis_0.3.2 IRdisplay_1.1
[5] XVector_0.34.0 fs_1.5.2
[7] GenomicRanges_1.46.1 base64enc_0.1-3
[9] rstudioapi_0.13 bit64_4.0.5
[11] lubridate_1.8.0 interactiveDisplayBase_1.32.0
[13] AnnotationDbi_1.56.2 fansi_1.0.3
[15] xml2_1.3.3 splines_4.1.3
[17] R.methodsS3_1.8.2 sparseMatrixStats_1.6.0
[19] cachem_1.0.6 IRkernel_1.3
[21] jsonlite_1.8.0 Rsamtools_2.10.0
[23] broom_0.8.0 annotate_1.72.0
[25] dbplyr_2.2.0 png_0.1-7
[27] R.oo_1.25.0 shiny_1.7.1
[29] HDF5Array_1.22.1 BiocManager_1.30.18
[31] compiler_4.1.3 httr_1.4.3
[33] backports_1.4.1 assertthat_0.2.1
[35] Matrix_1.4-1 fastmap_1.1.0
[37] lazyeval_0.2.2 cli_3.3.0
[39] later_1.2.0 htmltools_0.5.2
[41] tools_4.1.3 gtable_0.3.0
[43] glue_1.6.2 GenomeInfoDbData_1.2.7
[45] rappdirs_0.3.3 Rcpp_1.0.8.3
[47] Biobase_2.54.0 cellranger_1.1.0
[49] vctrs_0.4.1 Biostrings_2.62.0
[51] rhdf5filters_1.6.0 rtracklayer_1.54.0
[53] DelayedMatrixStats_1.16.0 rvest_1.0.2
[55] mime_0.12 lifecycle_1.0.1
[57] restfulr_0.0.15 XML_3.99-0.10
[59] AnnotationHub_3.2.2 zlibbioc_1.40.0
[61] scales_1.2.0 BSgenome_1.62.0
[63] hms_1.1.1 promises_1.2.0.1
[65] MatrixGenerics_1.6.0 parallel_4.1.3
[67] SummarizedExperiment_1.24.0 rhdf5_2.38.1
[69] yaml_2.3.5 curl_4.3.2
[71] memoise_2.0.1 stringi_1.7.6
[73] RSQLite_2.2.14 BiocVersion_3.14.0
[75] genefilter_1.76.0 S4Vectors_0.32.4
[77] BiocIO_1.4.0 BiocGenerics_0.40.0
[79] filelock_1.0.2 BiocParallel_1.28.3
[81] fstcore_0.9.12 repr_1.1.4
[83] GenomeInfoDb_1.30.1 rlang_1.0.2
[85] pkgconfig_2.0.3 matrixStats_0.62.0
[87] bitops_1.0-7 evaluate_0.15
[89] lattice_0.20-45 Rhdf5lib_1.16.0
[91] GenomicAlignments_1.30.0 htmlwidgets_1.5.4
[93] bit_4.0.4 tidyselect_1.1.2
[95] magrittr_2.0.3 R6_2.5.1
[97] IRanges_2.28.0 generics_0.1.2
[99] pbdZMQ_0.3-7 DelayedArray_0.20.0
[101] DBI_1.1.3 withr_2.5.0
[103] haven_2.5.0 pillar_1.7.0
[105] survival_3.3-1 KEGGREST_1.34.0
[107] RCurl_1.98-1.7 modelr_0.1.8
[109] crayon_1.5.1 uuid_1.1-0
[111] utf8_1.2.2 BiocFileCache_2.2.1
[113] plotly_4.10.0 tzdb_0.3.0
[115] readxl_1.4.0 grid_4.1.3
[117] data.table_1.14.2 blob_1.2.3
[119] reprex_2.0.1 digest_0.6.29
[121] xtable_1.8-4 httpuv_1.6.5
[123] R.utils_2.11.0 stats4_4.1.3
[125] munsell_0.5.0 fst_0.9.8
[127] viridisLite_0.4.0
```
Hi ieki,
It appears that exon 8 of Flt3lg-205 contains an "N" base, which causes Biostrings::translate() to fail.
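A minimal sketch of how such a base can be located, assuming the exon or transcript sequences are already held in a Biostrings `DNAStringSet`. The sequence names and sequences are toy values, not the real Flt3lg-205 exon, and `find_ambiguous()` is a hypothetical helper. It also shows how a position like the "pos 21" in the error maps to a codon: (21 - 1) %/% 3 + 1 = 7, i.e. the third base of the seventh codon.

```r
library(Biostrings)

# Toy sequences (hypothetical names and contents, NOT the real Flt3lg-205 exon 8).
seqs <- DNAStringSet(c(
  tx_ok  = "ATGAAACCCGGGTTTAAACCC",
  tx_bad = "ATGAAACCCGGGTTTAAACCN"
))

# Count A/C/G/T per sequence; every other letter (N and other IUPAC codes)
# is pooled into the "other" column.
freq <- alphabetFrequency(seqs, baseOnly = TRUE)
has_ambiguous <- freq[, "other"] > 0
names(seqs)[has_ambiguous]

# Hypothetical helper: positions of non-ACGT letters in one sequence and the
# 1-based codon each position falls in.
find_ambiguous <- function(x) {
  pos <- gregexpr("[^ACGT]", x)[[1]]
  pos <- pos[pos > 0]  # gregexpr returns -1 when nothing matches
  data.frame(pos = pos, codon = (pos - 1L) %/% 3L + 1L)
}

lapply(as.character(seqs[has_ambiguous]), find_ambiguous)
```

Using `baseOnly = TRUE` flags any ambiguity code, not only N, which is what matters for translation.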
Technically, the problem is that the genome has N's within a protein-coding segment, but I can see how incomplete genomes create unnecessary hurdles. So I have amended the code in the devel and release versions to translate such codons as "fuzzy codons", i.e. using Biostrings::translate(x, if.fuzzy.codon = "solve"), instead of raising an error.
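The effect of `if.fuzzy.codon = "solve"` can be seen on toy input. This is a sketch of the Biostrings behaviour only, not NxtIRFcore's internal code, and the sequences are made up. With "solve", a fuzzy codon that can only encode one amino acid (GCN is always alanine) is translated, while one that could encode anything (NNN) becomes X.

```r
library(Biostrings)

# Toy coding sequences (hypothetical). "GCN" is fuzzy but unambiguous (Ala);
# "NNN" is fuzzy and ambiguous.
cds <- DNAStringSet(c(a = "ATGGCNAAA", b = "ATGNNNAAA"))

# Default (if.fuzzy.codon = "error"): the first fuzzy codon aborts translation,
# which is what stopped BuildReference().
err <- tryCatch(translate(cds), error = function(e) e)
inherits(err, "error")

# "solve": unambiguous fuzzy codons are translated, ambiguous ones become "X".
aa <- translate(cds, if.fuzzy.codon = "solve")
as.character(aa)
```

Other accepted values are "error.if.X" (error only on ambiguous codons) and "X" (every fuzzy codon becomes X).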
Versions 1.3.1 (devel) and 1.2.1 (release) should be online in the coming days and will hopefully fix your problem. In the meantime, you can install it from GitHub:
Let me know if you run into any further problems.
Alex
Hi Alex,
I installed the updated version (1.3.1) via devtools, and the error has disappeared.
Thank you for your prompt reply and for correcting the code!
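For anyone hitting the same error, a quick way to check whether an installed NxtIRFcore is one of the fixed versions mentioned in the thread (1.2.1 or later on the 1.2 release branch, 1.3.1 or later on devel). `has_fuzzy_codon_fix()` is a hypothetical helper, and treating every version after 1.3.1 as fixed is an assumption.

```r
# Hypothetical helper: TRUE if a NxtIRFcore version string is 1.2.1+ on the
# 1.2 branch or 1.3.1+ (assumed: all later versions keep the fix).
has_fuzzy_codon_fix <- function(v) {
  v <- package_version(v)
  (v >= "1.2.1" && v < "1.3.0") || v >= "1.3.1"
}

has_fuzzy_codon_fix("1.0.0")  # version shown in the sessionInfo() output
has_fuzzy_codon_fix("1.3.1")  # version installed from GitHub

# On a machine where the package is installed:
if (requireNamespace("NxtIRFcore", quietly = TRUE)) {
  has_fuzzy_codon_fix(as.character(packageVersion("NxtIRFcore")))
}
```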