customProDB easyRun error when generating a custom protein database: Error in keepSeqlevels(anno, seqlevels(galn), pruning.mode = "coarse")
Esraa • 0

Hello, I have been trying to run customProDB to build a customized protein database for my datasets, but whenever I run it through the easyRun function it fails with the same error:

Calculate RPKMs and Output proteins pass the cutoff into FASTA file ... Error in keepSeqlevels(anno, seqlevels(galn), pruning.mode = "coarse") : invalid seqlevels: chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr1, chr1_GL456210_random, chr1_GL456211_random, chr1_GL456212_random, chr1_GL456213_random, chr1_GL456221_random, chr2, chr3, chr4, chr4_GL456216_random, chr4_GL456350_random, chr4_JH584292_random, chr4_JH584293_random, chr4_JH584294_random, chr4_JH584295_random, chr5, chr5_GL456354_random, chr5_JH584296_random, chr5_JH584297_random, chr5_JH584298_random, chr5_JH584299_random, chr6, chr7, chr7_GL456219_random, chr8, chr9, chrM, chrUn_GL456239, chrUn_GL456359, chrUn_GL456360, chrUn_GL456366, chrUn_GL456367, chrUn_GL456368, chrUn_GL456370, chrUn_GL456372, chrUn_GL456378, chrUn_GL456379, chrUn_GL456381, chrUn_GL456382, chrUn_GL456383, chrUn_GL456385, chrUn_GL456387, chrUn_GL456389, chrUn_GL456390, chrUn_GL456392, chrUn_GL456393, chrUn_GL456394, chrUn_GL456396, chrUn_JH584304, chrX, chrX_GL456233_random, chrY, chrY_JH584300_random, chrY_JH584301_random, chrY_JH584
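The message lists sequence names (chr1, chr2, the `_random` contigs, and so on) that look like they come from the BAM file, which suggests they were not found among the annotation's sequence levels. One way to check this is to compare the two sets of names directly. The sketch below is only a diagnostic aid, not part of customProDB: `compare_seqnames` is a hypothetical helper, and the commented usage assumes that `exon_anno.RData` defines an object called `exon` with a `chromosome_name` column.

```r
# Hypothetical helper (not part of customProDB): compare the sequence names
# found in a BAM header with those present in an annotation table.
compare_seqnames <- function(bam_seqnames, anno_seqnames) {
  bam_seqnames <- unique(as.character(bam_seqnames))
  anno_seqnames <- unique(as.character(anno_seqnames))
  list(
    shared = intersect(bam_seqnames, anno_seqnames),    # present in both
    bam_only = setdiff(bam_seqnames, anno_seqnames),    # only in the BAM
    anno_only = setdiff(anno_seqnames, bam_seqnames)    # only in the annotation
  )
}

# Toy input: "chr"-prefixed BAM names against un-prefixed annotation names.
res <- compare_seqnames(
  c("chr1", "chr2", "chr1_GL456210_random"),
  c("1", "2", "X")
)
print(res$bam_only)

# With real data (requires Rsamtools; object and column names are assumptions):
# library(Rsamtools)
# bam_names <- names(scanBamHeader(bamFile)[[1]]$targets)
# load(file.path(annotation_path, "exon_anno.RData"))  # assumed to define `exon`
# compare_seqnames(bam_names, exon$chromosome_name)
```

If `bam_only` is not empty for the real files, the BAM contains sequences that the annotation does not, which would be consistent with the "invalid seqlevels" message.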

The error appears with several different human and mouse datasets. My workflow is as follows:

1) Prepare the annotation files:

  • Download the coding sequence FASTA files (genome and protein) from UCSC, following the instructions in the manual
  • Run through the terminal with the following code:

    library(customProDB)

    pepfasta <- system.file("extdata/mm10", "Mouse__refGene_(protein)].fasta", package="customProDB")
    CDSfasta <- system.file("extdata/mm10", "UCSC_Main_on_Mouse__refGene_(genome).fasta", package="customProDB")
    annotation_path <- tempdir()

    PrepareAnnotationRefseq(genome='mm10', CDSfasta, pepfasta, annotation_path, dbsnp=NULL, transcript_ids=NULL, splice_matrix=TRUE, ClinVar=FALSE)
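Because `system.file()` returns an empty string when it cannot find the requested file, it may be worth confirming that both FASTA paths resolve before calling `PrepareAnnotationRefseq`. The following is a minimal sketch of such a check; `check_input_files` is a hypothetical helper, and the demonstration uses a throwaway temporary file rather than the real inputs.

```r
# Hypothetical helper: report which paths are empty strings or missing files.
check_input_files <- function(paths) {
  paths <- as.character(paths)
  data.frame(
    path = paths,
    found = nzchar(paths) & file.exists(paths),
    stringsAsFactors = FALSE
  )
}

# Toy demonstration: one real temporary file and one empty string,
# which is what system.file() returns when nothing matches.
real_file <- tempfile(fileext = ".fasta")
writeLines(c(">toy", "ATG"), real_file)
status <- check_input_files(c(real_file, ""))
print(status)
```

For the workflow above, the call would be `check_input_files(c(CDSfasta, pepfasta))`; any row with `found = FALSE` points to a path that did not resolve.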

2) After my R annotation files are generated, I run the easyRun function as described in the manual:

    library(customProDB)

    bamFile <- system.file("extdata/mm10", "mm10_aligned_sorted.bam", package="customProDB")
    vcffile <- system.file("extdata/mm10", "freebayes.vcf", package="customProDB")
    annotation_path <- system.file("extdata/mm10", package="customProDB")
    outfile_path <- tempdir()
    outfile_name <- 'test_mm10'

    easyRun(bamFile, RPKM=NULL, vcffile, annotation_path, outfile_path, outfile_name, rpkm_cutoff=1, INDEL=TRUE, lablersid=FALSE, COSMIC=FALSE, nov_junction=FALSE)
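Note that step 1 writes the annotation to `tempdir()`, while step 2 reads it from the package's `extdata/mm10` directory; this may or may not be intended. A quick way to confirm that the directory passed to `easyRun` actually contains the annotation files is sketched below. The file names in `expected_files` are an assumption about what `PrepareAnnotationRefseq` writes and may need adjusting, and `missing_annotation_files` is a hypothetical helper.

```r
# Assumed names of the annotation files; adjust if your output differs.
expected_files <- c("exon_anno.RData", "ids.RData",
                    "proseq.RData", "procodingseq.RData")

# Hypothetical helper: return the expected files that are absent from a directory.
missing_annotation_files <- function(annotation_path, expected = expected_files) {
  expected[!file.exists(file.path(annotation_path, expected))]
}

# Toy demonstration: a scratch directory that holds only one of the files.
scratch <- file.path(tempdir(), "annotation_check_demo")
dir.create(scratch, showWarnings = FALSE)
file.create(file.path(scratch, "ids.RData"))
print(missing_annotation_files(scratch))
```

Running `missing_annotation_files(annotation_path)` just before `easyRun` should return an empty character vector if the directory is the one the annotation was written to.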

This gives me the error shown above. I would really like to know what I am doing wrong or how to fix it. If anyone could help, I would be very grateful. Thank you in advance.

customProDB • 438 views

    sessionInfo()

    R version 4.3.0 (2023-04-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 20.04.6 LTS

    attached base packages:
    [1] stats4    stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
    [1] customProDB_1.40.0   biomaRt_2.56.0       AnnotationDbi_1.62.1
    [4] Biobase_2.60.0       IRanges_2.34.0       S4Vectors_0.38.1
    [7] BiocGenerics_0.46.0

    loaded via a namespace (and not attached):
     [1] KEGGREST_1.40.0          SummarizedExperiment_1.30.1
     [3] AhoCorasickTrie_0.1.2    rjson_0.2.21
     [5] lattice_0.21-8           vctrs_0.6.2
     [7] tools_4.3.0              bitops_1.0-7
     [9] generics_0.1.3           curl_5.0.0
    [11] parallel_4.3.0           tibble_3.2.1
    [13] fansi_1.0.4              RSQLite_2.3.1
    [15] blob_1.2.4               pkgconfig_2.0.3
    [17] Matrix_1.5-1             BSgenome_1.68.0
    [19] dbplyr_2.3.2             lifecycle_1.0.3
    [21] GenomeInfoDbData_1.2.10  compiler_4.3.0
    [23] stringr_1.5.0            Rsamtools_2.16.0
    [25] Biostrings_2.68.1        progress_1.2.2
    [27] codetools_0.2-19         GenomeInfoDb_1.36.0
    [29] yaml_2.3.7               RCurl_1.98-1.12
    [31] pillar_1.9.0             crayon_1.5.2
    [33] BiocParallel_1.34.1      DelayedArray_0.26.2
    [35] cachem_1.0.8             tidyselect_1.2.0
    [37] digest_0.6.31            stringi_1.7.12
    [39] VariantAnnotation_1.46.0 restfulr_0.0.15
    [41] dplyr_1.1.2              fastmap_1.1.1
    [43] grid_4.3.0               cli_3.6.1
    [45] magrittr_2.0.3           GenomicFeatures_1.52.0
    [47] S4Arrays_1.0.4           XML_3.99-0.14
    [49] utf8_1.2.3               prettyunits_1.1.1
    [51] filelock_1.0.2           rappdirs_0.3.3
    [53] bit64_4.0.5              XVector_0.40.0
    [55] httr_1.4.6               matrixStats_0.63.0
    [57] bit_4.0.5                png_0.1-8
    [59] hms_1.1.3                memoise_2.0.1
    [61] BiocIO_1.10.0            GenomicRanges_1.52.0
    [63] BiocFileCache_2.8.0      rtracklayer_1.60.0
    [65] rlang_1.1.1              Rcpp_1.0.10
    [67] glue_1.6.2               DBI_1.1.3
    [69] xml2_1.3.4               plyr_1.8.8
    [71] R6_2.5.1                 MatrixGenerics_1.12.0
    [73] GenomicAlignments_1.36.0 zlibbioc_1.46.0
