Hi,
I am running HTSeqGenie on both macOS and Linux with the test TP53 samples. Both runs fail with an error that appears to come from reading the FASTQ files: each parallel process seems to have trouble reading the fastq.gz files. Could anyone help me with this, please?
The errors are below:
checkConfig.R/checkConfig.template: loading template config= /Library/Frameworks/R.framework/Versions/4.2/Resources/library/HTSeqGenie/config/default-config.txt
possible qualities of filename=../data/H1993_TP53_subset2500_1.fastq.gz are: illumina1.8, GATK-rescaled
quality_encoding is not set! setting quality_encoding to illumina1.8
2023-03-01 14:24:01 ERROR::tools.R/safeExecute: caught exception:
2023-03-01 14:24:01 ERROR::Error in sclapply(inext = inext, fun = funlog, max.parallel.jobs = nb.parallel.jobs, : tools.R/sclapply: error in chunkid=1: Error in file(file, "wb") : cannot open the connection
2023-03-01 14:24:01 ERROR::tools.R/safeExecute: traceback:
2023-03-01 14:24:01 ERROR::10: stop(paste("tools.R/sclapply: error in chunkid=", jnodes[i], at tools.R#210
2023-03-01 14:24:01 ERROR::9: sclapply(inext = inext, fun = funlog, max.parallel.jobs = nb.parallel.jobs, at tools.R#120
2023-03-01 14:24:01 ERROR::8: processChunks(FastQStreamer.getReads, preprocessReadsChunk, nb.parallel.jobs = nb.parallel.jobs) at preprocessReads.R#29
2023-03-01 14:24:01 ERROR::7: eval(expr, env)
2023-03-01 14:24:01 ERROR::6: try(eval(expr, env), silent = TRUE)
2023-03-01 14:24:01 ERROR::5: serialize(what, NULL, xdr = FALSE)
2023-03-01 14:24:01 ERROR::4: safeExecute({ at preprocessReads.R#17
2023-03-01 14:24:01 ERROR::3: preprocessReads() at runPipeline.R#69
2023-03-01 14:24:01 ERROR::2: runPipelineConfig(config_update = list(...)) at runPipeline.R#48
2023-03-01 14:24:01 ERROR::1: runPipeline(input_file = Fq.1, input_file2 = Fq.2, paired_ends = TRUE,
Error in sclapply(inext = inext, fun = funlog, max.parallel.jobs = nb.parallel.jobs, :
tools.R/sclapply: error in chunkid=1: Error in file(file, "wb") : cannot open the connection
In addition: Warning messages:
1: In system("gsnap", ignore.stderr = TRUE) : error in running command
2: In system("samtools", ignore.stderr = TRUE) : error in running command
My code is below:
library(HTSeqGenie)
library(gmapR)
Gencode.V43.GenomicFeatures <- "../Genome/Gencode/Gencode.v43/Gencode.v43.RData"
Gencode.V43.GenomicFeatures.rRNA <- "../Genome/rRNA/rRNA.Gencode.v43/rRNA.Gencode.v43.RData"
Sample.ID <- "test"
Fq.1 <- "../data/H1993_TP53_subset2500_1.fastq.gz"
Fq.2 <- "../data/H1993_TP53_subset2500_2.fastq.gz"
save_dir <- runPipeline(
    ## input
    input_file = Fq.1,
    input_file2 = Fq.2,
    paired_ends = TRUE,
    # quality_encoding = "illumina1.8",
    ## system
    num_cores = 4,
    ## output
    save_dir = paste("../analysis/", Sample.ID, sep = ''),
    prepend_str = paste("../analysis/", Sample.ID, sep = ''),
    overwrite_save_dir = "erase",
    remove_processedfastq = FALSE,
    remove_chunkdir = TRUE,
    ## trim reads
    # trimReads.do = FALSE,
    # trimReads.length = NULL,
    # trimReads.trim5 = 0,
    ## filter
    filterQuality.do = TRUE,
    filterQuality.minQuality = 23,
    filterQuality.minFrac = 0.7,
    # filterQuality.minLength
    ## detect adapter contamination
    detectAdapterContam.do = TRUE,
    detectAdapterContam.force_paired_end_adapter = FALSE,
    ## detect ribosomal RNA
    detectRRNA.do = FALSE,
    detectRRNA.rrna_genome = "../Genome/rRNA/rRNA.Gencode.v43/rRNA.Gencode.v43.RData",
    ## aligner
    path.gsnap_genomes = "../Genome/Human/",
    alignReads.genome = "GRCh38.p14",
    alignReads.additional_parameters = "-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap",
    alignReads.sam_id = Sample.ID,
    ## gene model
    path.genomic_features = "../Genome/Gencode/",
    countGenomicFeatures.gfeatures = "Gencode.v43"
)
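Since the failing call in the log is `file(file, "wb")`, which opens a file for writing, and both `gsnap` and `samtools` produced "error in running command" warnings, a few basic environment checks may help narrow this down. The sketch below uses only base R; `check_pipeline_env` is a hypothetical helper written for this post, not part of HTSeqGenie. It reports whether the input files exist, whether the external tools are on the `PATH`, and whether the parent of the output directory is writable.

```r
# Hypothetical helper (not part of HTSeqGenie): basic environment checks.
check_pipeline_env <- function(input_files, out_dir,
                               tools = c("gsnap", "samtools")) {
  # Sys.which() returns "" for executables that are not found on the PATH
  tool_paths <- Sys.which(tools)
  list(
    inputs_exist    = file.exists(input_files),
    tools_found     = nzchar(tool_paths),
    # file.access(mode = 2) returns 0 when the path is writable
    parent_writable = file.access(dirname(out_dir), mode = 2) == 0
  )
}

# Paths taken from the call above
str(check_pipeline_env(
  input_files = c("../data/H1993_TP53_subset2500_1.fastq.gz",
                  "../data/H1993_TP53_subset2500_2.fastq.gz"),
  out_dir     = file.path("../analysis", "test")
))
```

One further thing that may be worth checking, on the unverified assumption that `prepend_str` is used as a prefix for output file names rather than as a path: the value passed above contains `../analysis/`, so any file name built from it would point into subdirectories that may not exist.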
```r
sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] HTSeqGenie_4.28.1 GenomicFeatures_1.48.4 AnnotationDbi_1.58.0
[4] bambu_2.2.0 BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.64.0
[7] rtracklayer_1.56.1 BiocManager_1.30.20 VariantAnnotation_1.42.1
[10] ShortRead_1.54.0 GenomicAlignments_1.32.1 SummarizedExperiment_1.26.1
[13] Biobase_2.56.0 MatrixGenerics_1.8.1 matrixStats_0.63.0
[16] BiocParallel_1.30.4 gmapR_1.38.0 Rsamtools_2.12.0
[19] Biostrings_2.64.1 XVector_0.36.0 GenomicRanges_1.48.0
[22] GenomeInfoDb_1.32.4 IRanges_2.30.1 S4Vectors_0.34.0
[25] BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 bit64_4.0.5
[3] filelock_1.0.2 RColorBrewer_1.1-3
[5] progress_1.2.2 httr_1.4.5
[7] tools_4.2.0 utf8_1.2.3
[9] R6_2.5.1 DBI_1.1.3
[11] tidyselect_1.2.0 prettyunits_1.1.1
[13] bit_4.0.5 curl_5.0.0
[15] compiler_4.2.0 cli_3.6.0
[17] Cairo_1.6-0 xml2_1.3.3
[19] DelayedArray_0.22.0 VariantTools_1.38.0
[21] rappdirs_0.3.3 stringr_1.5.0
[23] digest_0.6.31 BSgenome.Hsapiens.UCSC.hg19_1.4.3
[25] jpeg_0.1-10 pkgconfig_2.0.3
[27] dbplyr_2.3.1 fastmap_1.1.1
[29] rlang_1.0.6 rstudioapi_0.14
[31] RSQLite_2.3.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[33] BiocIO_1.6.0 generics_0.1.3
[35] hwriter_1.3.2.1 jsonlite_1.8.4
[37] dplyr_1.1.0 RCurl_1.98-1.10
[39] magrittr_2.0.3 GenomeInfoDbData_1.2.8
[41] interp_1.1-3 Matrix_1.5-3
[43] Rcpp_1.0.10 fansi_1.0.4
[45] lifecycle_1.0.3 stringi_1.7.12
[47] yaml_2.3.7 zlibbioc_1.42.0
[49] org.Hs.eg.db_3.15.0 BiocFileCache_2.4.0
[51] grid_4.2.0 blob_1.2.3
[53] parallel_4.2.0 crayon_1.5.2
[55] deldir_1.0-6 lattice_0.20-45
[57] hms_1.1.2 KEGGREST_1.36.3
[59] pillar_1.8.1 rjson_0.2.21
[61] xgboost_1.7.3.1 codetools_0.2-19
[63] biomaRt_2.52.0 XML_3.99-0.13
[65] glue_1.6.2 latticeExtra_0.6-30
[67] data.table_1.14.8 png_0.1-8
[69] vctrs_0.5.2 purrr_1.0.1
[71] tidyr_1.3.0 cachem_1.0.7
[73] chipseq_1.46.0 restfulr_0.0.15
[75] tibble_3.1.8 memoise_2.0.1
[77] ellipsis_0.3.2
```