Hi,
I am running HTSeqGenie on paired-end RNA-Seq samples. The two FASTQ files are each around 3 GB. I tried the same code on both macOS and Red Hat Linux, and it runs extremely slowly on both. The log messages follow the code below. Thanks for your help.
# This is the R code used to run the sample
save_dir <- runPipeline(
  shortReadReport.do = TRUE,
  ## input
  input_file = "~/Downloads/Pilot/RNA_R1_001.fastq.gz",
  input_file2 = "~/Downloads/Pilot/RNA_R2_001.fastq.gz",
  paired_ends = TRUE,
  quality_encoding = "illumina1.8",
  ## system
  num_cores = 6,
  debug.tracemem = FALSE,
  ## output
  save_dir = paste("~/Downloads/Pilot/analysis/", Sample.ID, sep=''),
  prepend_str = Sample.ID,
  overwrite_save_dir = "erase",
  remove_processedfastq = FALSE,
  remove_chunkdir = FALSE,
  ## trim reads
  trimReads.do = FALSE,
  # trimReads.length = NULL,
  # trimReads.trim5 = 0,
  ## Filter
  filterQuality.do = TRUE,
  filterQuality.minQuality = 23,
  filterQuality.minFrac = 0.7,
  filterQuality.minLength = 18,
  ## detect adapter contamination
  detectAdapterContam.do = TRUE,
  detectAdapterContam.force_paired_end_adapter = FALSE,
  ## detect ribosomal RNA
  detectRRNA.do = TRUE,
  detectRRNA.rrna_genome = "gencode_v43_rRNA",
  ## aligner
  path.gsnap_genomes = "~/Downloads/Genome/Human/",
  alignReads.genome = "hg38",
  alignReads.static_parameters = "-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap",
  alignReads.sam_id = Sample.ID,
  alignReads.use_gmapR_gsnap = FALSE,
  ## gene model
  path.genomic_features = "~/Downloads/Gencode.v43/",
  countGenomicFeatures.do = FALSE,
  countGenomicFeatures.gfeatures = "Gencode.v43.RData",
  # Other process off
  markDuplicates.do = FALSE,
  coverage.do = FALSE,
  analyzeVariants.do = FALSE
)
# Below are the log messages from the run on Mac:
checkConfig.R/checkConfig.template: loading template config= inst/config/default-config.txt
sh: line 1: 82406 Abort trap: 6 samtools 2> /dev/null
sh: line 1: 82408 Abort trap: 6 samtools 2>&1
2023-09-25 17:53:04 INFO::preprocessReads.R/preprocessReads: starting...
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R1_001.fastq.gz
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R2_001.fastq.gz
2023-09-25 17:53:04 DEBUG::tools.R/processChunks: starting...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: waiting for chunkid=[ ] ...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: starting chunkid= 1 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000001/logs/progress.log
2023-09-25 17:53:19 DEBUG::tools.R/processChunks: starting chunkid= 2 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000002/logs/progress.log
2023-09-25 17:53:27 DEBUG::tools.R/processChunks: starting chunkid= 3 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000003/logs/progress.log
2023-09-25 17:53:35 DEBUG::tools.R/processChunks: starting chunkid= 4 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000004/logs/progress.log
2023-09-25 17:53:44 DEBUG::tools.R/processChunks: starting chunkid= 5 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000005/logs/progress.log
2023-09-25 17:53:53 DEBUG::tools.R/processChunks: starting chunkid= 6 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000006/logs/progress.log
2023-09-25 17:54:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:55:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:56:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:57:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: done with chunkid= 1 ; elapsed.time= 4.845 minutes
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: starting chunkid= 7 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000007/logs/progress.log
2023-09-25 17:58:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 2, 3, 4, 5, 6, 7 ] ...
...
...
...
2023-09-25 18:30:43 INFO::preprocessReads.R/preprocessReads: done
2023-09-25 18:30:43 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_1 ...
2023-09-25 18:33:10 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_2 ...
# After this, the pipeline kept running without printing any further messages. It has now been more than 30 hours.
# Below is the sessionInfo() output on Mac
sessionInfo( )
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] HTSeqGenie_4.28.1 VariantAnnotation_1.42.1 ShortRead_1.54.0 GenomicAlignments_1.32.1 SummarizedExperiment_1.26.1
[6] Biobase_2.56.0 MatrixGenerics_1.8.1 matrixStats_0.63.0 BiocParallel_1.30.4 gmapR_1.38.0
[11] Rsamtools_2.12.0 Biostrings_2.64.1 XVector_0.36.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[16] IRanges_2.30.1 S4Vectors_0.34.0 BiocGenerics_0.42.0
loaded via a namespace (and not attached):
[1] httr_1.4.5 bit64_4.0.5 VariantTools_1.38.0 BiocFileCache_2.4.0 latticeExtra_0.6-30 blob_1.2.3
[7] BSgenome_1.64.0 GenomeInfoDbData_1.2.8 yaml_2.3.7 progress_1.2.2 pillar_1.9.0 RSQLite_2.3.0
[13] lattice_0.20-45 glue_1.6.2 digest_0.6.31 RColorBrewer_1.1-3 Matrix_1.5-3 chipseq_1.46.0
[19] XML_3.99-0.13 pkgconfig_2.0.3 biomaRt_2.52.0 zlibbioc_1.42.0 jpeg_0.1-10 tibble_3.2.1
[25] KEGGREST_1.36.3 generics_0.1.3 ellipsis_0.3.2 cachem_1.0.7 GenomicFeatures_1.48.4 cli_3.6.0
[31] deldir_1.0-6 magrittr_2.0.3 crayon_1.5.2 memoise_2.0.1 fansi_1.0.4 xml2_1.3.3
[37] hwriter_1.3.2.1 Cairo_1.6-0 tools_4.2.3 prettyunits_1.1.1 hms_1.1.2 BiocIO_1.6.0
[43] lifecycle_1.0.3 stringr_1.5.0 interp_1.1-3 DelayedArray_0.22.0 AnnotationDbi_1.58.0 compiler_4.2.3
[49] rlang_1.1.0 grid_4.2.3 RCurl_1.98-1.10 rstudioapi_0.14 rjson_0.2.21 rappdirs_0.3.3
[55] bitops_1.0-7 restfulr_0.0.15 codetools_0.2-19 DBI_1.1.3 curl_5.0.0 R6_2.5.1
[61] dplyr_1.1.1 rtracklayer_1.56.1 fastmap_1.1.1 bit_4.0.5 utf8_1.2.3 filelock_1.0.2
[67] stringi_1.7.12 parallel_4.2.3 Rcpp_1.0.10 vctrs_0.6.1 png_0.1-8 dbplyr_2.3.1
[73] tidyselect_1.2.0
I found that it is always the last chunk that takes forever, even though the last chunk is the smallest one.
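One way to narrow down which chunk is stuck might be to look at the per-chunk progress.log files that the messages above point to. The following is only a minimal base R sketch, not part of HTSeqGenie: `summarise_chunk_logs` is a hypothetical helper name, and it assumes the `chunks/chunk_*/logs/progress.log` layout shown in the log still exists (here `remove_chunkdir` is off, so it should).

```r
# Hypothetical helper (NOT an HTSeqGenie function): for every chunk under
# save_dir, report when its progress.log was last written and its last line.
# Assumes the layout <save_dir>/chunks/chunk_*/logs/progress.log seen in the log.
summarise_chunk_logs <- function(save_dir) {
  log_files <- Sys.glob(file.path(path.expand(save_dir),
                                  "chunks", "chunk_*", "logs", "progress.log"))

  # No chunk logs found: return an empty table with the same columns
  if (length(log_files) == 0) {
    return(data.frame(chunk = character(0),
                      last_modified = Sys.time()[0],
                      last_line = character(0),
                      stringsAsFactors = FALSE))
  }

  # Last line written to each log (NA if the file is still empty)
  last_lines <- vapply(log_files, function(f) {
    lines <- readLines(f, warn = FALSE)
    if (length(lines) == 0) NA_character_ else lines[length(lines)]
  }, character(1), USE.NAMES = FALSE)

  out <- data.frame(
    chunk = basename(dirname(dirname(log_files))),  # e.g. "chunk_000001"
    last_modified = file.mtime(log_files),
    last_line = last_lines,
    stringsAsFactors = FALSE
  )

  # Oldest log first: a chunk that stopped writing long ago may be the hung one
  out[order(out$last_modified), , drop = FALSE]
}
```

Calling `summarise_chunk_logs(save_dir)` with the same `save_dir` that was passed to `runPipeline` would list the chunks by how recently each log was updated, which may show whether the last chunk is still making progress or has stalled at a particular step.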