HTSeqGenie runs very slowly
zh9118 • 0
@zh9118-21668
United States

Hi,

I am running HTSeqGenie on paired-end RNA-Seq samples. Each of the two FASTQ files is about 3 GB. I ran the same code on both a Mac and Red Hat Linux, and it is extremely slow on both. The log messages follow the code. Thanks for your help.

# R code used to run the sample.
# Sample.ID must already be defined as a character string (the sample name).
library(HTSeqGenie)

save_dir <- runPipeline(
    shortReadReport.do = TRUE,

    ## input
    input_file = "~/Downloads/Pilot/RNA_R1_001.fastq.gz",
    input_file2 = "~/Downloads/Pilot/RNA_R2_001.fastq.gz",
    paired_ends = TRUE,
    quality_encoding = "illumina1.8",

    ## system
    num_cores = 6,
    debug.tracemem = FALSE,

    ## output
    save_dir = paste0("~/Downloads/Pilot/analysis/", Sample.ID),
    prepend_str = Sample.ID,
    overwrite_save_dir = "erase",
    remove_processedfastq = FALSE,
    remove_chunkdir = FALSE,

    ## trim reads
    trimReads.do = FALSE,
    # trimReads.length = NULL,
    # trimReads.trim5 = 0,

    ## filter
    filterQuality.do = TRUE,
    filterQuality.minQuality = 23,
    filterQuality.minFrac = 0.7,
    filterQuality.minLength = 18,

    ## detect adapter contamination
    detectAdapterContam.do = TRUE,
    detectAdapterContam.force_paired_end_adapter = FALSE,

    ## detect ribosomal RNA
    detectRRNA.do = TRUE,
    detectRRNA.rrna_genome = "gencode_v43_rRNA",

    ## aligner
    path.gsnap_genomes = "~/Downloads/Genome/Human/",
    alignReads.genome = "hg38",
    alignReads.static_parameters = "-M 2 -n 10 -B 2 -i 1 -N 1 -w 200000 -E 1 --pairmax-rna=200000 --clip-overlap",
    alignReads.sam_id = Sample.ID,
    alignReads.use_gmapR_gsnap = FALSE,

    ## gene model
    path.genomic_features = "~/Downloads/Gencode.v43/",
    countGenomicFeatures.do = FALSE,
    countGenomicFeatures.gfeatures = "Gencode.v43.RData",

    ## other steps turned off
    markDuplicates.do = FALSE,
    coverage.do = FALSE,
    analyzeVariants.do = FALSE
)
# Log messages from the run on the Mac:
checkConfig.R/checkConfig.template: loading template config= inst/config/default-config.txt 
sh: line 1: 82406 Abort trap: 6           samtools 2> /dev/null
sh: line 1: 82408 Abort trap: 6           samtools 2>&1
2023-09-25 17:53:04 INFO::preprocessReads.R/preprocessReads: starting...
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R1_001.fastq.gz
2023-09-25 17:53:04 INFO::io.R/FastQStreamer.init: initialised FastQ streamer for filename= ~/Downloads/Pilot/RNA_R2_001.fastq.gz
2023-09-25 17:53:04 DEBUG::tools.R/processChunks: starting...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: waiting for chunkid=[  ] ...
2023-09-25 17:53:12 DEBUG::tools.R/processChunks: starting chunkid= 1 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000001/logs/progress.log
2023-09-25 17:53:19 DEBUG::tools.R/processChunks: starting chunkid= 2 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000002/logs/progress.log
2023-09-25 17:53:27 DEBUG::tools.R/processChunks: starting chunkid= 3 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000003/logs/progress.log
2023-09-25 17:53:35 DEBUG::tools.R/processChunks: starting chunkid= 4 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000004/logs/progress.log
2023-09-25 17:53:44 DEBUG::tools.R/processChunks: starting chunkid= 5 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000005/logs/progress.log
2023-09-25 17:53:53 DEBUG::tools.R/processChunks: starting chunkid= 6 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000006/logs/progress.log
2023-09-25 17:54:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:55:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:56:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:57:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 1, 2, 3, 4, 5, 6 ] ...
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: done with chunkid= 1 ; elapsed.time= 4.845 minutes
2023-09-25 17:58:02 DEBUG::tools.R/processChunks: starting chunkid= 7 ; see logfile= ~/Downloads/Pilot/analysis/xxx/chunks/chunk_000007/logs/progress.log
2023-09-25 17:58:14 DEBUG::tools.R/processChunks: waiting for chunkid=[ 2, 3, 4, 5, 6, 7 ] ...
...
...
...
2023-09-25 18:30:43 INFO::preprocessReads.R/preprocessReads: done
2023-09-25 18:30:43 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_1 ...
2023-09-25 18:33:10 INFO::preprocessReads.R/buildShortReadReports: generating report_dir= ~/Downloads/Pilot/analysis/xxx/reports/shortReadReport_2 ...
# After this, the pipeline kept running without printing any further messages. It has now been running for more than 30 hours.
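Since the console goes silent, the per-chunk progress logs named in the messages above are probably the best place to see which chunk is still being worked on. Below is a minimal sketch in base R, assuming the directory layout shown in the log (`<save_dir>/chunks/chunk_*/logs/progress.log`). The function name `summarize_chunk_logs` is made up for illustration and is not part of HTSeqGenie.

```r
# Summarise the per-chunk progress logs under an HTSeqGenie save_dir.
# Assumption: logs live at <save_dir>/chunks/chunk_*/logs/progress.log,
# as in the "see logfile=" messages printed by processChunks.
summarize_chunk_logs <- function(save_dir) {
  pattern <- file.path(path.expand(save_dir), "chunks", "chunk_*",
                       "logs", "progress.log")
  logs <- Sys.glob(pattern)

  # Last line written to each log (NA if the file is empty).
  last_line <- vapply(logs, function(f) {
    lines <- readLines(f, warn = FALSE)
    if (length(lines) == 0) NA_character_ else lines[length(lines)]
  }, character(1), USE.NAMES = FALSE)

  data.frame(
    chunk = basename(dirname(dirname(logs))),  # e.g. "chunk_000001"
    last_modified = file.mtime(logs),          # when the log was last written
    last_line = last_line,
    stringsAsFactors = FALSE
  )
}

# Example call, using the save_dir from the log above:
# summarize_chunk_logs("~/Downloads/Pilot/analysis/xxx")
```

A chunk whose `last_modified` time is far older than the others, or whose last line never reaches a "done" message, would be the one to look at first.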
# sessionInfo() output on the Mac:
sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] HTSeqGenie_4.28.1           VariantAnnotation_1.42.1    ShortRead_1.54.0            GenomicAlignments_1.32.1    SummarizedExperiment_1.26.1
 [6] Biobase_2.56.0              MatrixGenerics_1.8.1        matrixStats_0.63.0          BiocParallel_1.30.4         gmapR_1.38.0               
[11] Rsamtools_2.12.0            Biostrings_2.64.1           XVector_0.36.0              GenomicRanges_1.48.0        GenomeInfoDb_1.32.4        
[16] IRanges_2.30.1              S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] httr_1.4.5             bit64_4.0.5            VariantTools_1.38.0    BiocFileCache_2.4.0    latticeExtra_0.6-30    blob_1.2.3            
 [7] BSgenome_1.64.0        GenomeInfoDbData_1.2.8 yaml_2.3.7             progress_1.2.2         pillar_1.9.0           RSQLite_2.3.0         
[13] lattice_0.20-45        glue_1.6.2             digest_0.6.31          RColorBrewer_1.1-3     Matrix_1.5-3           chipseq_1.46.0        
[19] XML_3.99-0.13          pkgconfig_2.0.3        biomaRt_2.52.0         zlibbioc_1.42.0        jpeg_0.1-10            tibble_3.2.1          
[25] KEGGREST_1.36.3        generics_0.1.3         ellipsis_0.3.2         cachem_1.0.7           GenomicFeatures_1.48.4 cli_3.6.0             
[31] deldir_1.0-6           magrittr_2.0.3         crayon_1.5.2           memoise_2.0.1          fansi_1.0.4            xml2_1.3.3            
[37] hwriter_1.3.2.1        Cairo_1.6-0            tools_4.2.3            prettyunits_1.1.1      hms_1.1.2              BiocIO_1.6.0          
[43] lifecycle_1.0.3        stringr_1.5.0          interp_1.1-3           DelayedArray_0.22.0    AnnotationDbi_1.58.0   compiler_4.2.3        
[49] rlang_1.1.0            grid_4.2.3             RCurl_1.98-1.10        rstudioapi_0.14        rjson_0.2.21           rappdirs_0.3.3        
[55] bitops_1.0-7           restfulr_0.0.15        codetools_0.2-19       DBI_1.1.3              curl_5.0.0             R6_2.5.1              
[61] dplyr_1.1.1            rtracklayer_1.56.1     fastmap_1.1.1          bit_4.0.5              utf8_1.2.3             filelock_1.0.2        
[67] stringi_1.7.12         parallel_4.2.3         Rcpp_1.0.10            vctrs_0.6.1            png_0.1-8              dbplyr_2.3.1          
[73] tidyselect_1.2.0

I found that it is always the last chunk that takes forever, even though it is the smallest one.
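To compare chunk times from the log rather than by eye, the "done with chunkid=" lines can be parsed. This is a rough sketch in base R, assuming the message format shown in the log above. The function name `parse_chunk_times` and the file name in the usage comment are hypothetical. The unit is kept as printed, since I am not sure it is always minutes.

```r
# Extract per-chunk elapsed times from HTSeqGenie log lines of the form
# "... processChunks: done with chunkid= 1 ; elapsed.time= 4.845 minutes".
parse_chunk_times <- function(log_lines) {
  pattern <- "done with chunkid= *([0-9]+) *; *elapsed\\.time= *([0-9.]+) *([a-z]+)"
  hits <- regmatches(log_lines, regexec(pattern, log_lines))
  hits <- hits[lengths(hits) == 4]  # keep only lines that matched

  data.frame(
    chunk = as.integer(vapply(hits, `[`, character(1), 2)),
    elapsed = as.numeric(vapply(hits, `[`, character(1), 3)),
    unit = vapply(hits, `[`, character(1), 4),
    stringsAsFactors = FALSE
  )
}

# Usage, assuming the console output was saved to a file
# (the file name here is hypothetical):
# times <- parse_chunk_times(readLines("pipeline_console.log"))
# times[order(times$chunk), ]
```

A chunk that was started but has no "done" line will simply be missing from the result, which is itself a hint that it is the one that is stuck.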
