Hello all,
I'm running the ASpli package for alternative splicing analysis in a plant, Glycine max, and the process never finishes once the pipeline reaches the counting step. I let it run for 30 hours on a Google virtual machine with 24 CPUs and 150 GB of RAM, but it never returned a result, so I killed the process. It used around 92 GB of RAM during the counting step. I used the 'toy' dataset to test the installation, dependencies and pipeline, and that ran fine, with no errors.
What could be the problem?
Some important info:
The point where it gets stuck and never returns a result:
> counts <- readCounts(features=features, bam=bam, targets=targets, cores=20, readLength=100L, maxISize=50000, minAnchor=10)
Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed
Information about the R session:
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicFeatures_1.34.3 AnnotationDbi_1.44.0 Biobase_2.42.0
[4] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 IRanges_2.16.0
[7] S4Vectors_0.20.1 BiocGenerics_0.28.0 ASpli_1.8.1
[10] edgeR_3.24.3 limma_3.38.3
loaded via a namespace (and not attached):
[1] ProtGenerics_1.14.0 bitops_1.0-6
[3] matrixStats_0.54.0 bit64_0.9-7
[5] RColorBrewer_1.1-2 progress_1.2.0
[7] httr_1.4.0 tools_3.5.2
[9] backports_1.1.3 R6_2.3.0
[11] rpart_4.1-13 Hmisc_4.2-0
[13] DBI_1.0.0 lazyeval_0.2.1
[15] Gviz_1.26.4 colorspace_1.4-0
[17] nnet_7.3-12 gridExtra_2.3
[19] prettyunits_1.0.2 bit_1.1-14
[21] curl_3.3 compiler_3.5.2
[23] htmlTable_1.13.1 DelayedArray_0.8.0
[25] rtracklayer_1.42.1 scales_1.0.0
[27] checkmate_1.9.1 stringr_1.4.0
[29] digest_0.6.18 Rsamtools_1.34.1
[31] foreign_0.8-70 rmarkdown_1.11
[33] XVector_0.22.0 base64enc_0.1-3
[35] dichromat_2.0-0 pkgconfig_2.0.2
[37] htmltools_0.3.6 ensembldb_2.6.5
[39] BSgenome_1.50.0 htmlwidgets_1.3
[41] rlang_0.3.1 rstudioapi_0.9.0
[43] RSQLite_2.1.1 BiocParallel_1.16.5
[45] acepack_1.4.1 VariantAnnotation_1.28.10
[47] RCurl_1.95-4.11 magrittr_1.5
[49] GenomeInfoDbData_1.2.0 Formula_1.2-3
[51] Matrix_1.2-15 Rcpp_1.0.0
[53] munsell_0.5.0 stringi_1.2.4
[55] yaml_2.2.0 SummarizedExperiment_1.12.0
[57] zlibbioc_1.28.0 plyr_1.8.4
[59] grid_3.5.2 blob_1.1.1
[61] crayon_1.3.4 lattice_0.20-38
[63] Biostrings_2.50.2 splines_3.5.2
[65] hms_0.4.2 locfit_1.5-9.1
[67] knitr_1.21 pillar_1.3.1
[69] biomaRt_2.38.0 XML_3.98-1.17
[71] evaluate_0.12 biovizBase_1.30.1
[73] latticeExtra_0.6-28 data.table_1.12.0
[75] BiocManager_1.30.4 gtable_0.2.0
[77] assertthat_0.2.0 ggplot2_3.1.0
[79] xfun_0.4 AnnotationFilter_1.6.0
[81] survival_2.43-3 tibble_2.0.1
[83] GenomicAlignments_1.18.1 memoise_1.1.0
[85] cluster_2.0.7-1 BiocStyle_2.10.0
Reference data used from Phytozome: Gmax: Gmax275Wm82.a2.v1.gene.gff3.gz
Fasta file used to generate bam files, also from Phytozome: Gmax275v2.0.fa.gz
(Data from this Phytozome link)
Any help or tip will be greatly appreciated!
Thank you all in advance.
Cordially,
Fernanda Costa
Dear Fernanda, thanks for your post. How many BAMs (samples) are you trying to quantify? For the purpose of the diagnosis, can you run the readCounts() function using only one sample?
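A minimal sketch of the single-sample test, in case it helps. The sample names, BAM paths and conditions below are hypothetical placeholders, and the bam/condition column layout of the targets table is an assumption based on how ASpli 1.8 is usually set up. It also assumes the `features` object from binGenome() is already in the session. The ASpli calls are guarded so the block only runs them when the package, the `features` object and the BAM file are actually present.

```r
# Hypothetical targets table: names, paths and conditions are placeholders,
# not the real files from this project.
targets <- data.frame(
  row.names = c("ctrl_1", "ctrl_2", "trt_1", "trt_2"),
  bam = c("ctrl_1.bam", "ctrl_2.bam", "trt_1.bam", "trt_2.bam"),
  condition = c("ctrl", "ctrl", "trt", "trt"),
  stringsAsFactors = FALSE
)

# Keep only the first sample. drop = FALSE keeps the result a data.frame
# and preserves the row name used as the sample identifier.
targets_one <- targets[1, , drop = FALSE]

# Run the counting step on that one sample only, with a single core so that
# memory use is easier to interpret. Skipped if anything needed is missing.
if (requireNamespace("ASpli", quietly = TRUE) &&
    exists("features") &&
    file.exists(targets_one$bam)) {
  bam_one <- ASpli::loadBAM(targets_one)
  elapsed <- system.time(
    counts_one <- ASpli::readCounts(
      features = features, bam = bam_one, targets = targets_one,
      cores = 1, readLength = 100L, maxISize = 50000, minAnchor = 10
    )
  )
  print(elapsed)
}
```

If the single-sample run also stalls, that would suggest the problem is not simply the number of samples.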
Can you also share the sequencing coverage (number of reads) and the size of the genome? You can copy and paste the information from the log file produced by the binGenome() function.
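If the numbers are not at hand, one possible way to get them from an indexed BAM is sketched below. It assumes Rsamtools is available (it appears among the loaded namespaces in the sessionInfo() above) and uses its idxstatsBam() function. The helper `summarize_idxstats` and the path `sample_1.bam` are hypothetical names introduced only for this example, and the toy table merely mimics the columns idxstatsBam() returns.

```r
# Hypothetical helper: condense an idxstats-style table into three numbers.
# as.numeric() avoids integer overflow when summing large lengths and counts.
summarize_idxstats <- function(idx) {
  c(
    n_sequences    = nrow(idx),
    genome_size_bp = sum(as.numeric(idx$seqlength)),
    mapped_reads   = sum(as.numeric(idx$mapped))
  )
}

# Toy table with the same columns as idxstatsBam() output (made-up values).
toy <- data.frame(
  seqnames  = c("Chr01", "Chr02", "scaffold_21"),
  seqlength = c(1000, 2000, 50),
  mapped    = c(10, 20, 1),
  unmapped  = c(0, 0, 0)
)
toy_summary <- summarize_idxstats(toy)
print(toy_summary)

# Real use: requires a coordinate-sorted BAM with its .bai index next to it.
bam_path <- "sample_1.bam"  # hypothetical path
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists(bam_path)) {
  print(summarize_idxstats(Rsamtools::idxstatsBam(bam_path)))
}
```

The number of reference sequences is reported as well, since it shows how many chromosomes and scaffolds the alignments refer to.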
Thanks a lot!
Estefi
Hello Estefi,
Thank you for your reply!
I'm now running the readCounts() function on just one sample from this project, and it seems to be stuck at the same step (also using over 90 GB of RAM).
This project has 36 samples, each one with around 50M reads (R1 and R2).
Log File After binGenome():
Thank you so much for your help,
Fernanda Costa