Question

ASpli readCounts never ends!



fernandalpcosta ▴ 10

@fernandalpcosta-19812


Brazil/Campinas/UNICAMP

Hello all,

I'm running the ASpli package for alternative splicing analysis in the plant Glycine max, and the pipeline never finishes once it reaches the counting step. I let it run for 30 hours on a Google virtual machine with 24 CPUs and 150 GB of RAM without getting any result, so I killed the process. During the counting step it used around 92 GB of RAM. I had already tested the installation, dependencies and pipeline with the 'toy' dataset, and that run completed with no errors.

What could be the problem?

Some important info:

The point where it gets stuck and never returns anything:

> counts <- readCounts(features=features, bam=bam, targets=targets, cores=20, readLength=100L, maxISize=50000, minAnchor=10)
Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed

Information about the R session:

 > sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicFeatures_1.34.3 AnnotationDbi_1.44.0   Biobase_2.42.0        
 [4] GenomicRanges_1.34.0   GenomeInfoDb_1.18.1    IRanges_2.16.0        
 [7] S4Vectors_0.20.1       BiocGenerics_0.28.0    ASpli_1.8.1           
[10] edgeR_3.24.3           limma_3.38.3          

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.14.0         bitops_1.0-6               
 [3] matrixStats_0.54.0          bit64_0.9-7                
 [5] RColorBrewer_1.1-2          progress_1.2.0             
 [7] httr_1.4.0                  tools_3.5.2                
 [9] backports_1.1.3             R6_2.3.0                   
[11] rpart_4.1-13                Hmisc_4.2-0                
[13] DBI_1.0.0                   lazyeval_0.2.1             
[15] Gviz_1.26.4                 colorspace_1.4-0           
[17] nnet_7.3-12                 gridExtra_2.3              
[19] prettyunits_1.0.2           bit_1.1-14                 
[21] curl_3.3                    compiler_3.5.2             
[23] htmlTable_1.13.1            DelayedArray_0.8.0         
[25] rtracklayer_1.42.1          scales_1.0.0               
[27] checkmate_1.9.1             stringr_1.4.0              
[29] digest_0.6.18               Rsamtools_1.34.1           
[31] foreign_0.8-70              rmarkdown_1.11             
[33] XVector_0.22.0              base64enc_0.1-3            
[35] dichromat_2.0-0             pkgconfig_2.0.2            
[37] htmltools_0.3.6             ensembldb_2.6.5            
[39] BSgenome_1.50.0             htmlwidgets_1.3            
[41] rlang_0.3.1                 rstudioapi_0.9.0           
[43] RSQLite_2.1.1               BiocParallel_1.16.5        
[45] acepack_1.4.1               VariantAnnotation_1.28.10  
[47] RCurl_1.95-4.11             magrittr_1.5               
[49] GenomeInfoDbData_1.2.0      Formula_1.2-3              
[51] Matrix_1.2-15               Rcpp_1.0.0                 
[53] munsell_0.5.0               stringi_1.2.4              
[55] yaml_2.2.0                  SummarizedExperiment_1.12.0
[57] zlibbioc_1.28.0             plyr_1.8.4                 
[59] grid_3.5.2                  blob_1.1.1                 
[61] crayon_1.3.4                lattice_0.20-38            
[63] Biostrings_2.50.2           splines_3.5.2              
[65] hms_0.4.2                   locfit_1.5-9.1             
[67] knitr_1.21                  pillar_1.3.1               
[69] biomaRt_2.38.0              XML_3.98-1.17              
[71] evaluate_0.12               biovizBase_1.30.1          
[73] latticeExtra_0.6-28         data.table_1.12.0          
[75] BiocManager_1.30.4          gtable_0.2.0               
[77] assertthat_0.2.0            ggplot2_3.1.0              
[79] xfun_0.4                    AnnotationFilter_1.6.0     
[81] survival_2.43-3             tibble_2.0.1               
[83] GenomicAlignments_1.18.1    memoise_1.1.0              
[85] cluster_2.0.7-1             BiocStyle_2.10.0

Reference annotation from Phytozome (Gmax): Gmax275Wm82.a2.v1.gene.gff3.gz
FASTA file used to generate the BAM files, also from Phytozome: Gmax275v2.0.fa.gz (data from this Phytozome link)

Any help or tip will be greatly appreciated!

Thank you all in advance.

Cordially,

Fernanda Costa

Tags: aspli readCounts rna-seq alternative splicing AS

Posted 7.0 years ago by fernandalpcosta


Dear Fernanda, thanks for your post. How many BAMs (samples) are you trying to quantify? For diagnostic purposes, can you run the readCounts() function on only one sample?
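A minimal sketch of that single-sample run, assuming the ASpli 1.8.x API used in the question (`loadBAM()` and `readCounts()`) and a `targets` data.frame with one row per sample and a `bam` column. The helper names `first_sample()` and `count_one_sample()` are hypothetical, and the counting parameters simply repeat the ones from the original call.

```r
# Hypothetical helper: keep a single sample (row) of an ASpli targets table.
# drop = FALSE keeps the result a data.frame with its sample row name.
first_sample <- function(targets, i = 1) {
  targets[i, , drop = FALSE]
}

# Hypothetical helper: repeat the original readCounts() call on one sample.
# Assumes `features` comes from binGenome() and ASpli 1.8.x is installed;
# ASpli:: is used so the package is only needed when the function is called.
count_one_sample <- function(features, targets, i = 1, cores = 20) {
  t1  <- first_sample(targets, i)
  bam <- ASpli::loadBAM(t1)
  ASpli::readCounts(features = features, bam = bam, targets = t1,
                    cores = cores, readLength = 100L, maxISize = 50000,
                    minAnchor = 10)
}

# Usage, in the session that already holds `features` and `targets`:
# timing <- system.time(counts1 <- count_one_sample(features, targets))
```

Wrapping the call in system.time() gives a wall-clock figure to compare against the full 36-sample run.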

Can you also share the sequencing coverage (number of reads) and the genome size? (You can copy and paste the log output printed by the binGenome() function.)
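One way to get the number of reads per sample (not mentioned in the thread) is `countBam()` from Rsamtools, which is already loaded in the session shown above. This sketch runs on the small example BAM bundled with Rsamtools so it works as-is; the path is a placeholder to swap for one of the project's BAM files.

```r
library(Rsamtools)

# Example BAM shipped with Rsamtools; replace with a project BAM path.
bam_file <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# countBam() returns one row per file: `records` is the number of alignment
# records and `nucleotides` the number of aligned bases. Each mate of a pair
# is a separate record, and secondary alignments are counted as well.
cnt <- countBam(bam_file)
print(cnt[, c("file", "records", "nucleotides")])

# Approximate read-pair count: first mates only, primary alignments only.
p <- ScanBamParam(flag = scanBamFlag(isFirstMateRead = TRUE,
                                     isSecondaryAlignment = FALSE))
pairs <- countBam(bam_file, param = p)$records
print(pairs)
```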

Thanks a lot!

Estefi

Posted 7.0 years ago by emancini


Hello Estefi,

Thank you for your reply!

I'm currently running the readCounts() function on just one sample from this project, and it seems to be stuck at the same step (again using over 90 GB of RAM).

This project has 36 samples, each with around 50M reads (paired-end, R1 and R2).

Log output after binGenome():

> features <- binGenome(TxDb)
* Number of extracted Genes = 56044
* Number of extracted Exon Bins = 292007
* Number of extracted intron bins = 398757
* Number of extracted trascripts = 88647
* Number of extracted junctions = 347047
* Number of AS bins (not include external) = 29288
* Number of AS bins (include external) = 29326
* Classified as: 
    ES bins = 11587 (40%)
    IR bins = 891   (3%)
    Alt5'ss bins = 5178 (18%)
    Alt3'ss bins = 6821 (23%)
    Multiple AS bins = 4811 (16%)
    classified as:
            ES bins = 1544  (32%)
            IR bins = 162   (3%)
            Alt5'ss bins = 1392 (29%)
            Alt3'ss bins = 1523 (32%)

Thank you so much for your help,

Fernanda Costa

Posted 7.0 years ago by fernandalpcosta