Question: ASpli readCounts never ends!
11 days ago by fernandalpcosta (Brazil/Campinas/UNICAMP) wrote:

Hello all,

I'm running the ASpli package for alternative splicing analysis in a plant, Glycine max, and the process never finishes once the pipeline reaches the counting step. I let it run for 30 hours on a Google virtual machine with 24 CPUs and 150 GB of RAM, but it never returned a result, so I killed the process. It was using around 92 GB of RAM in the counting step. I used the 'toy' dataset to test the installation, dependencies and pipeline, and it ran with no errors.

What could be the problem?

Some important info:


The point where it gets stuck and never returns a result:

> counts <- readCounts(features=features, bam=bam, targets=targets, cores=20, readLength=100L, maxISize=50000, minAnchor=10)
Read summarization by gene completed
Read summarization by bin completed
Read summarization by ei1 region completed
Read summarization by ie2 region completed

Information about the R session:

 > sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] GenomicFeatures_1.34.3 AnnotationDbi_1.44.0   Biobase_2.42.0        
 [4] GenomicRanges_1.34.0   GenomeInfoDb_1.18.1    IRanges_2.16.0        
 [7] S4Vectors_0.20.1       BiocGenerics_0.28.0    ASpli_1.8.1           
[10] edgeR_3.24.3           limma_3.38.3          

loaded via a namespace (and not attached):
 [1] ProtGenerics_1.14.0         bitops_1.0-6               
 [3] matrixStats_0.54.0          bit64_0.9-7                
 [5] RColorBrewer_1.1-2          progress_1.2.0             
 [7] httr_1.4.0                  tools_3.5.2                
 [9] backports_1.1.3             R6_2.3.0                   
[11] rpart_4.1-13                Hmisc_4.2-0                
[13] DBI_1.0.0                   lazyeval_0.2.1             
[15] Gviz_1.26.4                 colorspace_1.4-0           
[17] nnet_7.3-12                 gridExtra_2.3              
[19] prettyunits_1.0.2           bit_1.1-14                 
[21] curl_3.3                    compiler_3.5.2             
[23] htmlTable_1.13.1            DelayedArray_0.8.0         
[25] rtracklayer_1.42.1          scales_1.0.0               
[27] checkmate_1.9.1             stringr_1.4.0              
[29] digest_0.6.18               Rsamtools_1.34.1           
[31] foreign_0.8-70              rmarkdown_1.11             
[33] XVector_0.22.0              base64enc_0.1-3            
[35] dichromat_2.0-0             pkgconfig_2.0.2            
[37] htmltools_0.3.6             ensembldb_2.6.5            
[39] BSgenome_1.50.0             htmlwidgets_1.3            
[41] rlang_0.3.1                 rstudioapi_0.9.0           
[43] RSQLite_2.1.1               BiocParallel_1.16.5        
[45] acepack_1.4.1               VariantAnnotation_1.28.10  
[47] RCurl_1.95-4.11             magrittr_1.5               
[49] GenomeInfoDbData_1.2.0      Formula_1.2-3              
[51] Matrix_1.2-15               Rcpp_1.0.0                 
[53] munsell_0.5.0               stringi_1.2.4              
[55] yaml_2.2.0                  SummarizedExperiment_1.12.0
[57] zlibbioc_1.28.0             plyr_1.8.4                 
[59] grid_3.5.2                  blob_1.1.1                 
[61] crayon_1.3.4                lattice_0.20-38            
[63] Biostrings_2.50.2           splines_3.5.2              
[65] hms_0.4.2                   locfit_1.5-9.1             
[67] knitr_1.21                  pillar_1.3.1               
[69] biomaRt_2.38.0              XML_3.98-1.17              
[71] evaluate_0.12               biovizBase_1.30.1          
[73] latticeExtra_0.6-28         data.table_1.12.0          
[75] BiocManager_1.30.4          gtable_0.2.0               
[77] assertthat_0.2.0            ggplot2_3.1.0              
[79] xfun_0.4                    AnnotationFilter_1.6.0     
[81] survival_2.43-3             tibble_2.0.1               
[83] GenomicAlignments_1.18.1    memoise_1.1.0              
[85] cluster_2.0.7-1             BiocStyle_2.10.0

Reference annotation used, from Phytozome (Gmax): Gmax275Wm82.a2.v1.gene.gff3.gz
FASTA file used to generate the BAM files, also from Phytozome: Gmax275v2.0.fa.gz (data from this Phytozome link)


Any help or tip will be greatly appreciated!

Thank you all in advance.

Cordially,

Fernanda Costa


Dear Fernanda, thanks for your post. How many BAMs (samples) are you trying to quantify? For diagnostic purposes, can you run the readCounts() function using only one sample?
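A minimal sketch of how such a single-sample run could be set up, assuming a targets data.frame with one row per sample. The sample names, BAM paths and the `condition` column below are hypothetical placeholders, not taken from this thread; the ASpli calls are left as comments because they need the real BAM files and the `features` object from binGenome().

```r
# Hypothetical targets table: sample names, BAM paths and conditions are
# placeholders for illustration only.
targets <- data.frame(
  row.names = c("ctrl_rep1", "ctrl_rep2", "treat_rep1", "treat_rep2"),
  bam = c("ctrl_rep1.bam", "ctrl_rep2.bam", "treat_rep1.bam", "treat_rep2.bam"),
  condition = c("ctrl", "ctrl", "treat", "treat"),
  stringsAsFactors = FALSE
)

# Keep only the first sample. drop = FALSE keeps the result a data.frame
# and preserves the row name.
one_target <- targets[1, , drop = FALSE]
print(one_target)

# With ASpli loaded and `features` already built by binGenome(), the
# single-sample run would then look like this (not executed here):
# bam_one    <- loadBAM(one_target)
# counts_one <- readCounts(features = features, bam = bam_one,
#                          targets = one_target, cores = 1,
#                          readLength = 100L, maxISize = 50000, minAnchor = 10)
```

Using `cores = 1` here is only a suggestion to keep the diagnostic run simple; the other arguments are copied from the call in the question.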

Can you also share the sequencing coverage (number of reads) and the size of the genome? You can copy and paste the information from the log printed after running the binGenome() function.

Thanks a lot!

Estefi

written 10 days ago by emancini10

Hello Estefi,

Thank you for your reply!

I'm running the readCounts() function on just one sample of this project right now, and it seems to be stuck at the same step (also using over 90 GB of RAM).

This project has 36 samples, each one with around 50M reads (R1 and R2).

Log output after binGenome():

> features <- binGenome(TxDb)
* Number of extracted Genes = 56044
* Number of extracted Exon Bins = 292007
* Number of extracted intron bins = 398757
* Number of extracted trascripts = 88647
* Number of extracted junctions = 347047
* Number of AS bins (not include external) = 29288
* Number of AS bins (include external) = 29326
* Classified as: 
    ES bins = 11587 (40%)
    IR bins = 891   (3%)
    Alt5'ss bins = 5178 (18%)
    Alt3'ss bins = 6821 (23%)
    Multiple AS bins = 4811 (16%)
    classified as:
            ES bins = 1544  (32%)
            IR bins = 162   (3%)
            Alt5'ss bins = 1392 (29%)
            Alt3'ss bins = 1523 (32%)

Thank you so much for your help,

Fernanda Costa

written 2 days ago by fernandalpcosta