DEXSeq: summarizeOverlaps() crashes Rstudio with too many samples
Entering edit mode
Last seen 4 days ago

Hi there,

I am trying to use my 42 BAM files (aligned with Subjunc from RSubread package) to prepare a DEXSeqDataSet according to point 2 in the manual (

I changed BamFileList(bamFiles) from the manual to bamFiles, because we used this function already on the bam files. But also with the original code snippet Rstudio crashed.

Since I have seven condition with each six replicates, I would love to use all BAM files for more complex analyses.

Are there any recommendations how to approach without the need of too much RAM (I have approx 60GB left over)?

Best Thomas

txdb = makeTxDbFromGFF("/home/chuddy/bioinformatics/ref_genomes/mouse_38/annotation/ensembl/Mus_musculus.GRCm38.102.gtf.gz")

flattenedAnnotation = exonicParts(txdb, = TRUE)
names(flattenedAnnotation) = sprintf("%s:E%0.3d", flattenedAnnotation$gene_id, flattenedAnnotation$exonic_part)

bamFiles <- list.files("../BAM/subjunc/", full.names = T, include.dirs = F, pattern = "\\.BAM$")
bamFiles <- normalizePath(bamFiles) 

bamFiles = BamFileList(bamFiles)

se = summarizeOverlaps(flattenedAnnotation, 
sessionInfo( )

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/ 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_CH.UTF-8       LC_NAME=C                 

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                
 [5] purrr_1.0.2                 readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1               
 [9] ggplot2_3.4.4               tidyverse_2.0.0             GenomicAlignments_1.36.0    Rsamtools_2.16.0           
[13] Biostrings_2.68.1           XVector_0.40.0              GenomicFeatures_1.52.2      DEXSeq_1.46.0              
[17] RColorBrewer_1.1-3          AnnotationDbi_1.62.2        DESeq2_1.40.2               SummarizedExperiment_1.30.2
[21] GenomicRanges_1.52.1        GenomeInfoDb_1.36.4         IRanges_2.34.1              S4Vectors_0.38.2           
[25] MatrixGenerics_1.12.3       matrixStats_1.2.0           Biobase_2.60.0              BiocGenerics_0.46.0        
[29] BiocParallel_1.34.2        

loaded via a namespace (and not attached):
 [1] DBI_1.2.0               bitops_1.0-7            biomaRt_2.56.1          rlang_1.1.2             magrittr_2.0.3         
 [6] compiler_4.3.2          RSQLite_2.3.4           png_0.1-8               vctrs_0.6.5             pkgconfig_2.0.3        
[11] crayon_1.5.2            fastmap_1.1.1           dbplyr_2.4.0            utf8_1.2.4              tzdb_0.4.0             
[16] bit_4.0.5               xfun_0.41               zlibbioc_1.46.0         cachem_1.0.8            progress_1.2.3         
[21] blob_1.2.4              DelayedArray_0.26.7     parallel_4.3.2          prettyunits_1.2.0       R6_2.5.1               
[26] stringi_1.8.3           rtracklayer_1.60.1      genefilter_1.82.1       Rcpp_1.0.11             knitr_1.45             
[31] timechange_0.2.0        Matrix_1.6-4            splines_4.3.2           tidyselect_1.2.0        rstudioapi_0.15.0      
[36] abind_1.4-5             yaml_2.3.8              codetools_0.2-19        hwriter_1.3.2.1         curl_5.2.0             
[41] lattice_0.22-5          withr_2.5.2             KEGGREST_1.40.1         survival_3.5-7          BiocFileCache_2.8.0    
[46] xml2_1.3.6              pillar_1.9.0            filelock_1.0.3          generics_0.1.3          RCurl_1.98-1.13        
[51] hms_1.1.3               munsell_0.5.0           scales_1.3.0            xtable_1.8-4            glue_1.6.2             
[56] tools_4.3.2             BiocIO_1.10.0           annotate_1.78.0         locfit_1.5-9.8          XML_3.99-0.16          
[61] grid_4.3.2              colorspace_2.1-0        GenomeInfoDbData_1.2.10 restfulr_0.0.15         cli_3.6.2              
[66] rappdirs_0.3.3          fansi_1.0.6             S4Arrays_1.0.6          gtable_0.3.4            digest_0.6.33          
[71] geneplotter_1.78.0      rjson_0.2.21            memoise_2.0.1           lifecycle_1.0.4         httr_1.4.7             
[76] statmod_1.5.0           bit64_4.0.5
DEXSeq • 559 views
Entering edit mode
Last seen 4 hours ago
United States

You can read in chunks, using the 'yieldSize' argument, which may help with the memory situation. An alternative is to use Rsubread, which is likely faster, and might be more memory efficient.

Entering edit mode

thank you for your reply. if I add "yieldSize = 1000000" as a further argument I get Error:

se = summarizeOverlaps(flattenedAnnotation, 
+                        bamFiles, 
+                        singleEnd=T,
+                        ignore.strand=TRUE, yieldSize = 1000000 
+                        )
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in FUN(bf, param = param, ...): unused argument (yieldSize = 1e+06)

I would love to use Rsubread to get the counts, since I also use it for DEG analysis, however, I wasn't able to import the Rsubread count matrix into the DEXSeq pipeline, despite there being one tool, which did not work for me with the info or manual I found online. I will try again though and hope that it works with the updated knowledge on objects used in DEXSeq.

Entering edit mode

yieldSize is an argument for BamFileList, not summarizeOverlaps

Entering edit mode

thank you, this worked.


Login before adding your answer.

Traffic: 635 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6