DEXSeq: summarizeOverlaps() crashes Rstudio with too many samples
1
0
Entering edit mode
@adde7aac
Last seen 8 weeks ago
Switzerland

Hi there,

I am trying to use my 42 BAM files (aligned with Subjunc from RSubread package) to prepare a DEXSeqDataSet according to point 2 in the manual (https://bioconductor.org/packages/release/bioc/vignettes/DEXSeq/inst/doc/DEXSeq.html#2_Preparations)

I changed BamFileList(bamFiles) from the manual to bamFiles, because we used this function already on the bam files. But also with the original code snippet Rstudio crashed.

Since I have seven condition with each six replicates, I would love to use all BAM files for more complex analyses.

Are there any recommendations how to approach without the need of too much RAM (I have approx 60GB left over)?

Best Thomas

txdb = makeTxDbFromGFF("/home/chuddy/bioinformatics/ref_genomes/mouse_38/annotation/ensembl/Mus_musculus.GRCm38.102.gtf.gz")


flattenedAnnotation = exonicParts(txdb, linked.to.single.gene.only = TRUE)
names(flattenedAnnotation) = sprintf("%s:E%0.3d", flattenedAnnotation$gene_id, flattenedAnnotation$exonic_part)

bamFiles <- list.files("../BAM/subjunc/", full.names = T, include.dirs = F, pattern = "\\.BAM$")
bamFiles <- normalizePath(bamFiles) 

bamFiles = BamFileList(bamFiles)

se = summarizeOverlaps(flattenedAnnotation, 
                       bamFiles, 
                       singleEnd=T,
                       ignore.strand=TRUE)
sessionInfo( )

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.3             forcats_1.0.0               stringr_1.5.1               dplyr_1.1.4                
 [5] purrr_1.0.2                 readr_2.1.4                 tidyr_1.3.0                 tibble_3.2.1               
 [9] ggplot2_3.4.4               tidyverse_2.0.0             GenomicAlignments_1.36.0    Rsamtools_2.16.0           
[13] Biostrings_2.68.1           XVector_0.40.0              GenomicFeatures_1.52.2      DEXSeq_1.46.0              
[17] RColorBrewer_1.1-3          AnnotationDbi_1.62.2        DESeq2_1.40.2               SummarizedExperiment_1.30.2
[21] GenomicRanges_1.52.1        GenomeInfoDb_1.36.4         IRanges_2.34.1              S4Vectors_0.38.2           
[25] MatrixGenerics_1.12.3       matrixStats_1.2.0           Biobase_2.60.0              BiocGenerics_0.46.0        
[29] BiocParallel_1.34.2        

loaded via a namespace (and not attached):
 [1] DBI_1.2.0               bitops_1.0-7            biomaRt_2.56.1          rlang_1.1.2             magrittr_2.0.3         
 [6] compiler_4.3.2          RSQLite_2.3.4           png_0.1-8               vctrs_0.6.5             pkgconfig_2.0.3        
[11] crayon_1.5.2            fastmap_1.1.1           dbplyr_2.4.0            utf8_1.2.4              tzdb_0.4.0             
[16] bit_4.0.5               xfun_0.41               zlibbioc_1.46.0         cachem_1.0.8            progress_1.2.3         
[21] blob_1.2.4              DelayedArray_0.26.7     parallel_4.3.2          prettyunits_1.2.0       R6_2.5.1               
[26] stringi_1.8.3           rtracklayer_1.60.1      genefilter_1.82.1       Rcpp_1.0.11             knitr_1.45             
[31] timechange_0.2.0        Matrix_1.6-4            splines_4.3.2           tidyselect_1.2.0        rstudioapi_0.15.0      
[36] abind_1.4-5             yaml_2.3.8              codetools_0.2-19        hwriter_1.3.2.1         curl_5.2.0             
[41] lattice_0.22-5          withr_2.5.2             KEGGREST_1.40.1         survival_3.5-7          BiocFileCache_2.8.0    
[46] xml2_1.3.6              pillar_1.9.0            filelock_1.0.3          generics_0.1.3          RCurl_1.98-1.13        
[51] hms_1.1.3               munsell_0.5.0           scales_1.3.0            xtable_1.8-4            glue_1.6.2             
[56] tools_4.3.2             BiocIO_1.10.0           annotate_1.78.0         locfit_1.5-9.8          XML_3.99-0.16          
[61] grid_4.3.2              colorspace_2.1-0        GenomeInfoDbData_1.2.10 restfulr_0.0.15         cli_3.6.2              
[66] rappdirs_0.3.3          fansi_1.0.6             S4Arrays_1.0.6          gtable_0.3.4            digest_0.6.33          
[71] geneplotter_1.78.0      rjson_0.2.21            memoise_2.0.1           lifecycle_1.0.4         httr_1.4.7             
[76] statmod_1.5.0           bit64_4.0.5
DEXSeq • 761 views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 3 days ago
United States

You can read in chunks, using the 'yieldSize' argument, which may help with the memory situation. An alternative is to use Rsubread, which is likely faster, and might be more memory efficient.

ADD COMMENT
0
Entering edit mode

thank you for your reply. if I add "yieldSize = 1000000" as a further argument I get Error:

se = summarizeOverlaps(flattenedAnnotation, 
+                        bamFiles, 
+                        singleEnd=T,
+                        ignore.strand=TRUE, yieldSize = 1000000 
+                        )
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in FUN(bf, param = param, ...): unused argument (yieldSize = 1e+06)

I would love to use Rsubread to get the counts, since I also use it for DEG analysis, however, I wasn't able to import the Rsubread count matrix into the DEXSeq pipeline, despite there being one tool, which did not work for me with the info or manual I found online. I will try again though and hope that it works with the updated knowledge on objects used in DEXSeq.

ADD REPLY
1
Entering edit mode

yieldSize is an argument for BamFileList, not summarizeOverlaps

ADD REPLY
0
Entering edit mode

thank you, this worked.

ADD REPLY

Login before adding your answer.

Traffic: 669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6