Hi there,
I am trying to use my 42 BAM files (aligned with Subjunc from RSubread package) to prepare a DEXSeqDataSet according to point 2 in the manual (https://bioconductor.org/packages/release/bioc/vignettes/DEXSeq/inst/doc/DEXSeq.html#2_Preparations)
I changed BamFileList(bamFiles) from the manual to bamFiles, because we used this function already on the bam files. But also with the original code snippet Rstudio crashed.
Since I have seven condition with each six replicates, I would love to use all BAM files for more complex analyses.
Are there any recommendations how to approach without the need of too much RAM (I have approx 60GB left over)?
Best Thomas
txdb = makeTxDbFromGFF("/home/chuddy/bioinformatics/ref_genomes/mouse_38/annotation/ensembl/Mus_musculus.GRCm38.102.gtf.gz")
flattenedAnnotation = exonicParts(txdb, linked.to.single.gene.only = TRUE)
names(flattenedAnnotation) = sprintf("%s:E%0.3d", flattenedAnnotation$gene_id, flattenedAnnotation$exonic_part)
bamFiles <- list.files("../BAM/subjunc/", full.names = T, include.dirs = F, pattern = "\\.BAM$")
bamFiles <- normalizePath(bamFiles)
bamFiles = BamFileList(bamFiles)
se = summarizeOverlaps(flattenedAnnotation,
bamFiles,
singleEnd=T,
ignore.strand=TRUE)
sessionInfo( )
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_CH.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Zurich
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.0.2 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
[9] ggplot2_3.4.4 tidyverse_2.0.0 GenomicAlignments_1.36.0 Rsamtools_2.16.0
[13] Biostrings_2.68.1 XVector_0.40.0 GenomicFeatures_1.52.2 DEXSeq_1.46.0
[17] RColorBrewer_1.1-3 AnnotationDbi_1.62.2 DESeq2_1.40.2 SummarizedExperiment_1.30.2
[21] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2
[25] MatrixGenerics_1.12.3 matrixStats_1.2.0 Biobase_2.60.0 BiocGenerics_0.46.0
[29] BiocParallel_1.34.2
loaded via a namespace (and not attached):
[1] DBI_1.2.0 bitops_1.0-7 biomaRt_2.56.1 rlang_1.1.2 magrittr_2.0.3
[6] compiler_4.3.2 RSQLite_2.3.4 png_0.1-8 vctrs_0.6.5 pkgconfig_2.0.3
[11] crayon_1.5.2 fastmap_1.1.1 dbplyr_2.4.0 utf8_1.2.4 tzdb_0.4.0
[16] bit_4.0.5 xfun_0.41 zlibbioc_1.46.0 cachem_1.0.8 progress_1.2.3
[21] blob_1.2.4 DelayedArray_0.26.7 parallel_4.3.2 prettyunits_1.2.0 R6_2.5.1
[26] stringi_1.8.3 rtracklayer_1.60.1 genefilter_1.82.1 Rcpp_1.0.11 knitr_1.45
[31] timechange_0.2.0 Matrix_1.6-4 splines_4.3.2 tidyselect_1.2.0 rstudioapi_0.15.0
[36] abind_1.4-5 yaml_2.3.8 codetools_0.2-19 hwriter_1.3.2.1 curl_5.2.0
[41] lattice_0.22-5 withr_2.5.2 KEGGREST_1.40.1 survival_3.5-7 BiocFileCache_2.8.0
[46] xml2_1.3.6 pillar_1.9.0 filelock_1.0.3 generics_0.1.3 RCurl_1.98-1.13
[51] hms_1.1.3 munsell_0.5.0 scales_1.3.0 xtable_1.8-4 glue_1.6.2
[56] tools_4.3.2 BiocIO_1.10.0 annotate_1.78.0 locfit_1.5-9.8 XML_3.99-0.16
[61] grid_4.3.2 colorspace_2.1-0 GenomeInfoDbData_1.2.10 restfulr_0.0.15 cli_3.6.2
[66] rappdirs_0.3.3 fansi_1.0.6 S4Arrays_1.0.6 gtable_0.3.4 digest_0.6.33
[71] geneplotter_1.78.0 rjson_0.2.21 memoise_2.0.1 lifecycle_1.0.4 httr_1.4.7
[76] statmod_1.5.0 bit64_4.0.5
thank you for your reply. if I add "yieldSize = 1000000" as a further argument I get Error:
I would love to use Rsubread to get the counts, since I also use it for DEG analysis, however, I wasn't able to import the Rsubread count matrix into the DEXSeq pipeline, despite there being one tool, which did not work for me with the info or manual I found online. I will try again though and hope that it works with the updated knowledge on objects used in DEXSeq.
yieldSize is an argument for
BamFileList
, notsummarizeOverlaps
thank you, this worked.