I hope someone can help with this issue.
I have 8 BAM files from an mm9 alignment, each ~4-5 GB in size. When I run summarizeOverlaps on 3 files, it takes 2-3 hours to finish, and it works, although my computer almost freezes up. But when I run summarizeOverlaps on all 8 BAM files together and leave it overnight (as it takes too long to wait), the computer freezes (although it is a 16 GB i7 Mac, so it is supposed to be powerful) and the command never returns anything. I even let it run for 30 hours, and it looked like it was consuming memory (~600 MB of RAM), but I still got nothing. I had to reboot the laptop.
I am making my own TxDb file from the GTF file that I used for the alignment, so that the chromosome names match (the script is below).
Do you have any tips on how I can get summarizeOverlaps to work on the 8 files and create one SummarizedExperiment (se) object without freezing up the computer? I have been trying to do this for the past 2 weeks, always with the same result.
Any input is appreciated.
Here's the script:
library("DESeq2")
library("GenomicFeatures")
library("Rsamtools")
library("GenomicAlignments")
library("GenomicRanges")

mm9_from_cluster_gtf_txdb <- makeTranscriptDbFromGFF(file="~/Desktop/genes.gtf", format="gtf")
head(seqlevels(mm9_from_cluster_gtf_txdb))
saveDb(mm9_from_cluster_gtf_txdb, file="/Path/To/Libraries/TxDB/mm9_from_cluster_Ensembl_txdb.sqlite")

exonsByGene <- exonsBy(mm9_from_cluster_gtf_txdb, by="gene")
seqinfo(exonsByGene)

fls <- list.files("Path/To/BamFiles", pattern="paired.accepted_hits.bam", full=TRUE)
fls
Experiment <- c(fls[2:8], fls)
Experiment

bamLst_experiment <- BamFileList(Experiment, yieldSize=100000)
seqinfo(bamLst_experiment)

# <<< This is the step that freezes the computer when I run the 8 files together.
se_test_experiment <- summarizeOverlaps(exonsByGene, bamLst_experiment, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

sessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
GenomicAlignments_1.2.0 Rsamtools_1.18.1 Biostrings_2.34.0 XVector_0.6.0
GenomicRanges_1.18.3 GenomeInfoDb_1.2.2 IRanges_2.0.0 S4Vectors_0.4.0
BiocGenerics_0.12.0 BiocInstaller_1.16.1

loaded via a namespace (and not attached):
base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0 bitops_1.0-6
brew_1.0-6 checkmate_1.5.0 codetools_0.2-9 DBI_0.3.1 digest_0.6.4
fail_1.2 foreach_1.4.2 iterators_1.0.7 RSQLite_1.0.0 sendmailR_1.2-1
stringr_0.6.2 tools_3.1.2 zlibbioc_1.12.0