Dear all:
I'm recently dealing with a set of MeDIP-seq data with MEDIPS but got an error with "MEDIPS.creatSet". Could anyone with experience help me? Thank you.
My raw paired end reads were mapped to UCSC_mm10 reference genome with BWA. Then I dealt with the data with following codes:
library(MEDIPS) library(BSgenome.Mmusculus.UCSC.mm10) BSgenome="BSgenome.Mmusculus.UCSC.mm10" chr.select=c(“chr1”,"chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chrX","chrY") extend=0 shift=0 ws=300 uniq=1 ME_Input_2 = MEDIPS.createSet(file = "/home/aligned_bam/LSK-ME-I_sorted.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select, paired=T, bwa=T)
But I got the following output and errors:
Warning: processing of bwa alignment files for paired-end sequencing data is still under development. Reading bam alignment LSK-ME-I_sorted.bam considering chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY using bam index Total number of imported first mate reads in properly mapped pairs: 35648142 scanBamFlag: isPaired = T, isProperPair=TRUE , hasUnmappedMate=FALSE, isUnmappedQuery = F, isFirstMateRead = T, isSecondMateRead = F Mean insertion size: 1916.52 nt SD of the insertion size: 360506 nt Max insertion size: 189455410 nt Min insertion size: 0 nt Paired-end alignment (BAM) files generated by bwa are processed in a different way compared to BAM files generated by bowtie, because in the bwa output the first mate can be either the 'left' or the 'right' mate regardless of their alignment to the plus or the minus strand. Creating GRange Object... Keep at most 1 read mapping to the same genomic location. Number of remaining short reads: 32803074 Calculating genomic coordinates... Creating Granges object for genome wide windows... Calculating short read coverage at genome wide windows... Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, minoverlap = minoverlap, : error in evaluating the argument 'x' in selecting a method for function 'queryHits': Error in .Call2("Integer_order2", a, b, decreasing, PACKAGE = "S4Vectors") : long vectors not supported yet: memory.c:3361 Calls: findOverlaps ... findOverlaps -> .local -> <Anonymous> -> .Call2 -> .Call Calls: MEDIPS.createSet ... countOverlaps -> countOverlaps -> .local -> queryHits Execution halted
Here comes my sessionInfo. MEDIPS is in version 1.20 however, Bioconductor is still in version 3.0.
> sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu precise (12.04.5 LTS) locale: [1] LC_CTYPE=en_HK.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_HK.UTF-8 LC_COLLATE=en_HK.UTF-8 [5] LC_MONETARY=en_HK.UTF-8 LC_MESSAGES=en_HK.UTF-8 [7] LC_PAPER=en_HK.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] MEDIPS_1.20.0 Rsamtools_1.18.3 BSgenome_1.34.1 [4] rtracklayer_1.26.3 Biostrings_2.34.1 XVector_0.6.0 [7] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5 IRanges_2.0.1 [10] S4Vectors_0.4.0 BiocGenerics_0.12.1 loaded via a namespace (and not attached): [1] AnnotationDbi_1.28.2 DNAcopy_1.40.0 magrittr_1.5 [4] zlibbioc_1.12.0 GenomicAlignments_1.2.2 BiocParallel_1.0.3 [7] brew_1.0-6 foreach_1.4.3 stringr_1.0.0 [10] sendmailR_1.2-1 tools_3.2.2 fail_1.3 [13] Biobase_2.26.0 checkmate_1.6.2 DBI_0.3.1 [16] gtools_3.5.0 iterators_1.0.8 BatchJobs_1.6 [19] digest_0.6.8 preprocessCore_1.28.0 base64enc_0.1-3 [22] bitops_1.0-6 codetools_0.2-14 biomaRt_2.22.0 [25] RCurl_1.95-4.7 RSQLite_1.0.0 stringi_0.5-5 [28] BBmisc_1.9 XML_3.98-1.3
I have no idea on the errors and can't find answer myself. :( Any help is appreciated.
Best,
Vanilla
Dear Lukas,
Thanks very much for your reply. I used 120gb memory to run MEDIPS and actually this is the maximum memory available to me.
Also thanks for your suggestions. However, this is not a merged BAM file. But if memory is the problem caused the error, could I creatSet chromosome by chromosome and merge them together?
Best,
Yiming
It is not memory per se, but R has decided at some point that the data needs to be represented by a vector longer than 2^31-1 elements. Apparently,
findOverlaps()
does not support vectors this long. If data of this size is somehow expected, then some kind of chunk-based processing would be appropriate.Thanks Martin! Would you share some suggestions on chunk-based processing then? -- Best, Vanilla
Thanks Lukas! I will try out your advice, analyze each chunk of chromosomes and concatenate resulting tables.
-- Best, Vanilla