Error with MEDIPS.createSet in queryHits
3
0
Entering edit mode
Vanilla • 0
@vanilla-9096
Last seen 3.3 years ago
Hong Kong

Dear all:

I'm recently dealing with a set of MeDIP-seq data with MEDIPS but got an error with "MEDIPS.creatSet". Could anyone with experience help me? Thank you.

My raw paired end reads were mapped to UCSC_mm10 reference genome with BWA. Then I dealt with the data with following codes:

library(MEDIPS)
library(BSgenome.Mmusculus.UCSC.mm10)

BSgenome="BSgenome.Mmusculus.UCSC.mm10"

chr.select=c(“chr1”,"chr2","chr3","chr4","chr5","chr6","chr7","chr8","chr9","chr10","chr11","chr12","chr13","chr14","chr15","chr16","chr17","chr18","chr19","chrX","chrY")
extend=0
shift=0
ws=300
uniq=1

ME_Input_2 = MEDIPS.createSet(file = "/home/aligned_bam/LSK-ME-I_sorted.bam", BSgenome = BSgenome, extend = extend, shift = shift, uniq = uniq, window_size = ws, chr.select = chr.select, paired=T, bwa=T)

But I got the following output and errors:

Warning: processing of bwa alignment files for paired-end sequencing data is still under development.
considering  chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX chrY  using bam index
Total number of imported first mate reads in properly mapped pairs: 35648142
scanBamFlag: isPaired = T, isProperPair=TRUE , hasUnmappedMate=FALSE, isUnmappedQuery = F, isFirstMateRead = T, isSecondMateRead = F
Mean insertion size: 1916.52 nt
SD of the insertion size: 360506 nt
Max insertion size: 189455410 nt
Min insertion size: 0 nt
Paired-end alignment (BAM) files generated by bwa are processed in a different way compared to BAM files generated by bowtie, because in the bwa output the first mate can be either the 'left' or the 'right' mate regardless of their alignment to the plus or the minus strand.
Creating GRange Object...
Keep at most 1 read mapping to the same genomic location.
Number of remaining short reads: 32803074
Calculating genomic coordinates...
Creating Granges object for genome wide windows...
Calculating short read coverage at genome wide windows...

Error in queryHits(findOverlaps(query, subject, maxgap = maxgap, minoverlap = minoverlap,  :
error in evaluating the argument 'x' in selecting a method for function 'queryHits': Error in .Call2("Integer_order2", a, b, decreasing, PACKAGE = "S4Vectors") :
long vectors not supported yet: memory.c:3361
Calls: findOverlaps ... findOverlaps -> .local -> <Anonymous> -> .Call2 -> .Call
Calls: MEDIPS.createSet ... countOverlaps -> countOverlaps -> .local -> queryHits
Execution halted

Here comes my sessionInfo. MEDIPS is in version 1.20 however, Bioconductor is still in version 3.0.

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
[1] LC_CTYPE=en_HK.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_HK.UTF-8        LC_COLLATE=en_HK.UTF-8
[5] LC_MONETARY=en_HK.UTF-8    LC_MESSAGES=en_HK.UTF-8
[7] LC_PAPER=en_HK.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] MEDIPS_1.20.0        Rsamtools_1.18.3     BSgenome_1.34.1
[4] rtracklayer_1.26.3   Biostrings_2.34.1    XVector_0.6.0
[7] GenomicRanges_1.18.4 GenomeInfoDb_1.2.5   IRanges_2.0.1
[10] S4Vectors_0.4.0      BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.2    DNAcopy_1.40.0          magrittr_1.5
[4] zlibbioc_1.12.0         GenomicAlignments_1.2.2 BiocParallel_1.0.3
[7] brew_1.0-6              foreach_1.4.3           stringr_1.0.0
[10] sendmailR_1.2-1         tools_3.2.2             fail_1.3
[13] Biobase_2.26.0          checkmate_1.6.2         DBI_0.3.1
[16] gtools_3.5.0            iterators_1.0.8         BatchJobs_1.6
[19] digest_0.6.8            preprocessCore_1.28.0   base64enc_0.1-3
[22] bitops_1.0-6            codetools_0.2-14        biomaRt_2.22.0
[25] RCurl_1.95-4.7          RSQLite_1.0.0           stringi_0.5-5
[28] BBmisc_1.9              XML_3.98-1.3           

I have no idea on the errors and can't find answer myself. :( Any help is appreciated.

Best,

Vanilla

medips createset • 1.5k views
1
Entering edit mode
Lukas Chavez ▴ 570
@lukas-chavez-5781
Last seen 3.9 years ago
USA/La Jolla/UCSD
based on the error message 'long vectors not supported yet: memory.c:3361’, my best guess is that MEDIPS requires more memory than available. How much memory does your machine have? Migrating to another server with more memory might help, if available. Alternatively, in case your bam file 'LSK-ME-I_sorted.bam’ is a merged bam file, you could try importing the individual bam files separately and merge the MEDIPS SETs within MEDIPS using the MEDIPS.mergeSets() function. All the best, Lukas
0
Entering edit mode

Dear Lukas,

Thanks very much for your reply. I used 120gb memory to run MEDIPS and actually this is the maximum memory available to me.

Also thanks for your suggestions. However, this is not a merged BAM file. But if memory is the problem caused the error, could I creatSet chromosome by chromosome and merge them together?

Best,

Yiming

0
Entering edit mode

It is not memory per se, but R has decided at some point that the data needs to be represented by a vector longer than 2^31-1 elements. Apparently, findOverlaps() does not support vectors this long. If data of this size is somehow expected, then some kind of chunk-based processing would be appropriate.

0
Entering edit mode

Thanks Martin! Would you share some suggestions on chunk-based processing then? -- Best, Vanilla

0
Entering edit mode
Thanks for clarifying! Another option in MEDIPS is to process the data chromosome by chromosome or in chunks of chromosomes by using the chr.select parameter. In this case, however, MEDIPS.mergeSets() does not work. Instead, analysis of each chunk of chromosomes can be performed individually using the MEDIPS.meth() function and the resulting tables can be concatenated afterwards. Lukas On 04 Nov 2015, at 10:45, Martin Morgan [bioc] <noreply@bioconductor.org<mailto:noreply@bioconductor.org>> wrote: Activity on a post you are following on support.bioconductor.org<https: support.bioconductor.org=""/> User Martin Morgan<https: support.bioconductor.org="" u="" 1513=""/> wrote Comment: Error with MEDIPS.createSet in queryHits<https: support.bioconductor.org="" p="" 74112="" #74117="">: It is not memory per se, but R has decided at some point that the data needs to be represented by a vector longer than 2^31-1 elements. Apparently, findOverlaps() does not support vectors this long. If data of this size is somehow expected, then some kind of chunk-based processing would be appropriate. ________________________________ Post tags: medips, createset You may reply via email or visit C: Error with MEDIPS.createSet in queryHits
0
Entering edit mode

Thanks Lukas! I will try out your advice, analyze each chunk of chromosomes and concatenate resulting tables.

-- Best, Vanilla

1
Entering edit mode
@herve-pages-1542
Last seen 8 hours ago
Seattle, WA, United States

Hi Vanilla,

You're mixing up Bioconductor versions: your version of MEDIPS belongs to BioC 3.2 (the current BioC release), but your versions of GenomicRanges, IRanges, and many other packages, belong to BioC 3.0, which is not supported anymore. So it seems that you installed the latest MEDIPS "by hand" (i.e. without using biocLite()) on top of your BioC 3.0 installation. Furthermore, BioC 3.0 requires R 3.1 but you managed to install it on top of R 3.2 so you also have a mismatch between your version of BioC and your version of R. You absolutely want to fix these things before you proceed any further. Unless you have a good reason to stick to BioC 3.0, I would highly recommend that you install BioC 3.2.

MEDIPS.createSet() calls countOverlaps(), not findOverlaps(). The fact that the error message you got mentions findOverlaps() is because in the past countOverlaps() was always calling findOverlaps(). However, among many other things, findOverlaps() and friends (countOverlaps(), etc...) have been completely refactored between BioC 3.0 and BioC 3.2. In particular the "countOverlaps" method for GRanges objects (used in MEDIPS.createSet()) doesn't call findOverlaps() anymore so it could be (if you're lucky) that the issue your reported just vanishes by upgrading. If it doesn't, please report again but only after you've upgraded to the current BioC release (BioC 3.2) as we don't support older versions.

Thanks,

H.

0
Entering edit mode

Hi Herve,

Thanks for pointing out this problem. Actually I'm also concerned about the version issues as I have marked them as bold in my question. The problem is I failed updating BioC3.2 with the following code:

source("http://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")

As it returned an error

Error: Bioconductor version 3.0 cannot be upgraded with R version 3.1.0

So I installed MEDIPS1.20.0 manually.

I'm afraid there're several versions of R installed in my server. I also noticed that the versions of R used are different when I directly typed in "Rscript" or submitted a job to run "Rscript". After that I have export PATH=/usr/bin:\$PATH to my ".profile". However, I still failed updating BioC. Do you have any idea to solve the problem? Thanks!

Best,

Vanilla

0
Entering edit mode

I see. First of all, make sure that you start the correct version of R (i.e. R 3.2). Then, from this version of R try to remove the BiocInstaller package with:

> R.version.string
[1] "R version 3.2.1 Patched (2015-07-21 r68718)"   # you want 3.2.* here
> remove.packages("BiocInstaller")

Then:

source("http://bioconductor.org/biocLite.R")
biocLite()

The first command (source(".../biocLite.R")) will re-install the correct version of the BiocInstaller package (it looks like you might have the wrong version). The second command (biocLite(), with no arguments) will update all your packages. Doing biocLite("BiocUpgrade") should not be needed because that forces an upgrade of the BiocInstaller package itself. However source(".../biocLite.R") should already have installed the version of BiocInstaller that matches BioC 3.2 (should be BiocInstaller 1.20.0).

Let us know how this goes,

H.

0
Entering edit mode

Hi Herve,

It seems doesn't work... :( I followed your codes and got the following:

> R.version.string
[1] "R version 3.2.2 (2015-08-14)"
> remove.packages("BiocInstaller")
Removing package from ‘/nfs/home/package/software/R/lib’
(as ‘lib’ is unspecified)
> source("http://bioconductor.org/biocLite.R")
Bioconductor version 3.0 (BiocInstaller 1.16.5), ?biocLite for help
BiocInstaller version 3.0 is too old for R version 3.2.2;
remove.packages("BiocInstaller") then
source("http://bioconductor.org/biocLite.R")
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install

> biocLite()
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.0 (BiocInstaller 1.16.5), R version 3.2.2.

And actually 'lib = "/usr/lib/R/site-library"' is not writable, I used a personal library instead. But the BioC is not updated properly right? Thanks a lot.

Best, Vanilla

0
Entering edit mode

mmm, seems like you have more than one instance of BiocInstaller installed.

Can you please start R again, then do:

paths <- file.path(.libPaths(), "BiocInstaller")
unlink(paths, recursive=TRUE, force=TRUE)

then show us the output of:

file.exists(paths)  # should display one or more FALSEs, no TRUEs

If the above command gave you a TRUE, then do:

paths[file.exists(paths)]

That will show you the path of the BiocInstaller instance that could not be removed. Let's call this path /path/to/BiocInstaller

To remove it, open a terminal on your Ubuntu machine and, at the command line, do:

sudo rm -rf /path/to/BiocInstaller   # replace with your real path

Once you got rid of /path/to/BiocInstaller, start R again and do

paths <- file.path(.libPaths(), "BiocInstaller")
file.exists(paths)

again. You should see one or more FALSEs only, but no more TRUEs.

Then proceed with

source("http://bioconductor.org/biocLite.R")
biocLite()

Let us know how this goes.

H.

0
Entering edit mode

Hi Herve,

Thanks so much for the detailed solutions for my problem. I tried file.exists(paths) and it gave an TRUE. Unfortunately I don't have the right to remove it so I wrote to the admin to update it. Would let you know when it's fixed. :)

Best, Vanilla

0
Entering edit mode
Vanilla • 0
@vanilla-9096
Last seen 3.3 years ago
Hong Kong

Hi all,

Thanks for all your help and I finally solved my problems. I tried to solve in two ways:

1. In case it runs over RAM I performed MEDIPS.createSet() chromosome by chromosome. It worked smoothly. However, when calling differentially methylated regions with MEDIPS.meth(), I got the error for some chromosomes:

Error in MEDIPS.calibrationCurve(MSet = MSet2[[i]], CSet = CSet, input = F) :
The dependency of coverage signals on sequence pattern (e.g. CpG) densities is different than expected. No linear model can be build, please check the calibration plot by providing the MSet object at ISet.
Calls: MEDIPS.meth -> MEDIPS.calibrationCurve
Execution halted

I thought the error came because there was something wrong with CpG density dependent normalization of MeDIP-seq data. One solution is to skip this normalization and perform MEDIPS.meth() directly. As I have successfully normalized the data for other chromosomes, I put aside this way for a while.

2. With suggestions from Herve, I tried to re-install BioConductor3.2 into a personal library, though with some little troubles but solved with the help of admin. With matched version of R, BioConductor and MEDIPS, I can MEDIPS smoothly and don't even need to MEDIPS.createSet() chromosome by chromosome.

Thanks again for all warm-hearted help. Hope this can also help others.

Best,

Vanilla