Question: R memory limit causing problems with GenomicRanges operations...
chrisclarkson100 wrote (11 months ago):

I am using rtracklayer and GenomicRanges to find the overlaps between BED files that contain different ChIP-seq data, e.g. nucleosome coverage, CTCF coverage, etc.

I then want a CSV file that lists the overlapping regions and their occupancy levels, e.g.:

"","seqnames","start","end","width","strand","ctcf","H3k27ac","H3k36me3","H3k4me1","H3k79me2"
"1","chr21",9419882,9419981,100,"+",1.04,3.32,1,0.16,6.76
"2","chr21",9424072,9424171,100,"+",1.84,2,4.64,4,4.52
"3","chr21",9426516,9426560,45,"+",1.44,2.92,1,1.16,2.4

The script, based on a previous post (https://support.bioconductor.org/p/89510/#89521), goes as follows:

library(bigmemory)
library(rtracklayer)
library(GenomicRanges)

Nuc_hESC <- import('Nucleosome_coverage.bed') # file too big for processing
ctcfe <- import('CtcfStdSig.bed')
H3k27ace <- import('H3k27acStdSig.bed')
H3k36me3e <- import('H3k36me3StdSig.bed')
H3k4me1e <- import('H3k4me1StdSig.bed')
H3k79me2e <- import('H3k79me2StdSig.bed')

# Split the union of all tracks into non-overlapping pieces
dj <- disjoin(c(Nuc_hESC, ctcfe, H3k27ace, H3k36me3e, H3k4me1e, H3k79me2e))

# Keep only the pieces covered by every track
djhESC <- dj[(dj %over% Nuc_hESC) & (dj %over% ctcfe) & (dj %over% H3k27ace) &
             (dj %over% H3k36me3e) & (dj %over% H3k4me1e) & (dj %over% H3k79me2e)]

# One score column per track, initially NA
mcols(djhESC) <- DataFrame(Nuc=NA, ctcf=NA, H3k27ac=NA, H3k36me3=NA, H3k4me1=NA, H3k79me2=NA)

# Copy the "score" of each overlapping range in gr into the given column of dj
annotate <- function(dj, gr, column) {
    hits <- findOverlaps(dj, gr)
    mcols(dj)[queryHits(hits), column] <- mcols(gr)[subjectHits(hits), "score"]
    dj
}

dj_annotatedhESC <- annotate(djhESC, Nuc_hESC, "Nuc")

# Error: cannot allocate memory of size 5.8 Gb   <- the debilitating error

dj_annotatedhESC <- annotate(dj_annotatedhESC, ctcfe, "ctcf")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k27ace, "H3k27ac")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k36me3e, "H3k36me3")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k4me1e, "H3k4me1")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k79me2e, "H3k79me2")

# Convert to a data.frame explicitly before writing
write.csv(as.data.frame(dj_annotatedhESC), "occup_embryo.csv")
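One change that may lower the memory needed inside annotate() is asking findOverlaps() for a single match per query range with select = "first". This returns a plain integer vector (NA where there is no overlap) instead of a Hits object, and the score vector can then be indexed directly. Because dj comes from disjoin(), each piece lies entirely inside or outside every input range, so for a track whose ranges do not overlap each other there is at most one match anyway. This is a sketch, not a tested fix for the error above: it assumes every track has a numeric "score" column (as rtracklayer's BED import normally gives), and the function name annotate_first and the toy ranges are hypothetical.

```r
library(GenomicRanges)

# Hypothetical lower-memory variant of annotate(): select = "first" makes
# findOverlaps() return an integer vector (one entry per range in dj, NA when
# there is no overlap) instead of a Hits object.
annotate_first <- function(dj, gr, column) {
    idx <- findOverlaps(dj, gr, select = "first")
    # Indexing with NA yields NA, so uncovered pieces get a missing score
    mcols(dj)[[column]] <- mcols(gr)$score[idx]
    dj
}

# Toy example with made-up ranges and scores
track <- GRanges("chr21", IRanges(start = c(100, 300), end = c(199, 399)),
                 score = c(1.5, 2.5))
pieces <- GRanges("chr21", IRanges(start = c(120, 250, 310), end = c(150, 260, 320)))
res <- annotate_first(pieces, track, "ctcf")
mcols(res)$ctcf
```

It is called the same way as annotate(), e.g. annotate_first(djhESC, Nuc_hESC, "Nuc").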

The error above (marked with a comment) is caused by the sheer size of the nucleosome coverage data. I tried to solve this by calling "memory.limit(.....)" to request different amounts of memory, but this did not work.

As a temporary workaround I reduced the size of the nucleosome coverage file by keeping only the best ChIP-seq reads. Is there a proper way around this? In particular, can I request more memory for R, bearing in mind that I am running it on a Linux cluster?
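Two points that may be relevant here. First, memory.limit() only ever had an effect on Windows; on Linux it is a stub, so the memory available to R is whatever the operating system, the shell's limits and the cluster's job scheduler grant, and a larger amount would normally be requested when submitting the job rather than from inside R. Second, peak memory can sometimes be reduced by running the disjoin/filter/annotate steps one chromosome at a time, so that the disjoin() output and overlap results only ever cover one chromosome. The imported tracks themselves still have to fit in memory. The sketch below assumes each track is a GRanges with a numeric "score" column; occupancy_by_chrom and the toy tracks are hypothetical, and it uses findOverlaps(select = "first") rather than the original annotate().

```r
library(GenomicRanges)

# Hypothetical helper: disjoin, keep pieces covered by every track, and attach
# each track's score, one chromosome at a time.
# `tracks` is a named list of GRanges, each with a numeric "score" column.
occupancy_by_chrom <- function(tracks) {
    chroms <- unique(unlist(lapply(tracks, function(gr) unique(as.character(seqnames(gr))))))
    per_chrom <- lapply(chroms, function(chr) {
        # Subset every track to the current chromosome
        sub <- lapply(tracks, function(gr) gr[seqnames(gr) == chr])
        # granges() drops metadata columns so c() works even if they differ
        dj <- disjoin(do.call(c, lapply(unname(sub), granges)))
        # Keep only the pieces covered by every track
        keep <- Reduce(`&`, lapply(sub, function(gr) dj %over% gr))
        dj <- dj[keep]
        # One score column per track, named after the list element
        for (nm in names(sub)) {
            idx <- findOverlaps(dj, sub[[nm]], select = "first")
            mcols(dj)[[nm]] <- mcols(sub[[nm]])$score[idx]
        }
        dj
    })
    do.call(c, per_chrom)
}

# Toy tracks with made-up coordinates and scores
tracks <- list(
    ctcf = GRanges(c("chr1", "chr2"), IRanges(c(100, 100), c(199, 199)), score = c(1, 2)),
    H3k27ac = GRanges(c("chr1", "chr2"), IRanges(c(150, 50), c(249, 120)), score = c(3, 4))
)
res <- occupancy_by_chrom(tracks)
res
```

With the tracks from the question this would be called as occupancy_by_chrom(list(Nuc = Nuc_hESC, ctcf = ctcfe, ...)), and the result can be written with write.csv(as.data.frame(res), ...).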

modified 11 months ago by theobroma22 • written 11 months ago by chrisclarkson100
theobroma22 wrote (11 months ago):

R keeps the objects it is working with in virtual memory on any machine, whether Windows or Linux; the limits that apply are described here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html

There are ways to raise the limit, but unfortunately it seems you are already maxed out at about 5 GB. As you work through your workflow, you may want to restart your session or remove all objects that are not needed for the current step, so that as much memory as possible stays free.
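The suggestion to remove objects you no longer need can be put into practice with a few base-R calls: list the objects by size, rm() the large ones that are finished with, then call gc(), which may prompt R to return the freed memory to the operating system. The helper name object_sizes_mb and the toy objects below are hypothetical.

```r
# Hypothetical helper: sizes (in MB) of all objects in an environment,
# largest first, to see what is worth removing before the next step.
object_sizes_mb <- function(env = globalenv()) {
    nms <- ls(envir = env)
    sizes <- vapply(nms, function(n) as.numeric(object.size(get(n, envir = env))),
                    numeric(1))
    sort(sizes, decreasing = TRUE) / 1024^2
}

big <- numeric(5e6)   # toy object of roughly 38 MB
small <- 1:10
print(round(object_sizes_mb(), 1))

rm(big)           # drop objects that are no longer needed ...
invisible(gc())   # ... then run a garbage collection
```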
