R memory limit causing problems with GenomicRanges operations
@chrisclarkson100-11114

I have been trying, using rtracklayer and GenomicRanges, to find the overlaps between BED files that document different ChIP-Seq data, e.g. nucleosome coverage, CTCF coverage, etc.

I then want to write a CSV file that documents the overlapping regions and their occupancy levels, e.g.:

"","seqnames","start","end","width","strand","ctcf","H3k27ac","H3k36me3","H3k4me1","H3k79me2"

"1","chr21",9419882,9419981,100,"+",          1.04,  3.32,     1,         0.16,     6.76

"2","chr21",9424072,9424171,100,"+",          1.84,  2,        4.64,      4,        4.52

"3","chr21",9426516,9426560,45,"+",           1.44,  2.92,     1,         1.16,     2.4

The script, based on a previous post (https://support.bioconductor.org/p/89510/#89521), goes as follows:

library(bigmemory)      # loaded in the original script but not actually used below
library(rtracklayer)
library(GenomicRanges)

# Import the coverage tracks; the nucleosome track is by far the largest
Nuc_hESC  <- import('Nucleosome_coverage.bed')   # file too big for processing
ctcfe     <- import('CtcfStdSig.bed')
H3k27ace  <- import('H3k27acStdSig.bed')
H3k36me3e <- import('H3k36me3StdSig.bed')
H3k4me1e  <- import('H3k4me1StdSig.bed')
H3k79me2e <- import('H3k79me2StdSig.bed')

# Disjoin the ranges across all tracks, then keep only ranges covered by every track
dj <- disjoin(c(Nuc_hESC, ctcfe, H3k27ace, H3k36me3e, H3k4me1e, H3k79me2e))
djhESC <- dj[(dj %over% Nuc_hESC) & (dj %over% ctcfe) & (dj %over% H3k27ace) &
             (dj %over% H3k36me3e) & (dj %over% H3k4me1e) & (dj %over% H3k79me2e)]

# Numeric placeholder columns for the per-track scores
mcols(djhESC) <- DataFrame(Nuc = NA_real_, ctcf = NA_real_, H3k27ac = NA_real_,
                           H3k36me3 = NA_real_, H3k4me1 = NA_real_, H3k79me2 = NA_real_)

# Copy the 'score' of each overlapping range in gr into the given column of dj
annotate <- function(dj, gr, column) {
    hits <- findOverlaps(dj, gr)
    mcols(dj)[queryHits(hits), column] <- mcols(gr)[subjectHits(hits), "score"]
    dj
}

dj_annotatedhESC <- annotate(djhESC, Nuc_hESC, "Nuc")
## Error: cannot allocate memory of size 5.8 Gb    <-- the debilitating error
dj_annotatedhESC <- annotate(dj_annotatedhESC, ctcfe, "ctcf")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k27ace, "H3k27ac")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k36me3e, "H3k36me3")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k4me1e, "H3k4me1")
dj_annotatedhESC <- annotate(dj_annotatedhESC, H3k79me2e, "H3k79me2")

write.csv(as.data.frame(dj_annotatedhESC), "occup_embryo.csv")

The error marked above is due to the sheer size of the nucleosome coverage vector. I tried to sort this out with the command memory.limit(.....), requesting different amounts of memory, but that does not work.
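For reference, the size of the imported nucleosome object can be checked directly with object.size() (a quick sketch, using the object name from the script above):

# Check how much memory the imported nucleosome track occupies
print(object.size(Nuc_hESC), units = "Gb")
# and how much R has currently allocated overall
gc()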

I did come up with a temporary workaround, reducing the size of the nucleosome coverage file by keeping only the best ChIP-Seq reads. Is there a way to get around this properly, i.e. can I somehow request more memory for R, bearing in mind that I am using a Linux cluster?
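One possible way to lower the peak memory of the annotate() step, sketched here with the object names from the script above (untested on the real data), is to run the overlap one chromosome at a time so that findOverlaps() never handles the full nucleosome track in a single call:

## Sketch: annotate chromosome by chromosome to keep each findOverlaps() call small
## Assumes djhESC and Nuc_hESC from the script above are already in memory
annotate_by_chrom <- function(dj, gr, column) {
    for (chr in seqlevels(dj)) {
        dj_idx <- which(as.character(seqnames(dj)) == chr)
        gr_chr <- gr[seqnames(gr) == chr]
        hits   <- findOverlaps(dj[dj_idx], gr_chr)
        mcols(dj)[dj_idx[queryHits(hits)], column] <-
            mcols(gr_chr)[subjectHits(hits), "score"]
        rm(gr_chr, hits); gc()   # release per-chromosome temporaries
    }
    dj
}

dj_annotatedhESC <- annotate_by_chrom(djhESC, Nuc_hESC, "Nuc")

Whether this is enough depends on where the allocation actually fails; the full Nuc_hESC object still has to fit in memory once it has been imported.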

@theobroma22-11920

R relies on the operating system's virtual memory on any machine, PC or Linux: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html

There is a way to upgrade, but unfortunately it seems you are already maxed out around 5 GB. You may want to restart your session, or remove any objects that are not needed for the current step as you work through the workflow, in order to keep as much memory free as possible.
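As an illustration of that advice, using the object names from the question (a sketch, not from the original answer): each raw track can be dropped as soon as its scores have been copied into the annotated object.

# Free each raw track once its scores are in dj_annotatedhESC
dj_annotatedhESC <- annotate(djhESC, Nuc_hESC, "Nuc")
rm(Nuc_hESC); gc()        # release the largest object before the next step

dj_annotatedhESC <- annotate(dj_annotatedhESC, ctcfe, "ctcf")
rm(ctcfe); gc()
# ... and so on for the remaining tracks, before the final write.csv()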

