Rsamtools Memory Issue
wrighth ▴ 260
@wrighth-3452
Last seen 10.2 years ago
Hello, all;

I am having a problem with the readPileup() function in Rsamtools. I'm trying to read in a pileup file of about 4 GB, generated by samtools, on a Mac Pro running OS X 10.5. Top indicates that I have 18 GB of memory free, and I am using a freshly built R 2.11.1 with the x86_64 arch specified. However, when I attempt to read in the file:

    lane1 <- readPileup("test.pup", variant="SNP")

I eventually get a malloc error and the error "Cannot allocate vector of size 500 Mb". I have specified ulimit unlimited for the shell that I'm running R in, and I have difficulty believing that a contiguous 500 MB block is unavailable in 18 GB of free RAM. Top only ever indicates that R is using 2-3 GB. samtools has had no problems processing the files up to this point, and a quick inspection seems to indicate that they are proper pileup files. Any thoughts?

Hollis Wright, PhD
Oregon Clinical and Translational Research Institute
Oregon Health and Science University
@martin-morgan-1513
Last seen 3 months ago
United States
On 10/01/2010 01:12 PM, Hollis Wright wrote:
> [...]
> lane1 <- readPileup("test.pup", variant="SNP")
>
> I eventually get a malloc error and the error "Cannot allocate vector
> of size 500 Mb". [...] Top only ever indicates that R is using 2-3GB
> [...]

Hi Hollis --

Partly, the message is saying "I've allocated a bunch of memory, and now I'm trying to allocate 500 more MB, and I can't find room for that additional memory". The 2-3 GB use reported by top needs clarification; it could be a misrepresentation on the part of top, but it would also help to report sessionInfo() (I'm not a Mac person so can't provide detail on 32 vs. 64 bit memory use...).

samtools does stream processing so doesn't run into memory limits; this is very different from the R programming model, where data generally resides in memory.

The code you execute ends up more or less directly at Rsamtools:::.readPileup_SNP and Rsamtools:::.readPileup_table. These rely on read.table to input the data, and the ... arguments available in the original call are passed down to read.table. So you can select lines to skip / limit the number of records read with the arguments 'skip' and 'nrows' as documented on ?read.table. (samtools does produce multi-line records, so an unfortunate choice of skip / nrows will begin / end in the middle of a record; you could use read.table alone with similar arguments to peek at the file to get the breaks right.)

Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
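The idea of limiting how many records are read at once can be sketched as a chunked read. This is a minimal sketch, not Rsamtools code: the helper name read_in_chunks, the toy file, and the six-column layout (chr, pos, ref, depth, bases, quals) are assumptions made for illustration. It pulls lines from an open connection with readLines() and parses each batch with read.table(text = ...), so the file is not rescanned for every chunk as it would be with 'skip'. It makes no attempt to keep multi-line records together, so a chunk boundary can still fall in the middle of a record.

```r
# Hypothetical helper (not part of Rsamtools): parse a tab-delimited file
# chunk by chunk and pass each chunk to a user-supplied function.
read_in_chunks <- function(path, chunk_rows, handle, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n_chunks <- 0L
  repeat {
    # The connection stays open, so each call continues where the last ended.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    # quote = "" and comment.char = "" matter for pileup-like data, where
    # quality strings can contain quote and '#' characters.
    chunk <- read.table(text = lines, sep = "\t", header = FALSE,
                        quote = "", comment.char = "",
                        stringsAsFactors = FALSE, ...)
    handle(chunk)
    n_chunks <- n_chunks + 1L
  }
  n_chunks
}

# Toy input: 10 rows in an assumed six-column layout.
tmp <- tempfile(fileext = ".pup")
toy <- data.frame(chr = "chr1", pos = 1:10, ref = "T", depth = 3L,
                  bases = "..,", quals = "#I\"", stringsAsFactors = FALSE)
write.table(toy, tmp, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Keep only running summaries, so no more than one chunk is held at a time.
total_rows <- 0L
depth_sum <- 0
n_chunks <- read_in_chunks(
  tmp, chunk_rows = 4L,
  handle = function(chunk) {
    total_rows <<- total_rows + nrow(chunk)
    depth_sum  <<- depth_sum + sum(chunk[[4]])
  },
  # Explicit column classes stop a reference base of "T" being read as TRUE.
  colClasses = c("character", "integer", "character",
                 "integer", "character", "character"))
```

Passing colClasses explicitly also spares read.table from guessing column types, which ?read.table notes can reduce memory use on large files.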
Hi, Martin. sessionInfo() below:

    R version 2.11.1 (2010-05-31)
    x86_64-apple-darwin

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods

    other attached packages:
    [1] Rsamtools_1.0.8 Biostrings_2.16.9 GenomicRanges_1.0.9
    [4] IRanges_1.6.17

    loaded via a namespace (and not attached):
    [1] Biobase_2.8.0

For what it's worth, I tried just a plain read.table on the pileup file and had essentially the same error; I don't know if that helps. I'll see about chunking the file into smaller bits in the meantime, thanks...

Hollis Wright, PhD
Oregon Clinical and Translational Research Institute
Oregon Health and Science University
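Since a plain read.table fails in the same way, it can help to estimate how large the parsed table would be before trying to load all of it. The following is a rough sketch under stated assumptions: the helper name estimate_table_bytes and the toy file are hypothetical, and the estimate simply scales the in-memory size of the first n rows by the ratio of file size to sample size. It ignores the extra working memory read.table needs while parsing, and a small sample can misstate the per-row cost, so treat the result as an order-of-magnitude figure only.

```r
# Hypothetical helper: extrapolate the in-memory size of a parsed
# tab-delimited file from its first `n` lines.
estimate_table_bytes <- function(path, n = 1000L, ...) {
  lines <- readLines(path, n = n)
  sample_df <- read.table(text = lines, sep = "\t", header = FALSE,
                          quote = "", comment.char = "",
                          stringsAsFactors = FALSE, ...)
  # Bytes the sampled lines occupy on disk (one newline per line assumed).
  disk_bytes <- sum(nchar(lines, type = "bytes")) + length(lines)
  # Bytes the parsed sample occupies in memory.
  ram_bytes <- as.numeric(object.size(sample_df))
  ratio <- ram_bytes / disk_bytes
  list(ratio = ratio,
       estimated_bytes = ratio * file.info(path)$size)
}

# Toy input: 2000 rows in an assumed six-column layout.
tmp <- tempfile(fileext = ".pup")
toy <- data.frame(chr = "chr1", pos = 1:2000, ref = "A", depth = 3L,
                  bases = "..,", quals = "III", stringsAsFactors = FALSE)
write.table(toy, tmp, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

est <- estimate_table_bytes(
  tmp, n = 500L,
  colClasses = c("character", "integer", "character",
                 "integer", "character", "character"))
# est$estimated_bytes / 2^30 gives the rough size in GB.
```

If the estimate approaches the memory available to the R process, reading in chunks or splitting the file is the safer route.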