Excessive memory requirements of PING or bug?


Lars Hennig ▴ 10

@lars-hennig-5290

Last seen 10.2 years ago

Dear PING maintainers,

Running PING with the example from the vignette works fine, but segmentReads fails with a "cannot allocate memory block of size 68719476735.9 Gb" error when I use my own ChIP-seq sample data (16 million paired-end reads mapped with bowtie). This is an Arabidopsis sample (genome size = 130 Mb).

A subsample of 100,000 of our own reads runs smoothly, but 2.5 million reads crash with a similarly high memory request. Including snowfall or not has no effect.

Is there a way to get PING to process more than a few hundred thousand reads with "normal" memory (I have 48 Gb available)? If PING really has a very high memory requirement, this could be mentioned in the documentation.

Thank you very much,

Lars

Script:

    library(ShortRead)

    reads <- readAligned("reads_sorted.bam", type="BAM")
    reads <- reads[!is.na(position(reads))]
    reads <- reads[chromosome(reads) %in% c("Chr4")]

    #reads <- reads[1:100000]

    library(PING)
    library(snowfall)
    sfInit(parallel=TRUE,cpus=4)
    sfLibrary(PING)

    reads <- as(reads,"RangesList")
    reads <- as(reads,"RangedData")
    reads <- as(reads,"GenomeData")

    seg <- segmentReads(reads, minReads=5, maxLregion=1200, minLregion=80, jitter=T)

    > traceback()
    2: .Call("segReadsAll", data, dataC, start, end, as.integer(jitter),
           paraSW, as.integer(maxStep), as.integer(minLregion), PACKAGE = "PING")
    1: segmentReads(reads_gd, minReads = 5, maxLregion = 1200, minLregion = 80,
           jitter = T)

    > sessionInfo()
    R version 2.15.0 (2012-03-30)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] snowfall_1.84       snow_0.3-9          PING_1.0.0
     [4] chipseq_1.6.0       ShortRead_1.14.3    latticeExtra_0.6-19
     [7] RColorBrewer_1.0-5  Rsamtools_1.8.4     lattice_0.20-6
    [10] BSgenome_1.24.0     Biostrings_2.24.1   GenomicRanges_1.8.6
    [13] IRanges_1.14.3      BiocGenerics_0.2.0

    loaded via a namespace (and not attached):
     [1] Biobase_2.16.0      biomaRt_2.12.0      bitops_1.0-4.1
     [4] GenomeGraphs_1.16.0 grid_2.15.0         hwriter_1.3
     [7] RCurl_1.91-1        stats4_2.15.0       tools_2.15.0
    [10] XML_3.9-4           zlibbioc_1.2.0

Dr. Lars Hennig
Professor of Genetics
Swedish University of Agricultural Sciences
Uppsala BioCenter, Department of Plant Biology and Forest Genetics

PING PING • 1.9k views

ADD COMMENT • link updated 12.5 years ago by Xuekui Zhang ▴ 20 • written 12.5 years ago by Lars Hennig ▴ 10


Dan Tenenbaum ★ 8.2k

@dan-tenenbaum-4256

Last seen 5 months ago

United States

I'm cc'ing one of the PING maintainers who can perhaps shed more light on this.

Dan

ADD COMMENT • link 12.5 years ago Dan Tenenbaum ★ 8.2k


ying chen ▴ 340

@ying-chen-5085

Last seen 10.2 years ago

Hi all,

I have a question regarding CBS. Once I finish the CBS computation, I end up with a text file such as:

       sample   chromosome  start      stop       mean    count
    1  NA06991  1           61736      106013377  -0.002  65870
    2  NA06991  1           106019206  106022376  -1.675  1
    3  NA06991  1           106024056  149036525  -0.002  11462
    4  NA06991  1           149040066  149256692  -0.443  141
    5  NA06991  1           149259417  149436843  -0.144  36

Is there a package that can map the original probe IDs or gene IDs to the segmentation result? I want to end up with a table that has a column for the original probe ID, a column for the related CN mean from CBS, a column for the chromosome, and a column for the chromosome start position.

Thanks a lot for the help!

Ying

ADD COMMENT • link 12.5 years ago ying chen ▴ 340


Doesn't genoset do this? Also CRLMM.
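For a small table, the lookup asked about above can also be sketched in base R, without any extra package. This is only an illustration: the function name mapProbes and the probe table (IDs and positions) are made up, and the segment rows are the first three from the CBS output quoted in the question. It assumes each probe has a chromosome and a single position.

```r
# Segments: first three rows of the CBS output quoted in the question.
segs <- data.frame(
  sample     = "NA06991",
  chromosome = 1,
  start      = c(61736, 106019206, 106024056),
  stop       = c(106013377, 106022376, 149036525),
  mean       = c(-0.002, -1.675, -0.002)
)

# Hypothetical probe annotation; IDs and positions are invented for the example.
probes <- data.frame(
  probeID    = c("probe_A", "probe_B", "probe_C"),
  chromosome = 1,
  position   = c(100000, 106020000, 120000000)
)

# Hypothetical helper: for each probe, find the segment on the same
# chromosome whose [start, stop] interval contains the probe position.
mapProbes <- function(probes, segs) {
  hit <- vapply(seq_len(nrow(probes)), function(i) {
    j <- which(segs$chromosome == probes$chromosome[i] &
               segs$start <= probes$position[i] &
               segs$stop  >= probes$position[i])
    if (length(j)) j[1] else NA_integer_   # NA if no segment covers the probe
  }, integer(1))
  data.frame(probeID    = probes$probeID,
             chromosome = probes$chromosome,
             position   = probes$position,
             mean       = segs$mean[hit])
}

res <- mapProbes(probes, segs)
print(res)
```

The row-by-row loop is fine for a few thousand probes; for whole-array data an interval-overlap package such as those suggested above is likely the better choice.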

ADD REPLY • link 12.5 years ago Tim Triche ★ 4.2k


Dan Tenenbaum ★ 8.2k

@dan-tenenbaum-4256

Last seen 5 months ago

United States

[cc'ing Bioconductor list so others can benefit...]

On Sat, May 19, 2012 at 3:28 PM, Xuekui Zhang wrote:
> Hi Lars,
>
> Did you try to analyze each chromosome separately?
> Please let me know if that still cannot solve the problem.
>
> Xuekui
>
> On May 19, 2012, at 5:35 PM, Raphael Gottardo wrote:
>
> Hi Lars,
>
> Xuekui, cc'ed here, will look into it.
>
> Raphael

ADD COMMENT • link 12.5 years ago Dan Tenenbaum ★ 8.2k


Xuekui Zhang ▴ 20

@xuekui-zhang-4799

Last seen 10.2 years ago

Hi Lars,

Thanks for your feedback!

Yes, cutting the chromosome before segmentation is not ideal, since the cut points might fall where the peaks/nucleosomes are. To avoid this problem, you could run a sliding window (e.g. window size 300 bp, step size 10 bp) along a chromosome, count the reads in each window, find the valleys of the read-count curve, and cut there.

When we make the next version of PING, we could integrate the cutting into the segmentation step to avoid cutting the chromosome in the wrong place.

Xuekui

On May 20, 2012, at 3:25 AM, Lars Hennig wrote:
> Yes, I tried. Restricting to single chromosomes of ~20 Mb did not help, but going to much smaller subchromosomal domains did eventually solve the problem. Still, slicing the genome into many small sections is not a preferred option.
>
> Lars
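A minimal base R sketch of the sliding-window idea suggested above (300 bp window, 10 bp step, cut at a valley of the read-count curve). The function name findCutPoints, its arguments chunk and radius, and the simulated reads are assumptions made for illustration; none of this is part of PING. It assumes read start positions on a single chromosome and that radius is larger than window + step.

```r
# Hypothetical helper, not part of PING.
# pos:    read start positions on one chromosome
# chrLen: chromosome length in bp
# chunk:  nominal chunk size in bp
# radius: how far around each nominal boundary to look for a valley
findCutPoints <- function(pos, chrLen, window = 300, step = 10,
                          chunk = 1e6, radius = 5000) {
  if (chrLen <= chunk) return(numeric(0))        # nothing to cut
  pos <- sort(pos)
  starts <- seq(1, chrLen - window + 1, by = step)
  # Number of reads whose start falls in [start, start + window - 1]
  counts <- findInterval(starts + window - 1, pos) -
            findInterval(starts - 1, pos)
  targets <- seq(chunk, chrLen - 1, by = chunk)  # nominal chunk boundaries
  vapply(targets, function(t) {
    idx  <- which(abs(starts - t) <= radius)     # windows near the boundary
    best <- idx[which.min(counts[idx])]          # emptiest of those windows
    starts[best] + window / 2                    # cut at the window midpoint
  }, numeric(1))
}

# Simulated reads with two read-free stretches (purely illustrative data).
set.seed(1)
chrLen <- 30000
pos <- sample(chrLen, 20000, replace = TRUE)
gap <- (pos >= 9500 & pos <= 10200) | (pos >= 20800 & pos <= 21500)
pos <- pos[!gap]

cuts <- findCutPoints(pos, chrLen, chunk = 10000, radius = 2000)
print(cuts)

# Split the reads at the proposed cut points.
chunks <- split(pos, findInterval(pos, cuts))
print(sapply(chunks, length))
```

Each element of chunks could then be converted and segmented separately. Searching only near nominal boundaries keeps the chunks roughly equal in size while still placing each cut in a low-coverage region.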

ADD COMMENT • link 12.5 years ago Xuekui Zhang ▴ 20


On 05/21/2012 10:09 AM, Xuekui Zhang wrote:
> Yes, cutting chromosome before segmentation is not preferred, since the cutting points might be where the peaks/nucleosomes are.
> To avoid this problem, you could run a sliding window (e.g. window size 300bp, step size 10 bp) on a chromosome, count reads count in each window to find valley of reads counts curve and good cutting there.
> When we make the next version of PING, we could integrate cutting into segmentation step to avoid cutting chromosome on wrong place.

The original problem sounded like an integer overflow in the underlying C code in PING. Can that be fixed? If there is a simple reproducible example (e.g., using R's random number generators to simulate large enough data), I can perhaps help to identify this.

Martin
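The size in the error message is at least consistent with the overflow explanation, as a little arithmetic shows. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that the message reports bytes divided by 2^30, that the requested elements are 4 bytes wide, and that a negative element count was reinterpreted as an unsigned 64-bit value. It does not show where in PING the overflow would occur.

```r
# Size reported in the error message, in Gb.
reported_gb <- 68719476735.9

# Assumption: a slightly negative count, read as unsigned 64-bit, is just
# under 2^64 elements; at 4 bytes each that is just under 2^66 bytes.
wrapped_gb <- 2^64 * 4 / 2^30      # 2^36 Gb

# Difference between the wrap-around size and the reported size, in Gb.
diff_gb <- wrapped_gb - reported_gb

print(format(wrapped_gb, scientific = FALSE))
print(diff_gb)

# For comparison, the largest value a 32-bit signed integer can hold.
print(.Machine$integer.max)
```

Under these assumptions the reported size sits about 0.1 Gb below 2^36 Gb, which is what a count that had wrapped to a negative value would give. A genuine request for that much memory is implausible for 2.5 million reads.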

ADD REPLY • link 12.5 years ago Martin Morgan 25k


Thanks Martin. I agree this sounds like something we should fix.

--
Raphael Gottardo, Associate Member
http://www.rglab.org
Fred Hutchinson Cancer Research Center

ADD REPLY • link 12.5 years ago Raphael Gottardo ▴ 10
