Cleaning up after getSeq(BSgenome, GRanges)
@steve-lianoglou-2771
Last seen 14 months ago
United States
Howdy,

Say I'd like to fetch many sequences from hg19 that are defined in a GRanges object that spans all hg19 chromosomes.

I can make my life easy and do:

R> library(BSgenome.Hsapiens.UCSC.hg19)
R> seqs <- getSeq(Hsapiens, my.GRanges)

But while my life has been made easy, life for my machine's memory has been made harder, as I think I now have all of the Hsapiens chromosomes loaded into (I think) Hsapiens@.seqs_cache.

I reckon I can do something like:

R> rm(list=ls(Hsapiens@.seqs_cache), envir=Hsapiens@.seqs_cache)
R> gc()

to try to remedy the situation myself, but I wonder if I'm missing something else?

Perhaps having a clearCache,BSgenome method to do some cleanup might be handy?

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
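The manual cleanup proposed above could be wrapped in a small helper. This is only a sketch: `clear_cache_env` is a hypothetical name, not a BSgenome function, and the example runs on a plain stand-in environment rather than on Hsapiens@.seqs_cache, so it makes no claim about how BSgenome manages that slot.

```r
# Hypothetical helper (not part of BSgenome): empty an environment that is
# being used as a cache, and return the names that were removed.
clear_cache_env <- function(env) {
  stopifnot(is.environment(env))
  cached <- ls(env, all.names = TRUE)
  rm(list = cached, envir = env)
  invisible(cached)
}

# Stand-in for a cache environment holding two large objects.
cache <- new.env()
assign("chr1", raw(1e6), envir = cache)
assign("chr2", raw(1e6), envir = cache)

removed <- clear_cache_env(cache)
invisible(gc())  # give R a chance to reclaim the memory
```

Assuming the object really exposes its cache as an environment slot, as the post supposes, the call would be `clear_cache_env(Hsapiens@.seqs_cache)` followed by `gc()`.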
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States
Hi Steve,

The intention was really that the DNAStringSet object returned by getSeq() would not hold any reference to the chromosomes that getSeq() would load in the cache during the extraction, so everything would get automatically uncached at the first gc() opportunity after getSeq() returns.

Unfortunately this was broken because of an issue with a low-level helper in IRanges (the "xvcopy" method for XRawList objects, to be precise). The problem is fixed in IRanges 1.15.16 (I'll apply the fix to release too):

> library(BSgenome.Hsapiens.UCSC.hg19)

> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1265019 67.6    1710298 91.4  1476915 78.9
Vcells  585626  4.5    1162592  8.9   901241  6.9

> options(verbose=TRUE)  # so uncaching events will be reported

## Extracting the first 10 nucleotides from each chromosome:
> first10 <- getSeq(Hsapiens, end=10)
uncaching chr1
uncaching chr10
uncaching chr11_gl000202_random
uncaching chr11
uncaching chr12
uncaching chr13
uncaching chr15
uncaching chr14
uncaching chr16
uncaching chr17_gl000203_random
uncaching chr17_gl000206_random
uncaching chr19
uncaching chr19_gl000208_random
uncaching chr18_gl000207_random
uncaching chr18
uncaching chr17_gl000205_random
uncaching chr17_gl000204_random
uncaching chr17_ctg5_hap1
uncaching chr1_gl000192_random
uncaching chr1_gl000191_random
uncaching chr19_gl000209_random
uncaching chr17
uncaching chr2
uncaching chr21_gl000210_random
uncaching chr21
uncaching chr20
uncaching chr22
uncaching chr3
uncaching chr4_gl000193_random
uncaching chr4_ctg9_hap1
uncaching chr4_gl000194_random
uncaching chr4
uncaching chr5
uncaching chr6_cox_hap2
uncaching chr6_dbb_hap3
uncaching chr6_apd_hap1
uncaching chr6_mcf_hap5
uncaching chr6_mann_hap4
uncaching chr6
uncaching chr7
uncaching chr7_gl000195_random
uncaching chr6_ssto_hap7
uncaching chr6_qbl_hap6
uncaching chr8_gl000197_random
uncaching chr8_gl000196_random
uncaching chr8
uncaching chr9_gl000199_random
uncaching chrM
uncaching chrUn_gl000213
uncaching chrUn_gl000214
uncaching chrUn_gl000212
uncaching chrUn_gl000211
uncaching chr9_gl000201_random
uncaching chr9_gl000200_random
uncaching chr9_gl000198_random
uncaching chrUn_gl000217
uncaching chrUn_gl000220
uncaching chrUn_gl000223
uncaching chrUn_gl000227
uncaching chrUn_gl000230
uncaching chrUn_gl000234
uncaching chrUn_gl000238
uncaching chrUn_gl000242
uncaching chrUn_gl000243
uncaching chrUn_gl000241
uncaching chrUn_gl000240
uncaching chrUn_gl000239
uncaching chrUn_gl000237
uncaching chrUn_gl000236
uncaching chrUn_gl000235
uncaching chrUn_gl000233
uncaching chrUn_gl000232
uncaching chrUn_gl000231
uncaching chrUn_gl000229
uncaching chrUn_gl000228
uncaching chrUn_gl000226
uncaching chrUn_gl000225
uncaching chrUn_gl000224
uncaching chrUn_gl000222
uncaching chrUn_gl000221
uncaching chrUn_gl000219
uncaching chrUn_gl000218
uncaching chrUn_gl000216
uncaching chrUn_gl000215
uncaching chrUn_gl000246
uncaching chrUn_gl000249
uncaching chrUn_gl000248
uncaching chrUn_gl000247
uncaching chrUn_gl000245
uncaching chrUn_gl000244
uncaching chrX
uncaching chr9

> first10
  A DNAStringSet instance of length 93
     width seq
 [1]    10 NNNNNNNNNN
 [2]    10 NNNNNNNNNN
 [3]    10 NNNNNNNNNN
 [4]    10 NNNNNNNNNN
 [5]    10 NNNNNNNNNN
 [6]    10 NNNNNNNNNN
 [7]    10 NNNNNNNNNN
 [8]    10 NNNNNNNNNN
 [9]    10 NNNNNNNNNN
 ...   ... ...
[85]    10 GATCTGAAGA
[86]    10 GATCATGCCT
[87]    10 GATCTTCAGG
[88]    10 GATCTGCGCA
[89]    10 GATCAGATAG
[90]    10 GATCTTAAGC
[91]    10 GATCTAAGTT
[92]    10 GATCTGTCAT
[93]    10 GATCACCAAG

> ls(Hsapiens@.seqs_cache)
[1] "chrY"

> gc()
Garbage collection 177 = 120+21+36 (level 2) ...
69.6 Mbytes of cons cells used (66%)
61.8 Mbytes of vectors used (17%)
uncaching chrY
         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1301932 69.6    1967602 105.1  1967602 105.1
Vcells 8094983 61.8   48876866 373.0 58058596 443.0

> ls(Hsapiens@.seqs_cache)
character(0)

> gc()
Garbage collection 178 = 120+21+37 (level 2) ...
69.5 Mbytes of cons cells used (66%)
4.6 Mbytes of vectors used (2%)
         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 1300073 69.5    1967602 105.1  1967602 105.1
Vcells  600775  4.6   39101492 298.4 58058596 443.0

Memory used is almost the same as before getSeq() was called.

Thanks for reporting the issue!

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
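The behaviour described in this answer, where a chromosome is uncached at the first gc() after the last reference to it goes away, can be imitated in base R with a finalizer. The following is a toy model only: `load_chrom`, `make_uncache_logger`, and `event_log` are hypothetical names, and nothing here is taken from the BSgenome or IRanges source.

```r
# Toy model of gc-driven uncaching (hypothetical names, not BSgenome code).
# A finalizer registered on an environment runs once that environment is
# unreachable and the garbage collector collects it.

# Build the finalizer in its own closure so it does not capture the object.
make_uncache_logger <- function(name, event_log) {
  force(name)
  force(event_log)
  function(obj) {
    event_log$events <- c(event_log$events, paste("uncaching", name))
  }
}

# Pretend to load a chromosome: an environment holding a large raw vector.
load_chrom <- function(name, event_log) {
  chrom <- new.env()
  chrom$seq <- raw(1e6)
  reg.finalizer(chrom, make_uncache_logger(name, event_log))
  chrom
}

event_log <- new.env()
event_log$events <- character(0)

chrY <- load_chrom("chrY", event_log)
invisible(gc())                        # chrY is still referenced here
events_before <- event_log$events      # so no uncaching event is expected yet

rm(chrY)                               # drop the last reference
invisible(gc())                        # the finalizer can run now
events_after <- event_log$events
```

In this model a lingering reference, such as the one the "xvcopy" issue is said to have left in the returned object, would keep the finalizer from running, which matches the symptom reported in the question.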
Thank you very much!

-- Steve Lianoglou