BSgenome and R memory use
Paul Leo ▴ 970
@paul-leo-2092
Last seen 10.2 years ago
I have a bit of a problem with R running out of memory with BSgenome. I have distilled it down to the bare bones. Basically I am just calling up different mouse chromosomes and putting them into an object (use.chromo). I then immediately remove it, with the simplistic idea that this will free up the space that this object required. I always use the same object and I do nothing with it.

The memory is rapidly depleted. I would love to know what tricks are out there for cleaning up after removed objects, and in general what the origin of this behaviour is... and ideas on how to avoid it.

Until the loop below is started I have enough memory to load any single mouse chromosome.

Thanks
Paul

  ### set up the test
  library(BSgenome.Mmusculus.UCSC.mm8)
  chromos <- c(1:19, "X", "Y")
  chr.search <- paste("chr", chromos, sep = "")
  # > chr.search
  #  [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"
  # [10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
  # [19] "chr19" "chrX"  "chrY"

  ##### run the test
  k <- 0
  for (i in 1:10) {
    for (j in 1:length(chr.search)) {
      use.chromo <- Mmusculus[[chr.search[j]]]
      rm(use.chromo)
      k <- k + 1
    }
  }
  # k is typically between 6 and 8 when this fails:
  # Error: cannot allocate vector of size 138.4 Mb

  ## note: same behaviour with R 2.5 and an earlier version of BSgenome
  ## I am using the standard memory allocation for Windows (1.5 GB); I don't
  ## think increasing this will help much

If you replace

  use.chromo <- Mmusculus[[chr.search[j]]]

in the above loop with

  p <- getSeq(Mmusculus, chr.search[j], 100, 1000)

a similar failure occurs.

> sessionInfo()
R version 2.6.0 Under development (unstable) (2007-06-26 r42066)
i386-pc-mingw32

locale:
LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BSgenome.Mmusculus.UCSC.mm8_1.3.0 BSgenome_1.5.0
[3] Biobase_1.15.17                   Biostrings_2.5.11
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Paul --

See class?BSgenome. I think what happens is that

  use.chromo <- Mmusculus[[chr.search[j]]]

causes the data to be loaded, and a 'view' to be created.

  rm(use.chromo)

removes the view, but does not unload the data. So you'll need to also

  unload(Mmusculus, chr.search[j])

I've found these packages very useful, thanks Herve!

Martin

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
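For reference, a minimal sketch of Paul's loop with Martin's suggestion applied: drop the local reference first, then call unload() so the cached sequence data is actually released. unload() is the BSgenome method Martin refers to (BSgenome 1.5.x era); check class?BSgenome in your installed version before relying on it, as the caching interface may have changed in later releases.

  library(BSgenome.Mmusculus.UCSC.mm8)

  chr.search <- paste("chr", c(1:19, "X", "Y"), sep = "")

  for (i in 1:10) {
    for (j in seq_along(chr.search)) {
      use.chromo <- Mmusculus[[chr.search[j]]]  # loads the sequence and caches it
      ## ... do something with use.chromo here ...
      rm(use.chromo)                            # drop the local reference first
      unload(Mmusculus, chr.search[j])          # then release the cached data
    }
  }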
Hi Martin,

Thanks for those tips; I think I can use them to get past this bottleneck. I too have become dependent on this nice package.

Cheers
Paul
Hi Paul, Martin,

Martin Morgan wrote:
> I think what happens is that
>
>   use.chromo <- Mmusculus[[chr.search[j]]]
>
> causes the data to be loaded, and a 'view' to be created.

To be more precise,

  use.chromo <- Mmusculus[[chr.search[j]]]

creates a new reference to the sequence data (you could call this a "view" too, but that might be confusing since the concept of "view" is already used in Biostrings for something slightly different).

To illustrate, here is what happens to the chr1 sequence data during a typical workflow. #ref_to_chr1 is the number of references to the memory address of this sequence, i.e. the number of objects in your current session that point to it:

  > library(BSgenome.Mmusculus.UCSC.mm8)
                            # This doesn't load the chromosome data into
                            # memory -> #ref_to_chr1 = 0
  > gc()["Vcells", "(Mb)"]
  [1] 1.6                   # We start with only 1.6 Mb of data in memory
  > Mmusculus$chr1          # Loads the chr1 sequence into memory (hence takes
                            # a long time) + creates a reference to it
                            # -> #ref_to_chr1 = 1
  > gc()["Vcells", "(Mb)"]
  [1] 189.6                 # 190 Mb of data in memory!
  > Mmusculus$chr1          # Doesn't do anything -> #ref_to_chr1 = 1
  > x <- Mmusculus$chr1     # Very fast, because a BString object doesn't
                            # contain the sequence data, only a pointer to it,
                            # so chr1 is not duplicated in memory. But we now
                            # have 2 BString objects pointing to the same place
                            # in memory -> #ref_to_chr1 = 2
  > y <- substr(x, 10, 100) # -> #ref_to_chr1 = 3

You must remove all references to the chr1 sequence if you want the 190 Mb of memory it uses to be freed (it can be hard to keep track of all the references to a given sequence).

IMPORTANT: the 1st reference to the chr1 sequence should be removed last. This is achieved with unload(). All other references are removed by just removing the referencing objects:

  > rm(x)                   # -> #ref_to_chr1 = 2
  > rm(y)                   # -> #ref_to_chr1 = 1
  > unload(Mmusculus, "chr1")
                            # -> #ref_to_chr1 = 0
  > gc()["Vcells", "(Mb)"]
  [1] 1.6

Hope this helps.

> rm(use.chromo)
>
> removes the view, but does not unload the data. So you'll need to also
>
>   unload(Mmusculus, chr.search[j])
>
> I've found these packages very useful, thanks Herve!

I'm glad you like them. Thanks!

Cheers,
H.
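One way to apply the discipline Herve describes (drop every external reference, then call unload() last) is to wrap the load/use/clean-up cycle in a small helper. The withChromosome() function below is not part of BSgenome; it is a hypothetical convenience sketched here for illustration, and it assumes the 2007-era unload() method and that nchar() (or whatever summary you substitute) works on the sequence object in your Biostrings version. The function you pass in should return a small summary value rather than a view of the sequence; otherwise the return value keeps a reference alive past the unload() call.

  library(BSgenome.Mmusculus.UCSC.mm8)

  # Hypothetical helper: loads one chromosome, applies FUN, then cleans up.
  withChromosome <- function(genome, chr, FUN, ...) {
    chrseq <- genome[[chr]]     # first (and only local) reference to the sequence
    on.exit({
      rm(chrseq)                # drop the local reference first...
      unload(genome, chr)       # ...then release the cached sequence data last
    })
    FUN(chrseq, ...)
  }

  # Usage: report the length of each chromosome without accumulating memory
  for (chr in paste("chr", c(1:19, "X", "Y"), sep = "")) {
    cat(chr, withChromosome(Mmusculus, chr, nchar), "\n")
    print(gc()["Vcells", "(Mb)"])  # should stay roughly flat across iterations
  }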
