ggbio: Data stored twice in 'GGbio' object
Julian Gehring ★ 1.3k
@julian-gehring-5818
Hi,

The 'ggbio::ggplot' function (ggbio_1.9.7, R_2013-08-05 r63513) seems to store its data twice.

    library(ggbio)
    df = data.frame(x = 1:10, y = rnorm(10))
    p = ggbio::ggplot(data = df)
    str(p)
    identical(p@data, p@ggplot$data) ## TRUE

shows that the data 'df' is stored in p@data as well as in p@ggplot$data.

Especially for large data sets, this is inefficient. Is there a good reason for this?

Best wishes
Julian
@michael-lawrence-3846
This is a flaw in the design of ggbio. It was a solution to the problem of ggplot2 requiring a data.frame in the plot object, while ggbio would like to keep the original data structure (like a GRanges) around. Probably the correct solution is for ggbio to extend the ggplot object, or otherwise represent the plot, and to perform the necessary reduction of the data when the plot is rendered. This is how the ggsubplot package works, although it is not changing the underlying data structure.

But the data is only stored *exactly* twice if the input data is a data.frame. It's not very efficient to store the data twice, but my main concern is the redundancy in the data model.

On Tue, Aug 6, 2013 at 2:33 AM, Julian Gehring <julian.gehring@embl.de> wrote:
> [original question, quoted in full above]
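The idea of keeping only the original data structure and reducing it to a data.frame at render time could be sketched roughly as follows. This is a minimal illustration in base R, not ggbio's or ggsubplot's actual implementation: the class `LazyPlot` and the function `render` are hypothetical names, and a matrix stands in for a non-data.frame input such as a GRanges.

```r
library(methods)

# Hypothetical class (not part of ggbio): it holds only the original object.
# No data.frame copy is stored in the plot object itself.
setClass("LazyPlot", slots = c(data = "ANY", layers = "list"))

# Hypothetical render step: the data.frame is created here, on demand,
# and exists only for the duration of the call.
render <- function(p) {
  df <- as.data.frame(p@data)  # reduction happens at render time
  # A real implementation would pass `df` on to the drawing code;
  # this sketch only reports the dimensions of the reduced data.
  dim(df)
}

# Stand-in for a non-data.frame input (assumption for illustration only)
m <- matrix(rnorm(20), ncol = 2)
p <- new("LazyPlot", data = m, layers = list())

is.matrix(p@data)  # the original structure is preserved in the object
render(p)          # the data.frame is built only when rendering
```

With this layout the plot object has a single source of truth for its data, which addresses the redundancy in the data model; the cost is that the conversion is repeated on every render.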
Hi Michael,

I agree that the main problem is that the data is practically stored twice, irrespective of whether this takes the form of two identical or two similar objects. Especially with large amounts of genomic data in mind, this way of handling data may not scale well.

Best wishes
Julian

On 08/06/2013 08:29 PM, Michael Lawrence wrote:
> [reply quoted in full above]