working with large dataframes in R
Elena Sorokin ▴ 150
Hello,

I was recommended to seek out help from this forum. When working with large tables of count data (or any other type of data, for that matter), R runs out of RAM. Specifically, I'm trying to visualize a large data set consisting of count data (55,840 rows by 4 columns) using the graphical package ggplot2, and when I try to make a complex scatterplot, I get an error message. I've pasted an example code below, along with some description of what the data frame is.

Any advice about how to store this data.frame object in a less memory-intensive way would be greatly appreciated. Should I just increase my memory-limit? Alternatively, I don't know anything about SQL and relational databases, but am willing to learn, if this is really the key to working with large objects in R.

Sincerely,
Elena

> library(ggplot2)
> # I already loaded my data into a data frame object using read.delim
> summary(df)
     X.val           Y.val       time.value  graph.type
 0      :20642   0      :20737   1:55840     D1vD2:27920
 1      : 2139   1      : 2310               U1vU2:27920
 2      : 1162   2      : 1150
 3      :  774   3      :  797
 4      :  607   4      :  572
 5      :  535   5      :  513
 (Other):29981   (Other):29761
> class(df)
[1] "data.frame"
> dim(df)
[1] 55840     4
> qplot(X.val,Y.val, data= df, colour=graph.type)
Error: cannot allocate vector of size 119.2 Mb
In addition: Warning messages:
1: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
3: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
4: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0
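One detail in the summary(df) output may be worth checking. X.val, Y.val and time.value are reported as per-value counts, with an "(Other)" row, which is how summary() describes factor columns; numeric columns would show Min., Median, Mean and so on. The paste(rep(l, length(lvs)), ...) calls in the warnings also look like factor levels being combined. This is a reading of the posted output, not something confirmed in the thread. A minimal sketch of how one might check the column classes and convert count columns back to numbers, using a small made-up data frame (the values below are hypothetical stand-ins for the real data):

```r
# Hypothetical stand-in for the posted data: counts stored as factors,
# as read.delim could produce under the old stringsAsFactors = TRUE
# default when a column is not read as purely numeric.
df <- data.frame(
  X.val = factor(c("0", "1", "2", "10", "0")),
  Y.val = factor(c("3", "0", "0", "7", "1")),
  graph.type = factor(c("D1vD2", "D1vD2", "U1vU2", "U1vU2", "D1vD2"))
)

# Inspect the storage class of every column.
col_classes <- sapply(df, class)
print(col_classes)

# Convert a factor of digit strings back to numbers. Going through
# as.character() matters: as.numeric() applied directly to a factor
# returns the internal level codes, not the original values.
to_numeric <- function(f) as.numeric(as.character(f))

df$X.val <- to_numeric(df$X.val)
df$Y.val <- to_numeric(df$Y.val)

# X.val and Y.val should now be listed as "num".
str(df)
```

If the real columns turn out to be factors, plotting the converted numeric columns with the same qplot() call would be a cheap thing to try before raising the memory limit.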
Robert Baer ▴ 70
In reply to: Elena Sorokin, "[BioC] working with large dataframes in R", Bioconductor mailing list, Wednesday, May 25, 2011 3:09 PM

I ran the following code, which would seem to simulate your data, with no problem on a computer with only 2 GB of memory. Further, this is not a particularly large data frame.

X.val = sample(0:10, 2*55840/2, replace = TRUE)
Y.val = sample(0:10, 2*55840/2, replace = TRUE)
time.value = 1:55840
graph.type = c(rep('D1vD2', 27920), rep('U1vU2', 27920))
df = data.frame(X.val, Y.val, time.value, graph.type)
library(ggplot2)
qplot(X.val, Y.val, data = df, colour = graph.type)

My guess is that either a large amount of your memory is being used for something else, or you should try again after restarting everything.

Rob

Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
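To put a rough number on "not a particularly large dataframe", here is a sketch that rebuilds data of the same shape as the simulation in this reply and measures it with base R's object.size(). The seed and the variable names n and size_bytes are illustrative choices, not part of the original code.

```r
# Simulated data with the same dimensions as in the reply above
# (55,840 rows, 4 columns). The seed is arbitrary.
set.seed(1)
n <- 55840
df <- data.frame(
  X.val = sample(0:10, n, replace = TRUE),
  Y.val = sample(0:10, n, replace = TRUE),
  time.value = 1:n,
  graph.type = rep(c("D1vD2", "U1vU2"), each = n / 2)
)

# Estimated memory footprint of the data frame itself, in bytes.
size_bytes <- as.numeric(object.size(df))

# Human-readable report in megabytes.
print(object.size(df), units = "Mb")
```

The reported size should be on the order of a megabyte, far below the 1535 Mb allocation limit quoted in the error message. That suggests the data frame alone is not what exhausts memory; the allocation more likely happens inside the plotting call or elsewhere in the session.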