12.1 years ago by
Jeff Gentry wrote:
> Hi there ...
> In the past I've had some difficulty inserting some larger tables
> postgres using dbWriteTable() from RdbiPgSQL, in terms of speed
> performance but now I've run into some data.frames which are just
> too large (although they don't seem to be all that large in
> As an example, I was attempting to insert the assayData from two
> SnpCallSets into a DB over the weekend, 6 tables each of ~350
> 250K rows. When I came back this morning (~3.5 days), it hadn't
> completed one of these tables, with both CPU and RAM maxxed (on a
> w/ 16GB of RAM and a pretty decent CPU).
> Anyone have ideas on a faster method to get these tables inserted?
> possibility that I've found in the past to be quicker is to manually
> INSERT statements, but the problem there is detection of data types
> I believe is the real slowdown from the RdbiPgSQL package in the
> place) and proper quoting/non-quoting.
Hi, Jeff. The fastest way to get data into postgresql is to use some
version of the copy protocol. I do not know what RdbiPgSQL uses for
doing dbWriteTable. I did a quick test on my machine (4Gb RAM,
3-year-old Mac G5) and was able to insert 250,000 rows by 350 columns
floats in about 2 minutes. In order to do this:
1) Create the table. You can generate the create statements in R
the class of the various columns of your dataframe and combinations of
paste. Alternatively, use dbWriteTable with a couple of rows to get
table creation done automatically. If you use dbWriteTable, do not
forget to do a TRUNCATE before loading data.
2) Write your data.frame to disk, tab-delimited text, no column
with NULL values as "\N"
3) start up psql and do:
\copy tablename from filename
Generally, looping over inserts (or any form of insert statement) will
be much slower than using "copy". Also, make sure that you drop
before using copy (or doing inserts, for that matter)--these will kill
you on a large insert to a large table. If you do need to do inserts,
be sure to issue a "begin" before doing the inserts and a "commit" at
the end; isolating things in a transaction block will gain you speed.
Hope that helps.