RdbiPgSQL: slowness in dbWriteTable
2
0
Entering edit mode
Jeff Gentry ★ 3.9k
@jeff-gentry-12
Last seen 10.3 years ago
Hi there ... In the past I've had some difficulty inserting some larger tables into postgres using dbWriteTable() from RdbiPgSQL, in terms of speed performance but now I've run into some data.frames which are just too large (although they don't seem to be all that large in practical terms). As an example, I was attempting to insert the assayData from two SnpCallSets into a DB over the weekend, 6 tables each of ~350 columns and 250K rows. When I came back this morning (~3.5 days), it hadn't even completed one of these tables, with both CPU and RAM maxxed (on a machine w/ 16GB of RAM and a pretty decent CPU). Anyone have ideas on a faster method to get these tables inserted? One possibility that I've found in the past to be quicker is to manually loop INSERT statements, but the problem there is detection of data types (which I believe is the real slowdown from the RdbiPgSQL package in the first place) and proper quoting/non-quoting.
RdbiPgSQL RdbiPgSQL • 2.9k views
ADD COMMENT
0
Entering edit mode
@sean-davis-490
Last seen 4 months ago
United States
Jeff Gentry wrote: > Hi there ... > > In the past I've had some difficulty inserting some larger tables into > postgres using dbWriteTable() from RdbiPgSQL, in terms of speed > performance but now I've run into some data.frames which are just > too large (although they don't seem to be all that large in practical > terms). > > As an example, I was attempting to insert the assayData from two > SnpCallSets into a DB over the weekend, 6 tables each of ~350 columns and > 250K rows. When I came back this morning (~3.5 days), it hadn't even > completed one of these tables, with both CPU and RAM maxxed (on a machine > w/ 16GB of RAM and a pretty decent CPU). > > Anyone have ideas on a faster method to get these tables inserted? One > possibility that I've found in the past to be quicker is to manually loop > INSERT statements, but the problem there is detection of data types (which > I believe is the real slowdown from the RdbiPgSQL package in the first > place) and proper quoting/non-quoting. Hi, Jeff. The fastest way to get data into postgresql is to use some version of the copy protocol. I do not know what RdbiPgSQL uses for doing dbWriteTable. I did a quick test on my machine (4Gb RAM, 3-year-old Mac G5) and was able to insert 250,000 rows by 350 columns of floats in about 2 minutes. In order to do this: 1) Create the table. You can generate the create statements in R using the class of the various columns of your dataframe and combinations of paste. Alternatively, use dbWriteTable with a couple of rows to get the table creation done automatically. If you use dbWriteTable, do not forget to do a TRUNCATE before loading data. 2) Write your data.frame to disk, tab-delimited text, no column names, with NULL values as "\N" 3) start up psql and do: \copy tablename from filename Generally, looping over inserts (or any form of insert statement) will be much slower than using "copy". Also, make sure that you drop indexes before using copy (or doing inserts, for that matter)--these will kill you on a large insert to a large table. If you do need to do inserts, be sure to issue a "begin" before doing the inserts and a "commit" at the end; isolating things in a transaction block will gain you speed. Hope that helps. Sean
ADD COMMENT
0
Entering edit mode
> Hi, Jeff. The fastest way to get data into postgresql is to use some > version of the copy protocol. I do not know what RdbiPgSQL uses for Yup, your method is essentially what I came across yesterday afternoon, with some slight variations. What you outline is a bit better though. This works for my immediate task, although I'm going to need a way to allow this to be automated (hopefully within RdbiPgSQL itself). COPY won't work, and /copy of course needs to be called from psql. One thought was to try and get /copy implemented in R.
ADD REPLY
0
Entering edit mode
Jeff Gentry wrote: >> Hi, Jeff. The fastest way to get data into postgresql is to use some >> version of the copy protocol. I do not know what RdbiPgSQL uses for > > Yup, your method is essentially what I came across yesterday afternoon, > with some slight variations. What you outline is a bit better > though. This works for my immediate task, although I'm going to need a > way to allow this to be automated (hopefully within RdbiPgSQL > itself). COPY won't work, and /copy of course needs to be called from > psql. One thought was to try and get /copy implemented in R. > This is the best solution for dbWriteTable, I think, which is meant to be the fastest way to get data into and out of the database, but there may be good reasons for not doing so if it isn't using the copy protocol. Other clients (perl and python that I know of) have so- called client-side copy in place already. Of course the underlying C libraries have PGPutCopyData() and PGGetCopyData() (sp?) for just this purpose. However, I haven't played in C/C++ for quite some time to be able to implement something quickly. Sean
ADD REPLY
0
Entering edit mode
John Zhang ★ 2.9k
@john-zhang-6
Last seen 10.3 years ago
> >Yup, your method is essentially what I came across yesterday afternoon, >with some slight variations. What you outline is a bit better >though. This works for my immediate task, although I'm going to need a >way to allow this to be automated (hopefully within RdbiPgSQL >itself). COPY won't work, and /copy of course needs to be called from >psql. One thought was to try and get /copy implemented in R. > Hi, jeff, Have you tried to do copy in plain sql within RdbiPgSQL? e. g. > sql <- "copy yourDBName from 'yourFileName' using delimiters 'yourDelimiters'" > dbSendQuery(yourConnection, sql) >_______________________________________________ >Bioconductor mailing list >Bioconductor at stat.math.ethz.ch >https://stat.ethz.ch/mailman/listinfo/bioconductor >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor Jianhua Zhang Department of Medical Oncology Dana-Farber Cancer Institute 44 Binney Street Boston, MA 02115-6084
ADD COMMENT
0
Entering edit mode
John Zhang wrote: >> Yup, your method is essentially what I came across yesterday afternoon, >> with some slight variations. What you outline is a bit better >> though. This works for my immediate task, although I'm going to need a >> way to allow this to be automated (hopefully within RdbiPgSQL >> itself). COPY won't work, and /copy of course needs to be called from >> psql. One thought was to try and get /copy implemented in R. >> > > > Hi, jeff, > > Have you tried to do copy in plain sql within RdbiPgSQL? e. g. > >> sql <- "copy yourDBName from 'yourFileName' using delimiters 'yourDelimiters'" >> dbSendQuery(yourConnection, sql) I don't think this will work unless the file is already on the postgresql server. From the postgres manual: "COPY with a file name instructs the PostgreSQL server to directly read from or write to a file. The file must be accessible to the server and the name must be specified from the viewpoint of the server." If Jeff is on the DB machine, then I think this will work. Sean
ADD REPLY

Login before adding your answer.

Traffic: 706 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6