Inquiries of rhdf5
Bernd Fischer ▴ 550
@bernd-fischer-5348
Germany / Heidelberg / DKFZ
Dear Jason!

I added this question to the Bioconductor mailing list so that other people can join the discussion and benefit from the answers. Please always send these inquiries to bioconductor@r-project.org.

Reading and writing chunked datasets works fastest if the leftmost dimensions of the chunk have the same extent as the dataset itself. E.g. in the example below, the dataset has dimensions 20000 x 10000 and the chunk size is 20000 x 10.

    library(rhdf5)
    h5createFile("test.h5")
    h5createDataset(file="test.h5", dataset="A", dims=c(20000,10000),
                    chunk=c(20000,10), level=3)
    # Fill the dataset one chunk (10 full columns) at a time.
    for (i in 1:1000) {
      print(i)
      S = matrix(rnorm(200000), nrow=20000, ncol=10)
      h5write(obj=S, file="test.h5", name="A",
              index=list(NULL, 1:10+(i-1)*10))
    }

On my computer it takes about half a minute to fill the dataset with random numbers. You can now even use compression, e.g. by setting level=3. This increases the runtime to fill the matrix to about 2 minutes, but can reduce the file size a lot.

Best,
Bernd

> Hi Bernd,
>
> I am now using the rhdf5 package to store a large matrix in HDF5 format. The matrix is about 10000*10000 and contains float data. We want to maximize the speed of reading data, but we do not care about the speed of writing the datasets. Although the dataset is stored as a matrix, only one or several complete rows will be read each time, i.e. all columns will be read for specific rows. According to the manual on Bioconductor, the optimal chunk size would be 100*(number of columns). However, when we increased the chunk size from 100*100 to 100*1000, the reading speed decreased significantly. We have not tried a 100*10000 chunk size yet because rhdf5 did not finish writing the dataset within several hours. All tests were done with no compression (level=0).
>
> Given our situation, could you please suggest an optimal chunk size so that the reading speed reaches its maximum? Or are there any other methods to increase the performance? Thanks!
>
> Best
> Jason
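
For the row-wise reads described in the quoted question, a rough way to compare chunk shapes is to count how many chunks a single-row read touches and how many elements those chunks hold. The following is only a sketch in base R: `chunks_touched` is a hypothetical helper, not an rhdf5 function, and it assumes that every touched chunk is read in full. The 1 x 10000 shape is not from the thread; it is added here only for comparison.

```r
# Hypothetical helper (not part of rhdf5): count the chunks that a
# rectangular read touches, and the elements contained in those chunks.
# Assumes the chunk grid starts at index 1 and that every touched chunk
# is read in full; edge chunks are counted at full size.
chunks_touched <- function(rows, cols, chunk) {
  # Distinct chunk indices hit along each dimension (1-based positions).
  n_row_chunks <- length(unique((rows - 1) %/% chunk[1]))
  n_col_chunks <- length(unique((cols - 1) %/% chunk[2]))
  n_chunks <- n_row_chunks * n_col_chunks
  c(chunks = n_chunks, elements = n_chunks * prod(chunk))
}

dims <- c(10000, 10000)                     # matrix size from the question
one_row <- list(rows = 1, cols = seq_len(dims[2]))

# Chunk shapes mentioned in the thread, plus a single-row chunk (assumption).
shapes <- list(c(100, 100), c(100, 1000), c(100, 10000), c(1, 10000))
result <- t(sapply(shapes, function(ch) {
  chunks_touched(one_row$rows, one_row$cols, ch)
}))
rownames(result) <- sapply(shapes, paste, collapse = " x ")
print(result)
```

Under that assumption, the three 100-row shapes touch 100, 10 and 1 chunks respectively, but all of them hold 1,000,000 elements in total for a row of 10,000 values, while the single-row chunk holds exactly 10,000. This is only arithmetic; actual read times also depend on the chunk cache and the on-disk layout, which the sketch does not model.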