Reading by column


Hajas, Wayne ▴ 30

@hajas-wayne-6317

Last seen 10.2 years ago

This is likely a simple question, but I couldn't find a similar problem in the archives.

I am trying to use rhdf5 to read a .hdf5 file of a pre-determined structure. My problem is that I am generating a table of values that is going to grow very large. So far, I can only figure out how to read the entire table at once. Eventually, I expect my table to be 800 x 100000, so I will need to be able to go one column at a time.

Here is an example.

> typeof(h5read(HDF5file,"chain0/PyMCsamples"))
[1] "list"

One of the columns in the dataframe (elements in the list) is named 'deviance'.

> length(h5read(HDF5file,"chain0/PyMCsamples")$deviance)
[1] 47

I would like to be able to do something like:

> h5read(HDF5file,"chain0/PyMCsamples/deviance")
Error in h5read(HDF5file, "chain0/PyMCsamples/deviance") :
  Object chain0/PyMCsamples/deviance does not exist in this HDF5 file.

Can anyone point me in the right direction?

Thanks very much,
Wayne Hajas


ADD COMMENT • link 10.9 years ago Hajas, Wayne ▴ 30


Nathaniel Hayden ▴ 180

@nathaniel-hayden-6327

Last seen 9.4 years ago

United States

Hi, Wayne. Did you find what you needed for your subsetting scenario? I'm not very familiar with rhdf5 myself, but the documentation addresses subsetting for reads and writes by any number of dimensions, using the index argument. See section 3.3 of the rhdf5 vignette.
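For readers following along, here is a minimal sketch of the index argument mentioned above, applied to a plain numeric array dataset. The file name, the dataset name "samples", and the matrix contents are made up for illustration; this is not Wayne's file, and it assumes the data are stored as an ordinary 2-D array rather than as a table of named columns.

```r
library(rhdf5)

# Hypothetical file and dataset names, used only for this illustration.
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)

# A plain 2-D numeric array: 10 rows (samples) by 4 columns (variables).
m <- matrix(as.numeric(1:40), nrow = 10, ncol = 4)
h5write(m, h5file, "samples")

# index takes one list element per dimension of the dataset;
# NULL selects everything along that dimension.
col3  <- h5read(h5file, "samples", index = list(NULL, 3))
block <- h5read(h5file, "samples", index = list(2:3, c(1, 2, 4)))
```

The index list must have exactly as many elements as the dataset has dimensions, so a two-element list only makes sense for a dataset that HDF5 itself sees as two-dimensional.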

ADD COMMENT • link 10.9 years ago Nathaniel Hayden ▴ 180


Thanks for asking, Nathaniel!

Yours is the first response, and I haven't had any luck plugging away on my own. I tried chunking as described in section 3.3 but didn't get anywhere.

Below is another example of what is happening for me. Any suggestions are appreciated,
Wayne

> class( h5read(HDF5file,"chain0/PyMCsamples") )
[1] "data.frame"
> dim( h5read(HDF5file,"chain0/PyMCsamples") )
[1] 1000 828
> h5read(HDF5file,"chain0/PyMCsamples",index=list(2:3,c(1,2,4,5)))
Error in h5read(HDF5file, "chain0/PyMCsamples", index = list(2:3, c(1, :
  length of index has to be equal to dimensional extension of HDF5 dataset.

ADD REPLY • link 10.9 years ago Hajas, Wayne ▴ 30


Dear Wayne!

Unfortunately, subsetting for the HDF5 compound data type is not yet supported in rhdf5. In any case, reading a compound datatype with so many columns in R is very inefficient. You should consider storing your data as one or multiple arrays. That way you can speed up your code and benefit from subsetting.

Bernd
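One way to read the "multiple arrays" suggestion is to write each column as its own 1-D dataset inside a group, so that a single column can be read by its path. The sketch below assumes you control how the file is written; the column names and random values are stand-ins, and a file produced by another tool as a single compound table would need to be rewritten in this layout first.

```r
library(rhdf5)

# Hypothetical file; the group path mirrors the one used in the question.
h5file <- tempfile(fileext = ".h5")
h5createFile(h5file)
h5createGroup(h5file, "chain0")
h5createGroup(h5file, "chain0/PyMCsamples")

# Stand-in data: two named columns of 47 values each.
samples <- data.frame(deviance = rnorm(47), alpha = rnorm(47))

# Write every column as a separate 1-D dataset under the group.
for (nm in names(samples)) {
  h5write(samples[[nm]], h5file, paste0("chain0/PyMCsamples/", nm))
}

# A single column can now be read on its own, without loading the others.
deviance <- h5read(h5file, "chain0/PyMCsamples/deviance")
```

With this layout, h5ls(h5file) lists each column as a separate dataset, and the index argument can be used on any one of them to read only part of a column.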

ADD REPLY • link 10.9 years ago Bernd Fischer ▴ 550


Hajas, Wayne ▴ 30

@hajas-wayne-6317

Last seen 10.2 years ago

Just as an update, I have worked around the problem. I thought things through and realized the calculations I wanted to do are actually pretty easy to code. PyTables is pretty good at reading in the data the way I need it, so now I'm getting everything done in the Python world.

Many thanks to Nathaniel and anybody else who spent some effort trying to help me out.

Wayne Hajas

ADD COMMENT • link 10.9 years ago Hajas, Wayne ▴ 30
