bsseq combineList: how to get an HDF5Array-backed object as return value?
1
0
Entering edit mode
Martin Aryee ▴ 20
@martin-aryee-5585
Last seen 4.3 years ago
Boston, MA, USA

I'm using HDF5Array-backed BSseq objects. I'd like to combine several of them into a new HDF5Array-backed object using combineList. I'd like to know 1) how to get a disk-backed BSseq object returned from combineList and 2) how to specify the file to be used. Below is an example of what I'm doing. Note that the output of combineList(hdf5_BS1, hdf5_BS2) is an in-memory object.

Thanks,

Martin

 

> library(bsseq)
> library(HDF5Array)
> 
> M <- matrix(0:8, 3, 3)
> Cov <- matrix(1:9, 3, 3)
> 
> hdf5_M <- writeHDF5Array(M)
> hdf5_Cov <- writeHDF5Array(Cov)
> hdf5_BS1 <- BSseq(chr = c("chr1", "chr2", "chr1"),
+                   pos = c(1, 2, 3),
+                   M = hdf5_M,
+                   Cov = hdf5_Cov,
+                   sampleNames = c("A", "B", "C"))
> hdf5_BS1
An object of type 'BSseq' with
  3 methylation loci
  3 samples
has not been smoothed
Some assays are HDF5Array-backed
> hdf5_BS2 <- BSseq(chr = c("chr1", "chr1", "chr1"),
+                   pos = c(3, 4, 5),
+                   M = hdf5_M,
+                   Cov = hdf5_Cov,
+                   sampleNames = c("D", "E", "F"))
> hdf5_BS2
An object of type 'BSseq' with
  3 methylation loci
  3 samples
has not been smoothed
Some assays are HDF5Array-backed
> 
> combineList(hdf5_BS1, hdf5_BS2)
An object of type 'BSseq' with
  5 methylation loci
  6 samples
has not been smoothed
All assays are in-memory
> 
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] HDF5Array_1.4.8            rhdf5_2.20.0               bsseq_1.12.1              
 [4] SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2        
 [7] Biobase_2.36.2             GenomicRanges_1.28.3       GenomeInfoDb_1.12.1       
[10] IRanges_2.10.2             S4Vectors_0.14.3           BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11            XVector_0.16.0          zlibbioc_1.22.0         munsell_0.4.3          
 [5] colorspace_1.3-2        lattice_0.20-35         plyr_1.8.4              tools_3.4.0            
 [9] grid_3.4.0              data.table_1.10.4       R.oo_1.21.0             gtools_3.5.0           
[13] permute_0.9-4           Matrix_1.2-10           GenomeInfoDbData_0.99.0 R.utils_2.5.0          
[17] bitops_1.0-6            RCurl_1.95-4.8          limma_3.32.2            compiler_3.4.0         
[21] R.methodsS3_1.7.1       scales_0.4.1            locfit_1.5-9.1         
>
bsseq • 853 views
ADD COMMENT
2
Entering edit mode
Peter Hickey ▴ 470
@petehaitch
Last seen 21 hours ago
Walter and Eliza Hall Institute of Medi…

Hi Martin,

combineList(hdf5_BS1, hdf5_BS2, BACKEND = "HDF5Array") will give you a HDF5Array-backed BSseq object. It's somewhat buried in the docs, but see ?combineList. 

Currently, the default location for HDF5 files is specified via HDF5Array::setHDF5DumpDir(). You can check the current dump dir (by default, a temp dir) using HDF5Array::getHDF5DumpDir(). Using combineList() you'll end up with one HDF5 file per assay, per sample (I'm 95% sure on this).

Once you've made your final object, you may want to simplify the number of HDF5 files and datasets backing the object. The easiest way to do this, and what I'd recommend, is using SummarizedExperiment::saveHDF5SummarizedExperiment(). Alternatively, you can run DelayedArray::realize() on each assay. Please see the docs of these functions for control over where they write their output files. Both these options will write yet more HDF5 file(s), but 'consolidate' intermediate files made along the way.

I plan to write a vignette documenting best practises for using HDF5-backed BSseq objects, but it's a bit of a moving target at the moment as we figure things out. I'd certainly appreciate any feedback and suggestions you have from your experience.

Cheers,

Pete

ADD COMMENT
0
Entering edit mode

Thanks! That works exactly as expected. I now have the combined HDF5Array-backed BSseq object. I'll report back on my experience working with these objects.

ADD REPLY
0
Entering edit mode
We should make this (and other) function(s) auto-guess the backend of the output. Obv. the default when input is all HDF5 should be HDF5 and likewise if all input is in-memory matrices. Should have capability of overriding of course. On Sat, Jun 10, 2017 at 1:20 AM, Martin Aryee [bioc] < noreply@bioconductor.org> wrote: > Activity on a post you are following on support.bioconductor.org > > User Martin Aryee <https: support.bioconductor.org="" u="" 5585=""/> wrote Comment: > bsseq combineList: how to get an HDF5Array-backed object as return value? > <https: support.bioconductor.org="" p="" 96658="" #96914="">: > > Thanks! That works exactly as expected. I now have the > combined HDF5Array-backed BSseq object. I'll report back on my experience > working with these objects. > > ------------------------------ > > Post tags: bsseq > > You may reply via email or visit https://support.bioconductor. > org/p/96658/#96914 >
ADD REPLY

Login before adding your answer.

Traffic: 271 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6