Hi,
I am currently analyzing some RNAseq data. I worked my way through the RNAseq gene level workflow (https://www.bioconductor.org/help/workflows/rnaseqGene/)
And most things work well, and I will get it done in some time. The only really annoying thing is, that I cannot save and load my RangedSummarizedExperiment. Here is the problematic code with my description:
se <- summarizeOverlaps(features=ebg, reads=bamfiles, mode="Union", singleEnd=F, ignore.strand=TRUE, fragments=T) colData(se) <- DataFrame(sampleTable) colData(se) se class: RangedSummarizedExperiment dim: 32833 70 metadata(0): assays(1): counts rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980 rowData names(0): colnames(70): 1_1 1_2 ... 36_1 36_2 colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName
This works, summarizing overlaps takes about 1-2 hours if I run it locally on 4 cores and 16gb RAM. I can load this SummarizedExperiment into Deseq2.
DESeqDataSet(se, ~ Genotype + SA.treatment + Genotype:SA.treatment, ignoreRank = T) class: DESeqDataSet dim: 32833 70 metadata(1): version assays(1): counts rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980 rowData names(0): colnames(70): 1_1 1_2 ... 36_1 36_2 colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName
However, I would really like to save this SummarizedExperiment and load it later with Deseq2. I found that the only package supporting this is SummarizedExperiment with the dependency HDF5Array. I can save and load SummarizedExperiments with that and the SummarizedExperiment looks identical:
saveHDF5SummarizedExperiment(se, dir='data/se', replace=T,level=NULL, verbose=T) Start writing assay 1/1 to 'data/se/assays.h5': Finished writing assay 1/1 to 'data/se/assays.h5'. se_load <- loadHDF5SummarizedExperiment('data/se/') se_load class: RangedSummarizedExperiment dim: 32833 70 metadata(0): assays(1): counts rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980 rowData names(0): colnames(70): 1_1 1_2 ... 36_1 36_2 colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName
But the error shows, once I wanna load the loaded SummarizedExperiemnt into Deseq2:
DESeqDataSet(se_load, ~ Genotype + SA.treatment + Genotype:SA.treatment, ignoreRank = T) converting counts to integer mode Error in all_dims[, 1L] : incorrect number of dimensions
I read quite a bit on the Error, but I cannot find a real reference to someone dealing with it. I assume its something in the writeHDF5Array arguments, which somehow interfere with DESeq2, but that is just over my current skill level...
Any suggestions ?
sessionInfo() R version 3.4.1 Patched (2017-07-07 r72904) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200) Matrix products: default locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C [5] LC_TIME=English_Australia.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] pheatmap_1.0.8 bindrcpp_0.2 HDF5Array_1.4.8 [4] rhdf5_2.20.0 readr_1.1.1 DESeq2_1.16.1 [7] dplyr_0.7.2 BiocParallel_1.10.1 GenomicAlignments_1.12.1 [10] GenomicFeatures_1.28.4 AnnotationDbi_1.38.1 Rsamtools_1.28.0 [13] Biostrings_2.44.1 XVector_0.16.0 airway_0.110.0 [16] SummarizedExperiment_1.6.3 DelayedArray_0.2.7 matrixStats_0.52.2 [19] Biobase_2.36.2 GenomicRanges_1.28.4 GenomeInfoDb_1.12.2 [22] IRanges_2.10.2 S4Vectors_0.14.3 BiocGenerics_0.22.0 loaded via a namespace (and not attached): [1] bit64_0.9-7 splines_3.4.1 Formula_1.2-2 [4] assertthat_0.2.0 latticeExtra_0.6-28 blob_1.1.0 [7] GenomeInfoDbData_0.99.0 RSQLite_2.0 backports_1.1.0 [10] lattice_0.20-35 glue_1.1.1 digest_0.6.12 [13] RColorBrewer_1.1-2 checkmate_1.8.3 colorspace_1.3-2 [16] htmltools_0.3.6 Matrix_1.2-10 plyr_1.8.4 [19] XML_3.98-1.9 pkgconfig_2.0.1 biomaRt_2.32.1 [22] genefilter_1.58.1 zlibbioc_1.22.0 xtable_1.8-2 [25] snow_0.4-2 scales_0.4.1 annotate_1.54.0 [28] tibble_1.3.3 htmlTable_1.9 ggplot2_2.2.1 [31] nnet_7.3-12 lazyeval_0.2.0 survival_2.41-3 [34] magrittr_1.5 memoise_1.1.0 foreign_0.8-69 [37] data.table_1.10.4 tools_3.4.1 hms_0.3 [40] stringr_1.2.0 locfit_1.5-9.1 munsell_0.4.3 [43] cluster_2.0.6 compiler_3.4.1 rlang_0.1.1 [46] grid_3.4.1 RCurl_1.95-4.8 htmlwidgets_0.9 [49] bitops_1.0-6 base64enc_0.1-3 gtable_0.2.0 [52] DBI_0.7 R6_2.2.2 gridExtra_2.2.1 [55] knitr_1.16 rtracklayer_1.36.4 bit_1.1-12 [58] bindr_0.1 Hmisc_4.0-3 stringi_1.1.5 [61] Rcpp_0.12.12 geneplotter_1.54.0 rpart_4.1-11 [64] acepack_1.4.1
Can you not do this using the standard R approaches, e.g.,
and then later
re: merging counts for each file: you can either do this manually in R and use DESeqDataSetFromMatrix, or you can use DESeqDataSetFromHTSeqCount, which will read in count files in the htseq format (first column is gene name, second column is count, no header). See ?DESeqDataSet