Search
Question: Load saved RangedSummarizedExperiment as Deseq2 dataset
0
gravatar for Jakob Petereit
3 months ago by
UWA Perth
Jakob Petereit0 wrote:

Hi,

 

I am currently analyzing some RNAseq data. I worked my way through the RNAseq gene level workflow (https://www.bioconductor.org/help/workflows/rnaseqGene/

And most things work well, and I will get it done in some time. The only really annoying thing is, that I cannot save and load my RangedSummarizedExperiment.  Here is the problematic code with my description:

se <- summarizeOverlaps(features=ebg, reads=bamfiles,
                        mode="Union",
                        singleEnd=F,
                        ignore.strand=TRUE,
                        fragments=T)
colData(se) <- DataFrame(sampleTable)
colData(se)

se
class: RangedSummarizedExperiment 
dim: 32833 70 
metadata(0):
assays(1): counts
rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980
rowData names(0):
colnames(70): 1_1 1_2 ... 36_1 36_2
colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName

This works, summarizing overlaps takes about 1-2 hours if I run it locally on 4 cores and 16gb RAM. I can load this SummarizedExperiment into Deseq2.

DESeqDataSet(se, ~ Genotype + SA.treatment + Genotype:SA.treatment, ignoreRank = T)
class: DESeqDataSet 
dim: 32833 70 
metadata(1): version
assays(1): counts
rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980
rowData names(0):
colnames(70): 1_1 1_2 ... 36_1 36_2
colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName

However, I would really like to save this SummarizedExperiment and load it later with Deseq2. I found that the only package supporting this is SummarizedExperiment with the dependency HDF5Array. I can save and load SummarizedExperiments with that and the SummarizedExperiment looks identical:

saveHDF5SummarizedExperiment(se, dir='data/se', replace=T,level=NULL, verbose=T)
Start writing assay 1/1 to 'data/se/assays.h5':
Finished writing assay 1/1 to 'data/se/assays.h5'.
se_load <- loadHDF5SummarizedExperiment('data/se/')
se_load
class: RangedSummarizedExperiment 
dim: 32833 70 
metadata(0):
assays(1): counts
rownames(32833): AT1G01010 AT1G01020 ... ATMG09960 ATMG09980
rowData names(0):
colnames(70): 1_1 1_2 ... 36_1 36_2
colData names(7): ID rep ... ds.DNA.Concentration..nM. SampleName

But the error shows, once I wanna load the loaded SummarizedExperiemnt into Deseq2:

DESeqDataSet(se_load, ~ Genotype + SA.treatment + Genotype:SA.treatment, ignoreRank = T)
converting counts to integer mode
Error in all_dims[, 1L] : incorrect number of dimensions

I read quite a bit on the Error, but I cannot find a real reference to someone dealing with it. I assume its something in the writeHDF5Array arguments, which somehow interfere with DESeq2, but that is just over my current skill level...

Any suggestions ? 

sessionInfo()
R version 3.4.1 Patched (2017-07-07 r72904)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             bindrcpp_0.2               HDF5Array_1.4.8           
 [4] rhdf5_2.20.0               readr_1.1.1                DESeq2_1.16.1             
 [7] dplyr_0.7.2                BiocParallel_1.10.1        GenomicAlignments_1.12.1  
[10] GenomicFeatures_1.28.4     AnnotationDbi_1.38.1       Rsamtools_1.28.0          
[13] Biostrings_2.44.1          XVector_0.16.0             airway_0.110.0            
[16] SummarizedExperiment_1.6.3 DelayedArray_0.2.7         matrixStats_0.52.2        
[19] Biobase_2.36.2             GenomicRanges_1.28.4       GenomeInfoDb_1.12.2       
[22] IRanges_2.10.2             S4Vectors_0.14.3           BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] bit64_0.9-7             splines_3.4.1           Formula_1.2-2          
 [4] assertthat_0.2.0        latticeExtra_0.6-28     blob_1.1.0             
 [7] GenomeInfoDbData_0.99.0 RSQLite_2.0             backports_1.1.0        
[10] lattice_0.20-35         glue_1.1.1              digest_0.6.12          
[13] RColorBrewer_1.1-2      checkmate_1.8.3         colorspace_1.3-2       
[16] htmltools_0.3.6         Matrix_1.2-10           plyr_1.8.4             
[19] XML_3.98-1.9            pkgconfig_2.0.1         biomaRt_2.32.1         
[22] genefilter_1.58.1       zlibbioc_1.22.0         xtable_1.8-2           
[25] snow_0.4-2              scales_0.4.1            annotate_1.54.0        
[28] tibble_1.3.3            htmlTable_1.9           ggplot2_2.2.1          
[31] nnet_7.3-12             lazyeval_0.2.0          survival_2.41-3        
[34] magrittr_1.5            memoise_1.1.0           foreign_0.8-69         
[37] data.table_1.10.4       tools_3.4.1             hms_0.3                
[40] stringr_1.2.0           locfit_1.5-9.1          munsell_0.4.3          
[43] cluster_2.0.6           compiler_3.4.1          rlang_0.1.1            
[46] grid_3.4.1              RCurl_1.95-4.8          htmlwidgets_0.9        
[49] bitops_1.0-6            base64enc_0.1-3         gtable_0.2.0           
[52] DBI_0.7                 R6_2.2.2                gridExtra_2.2.1        
[55] knitr_1.16              rtracklayer_1.36.4      bit_1.1-12             
[58] bindr_0.1               Hmisc_4.0-3             stringi_1.1.5          
[61] Rcpp_0.12.12            geneplotter_1.54.0      rpart_4.1-11           
[64] acepack_1.4.1   
ADD COMMENTlink modified 3 months ago • written 3 months ago by Jakob Petereit0
0
gravatar for Michael Love
3 months ago by
Michael Love14k
United States
Michael Love14k wrote:

Unrelated but you shouldn't use ignoreRank=TRUE, this is just for other packages not for end users. You will just miss out of useful error messages and get more confusing error messages later. Why don't you save the se with save() and load() instead? Does this work?

ADD COMMENTlink modified 3 months ago • written 3 months ago by Michael Love14k
0
gravatar for Jakob Petereit
3 months ago by
UWA Perth
Jakob Petereit0 wrote:

It somehow works with save.image, to mirror my work space and then load it later on again.

It would be still good to investigate the error or find a better way to save and load the SummarizedExperiment. 

I plan to source the summary of the overlaps out to our in house unix-cluster, as summarizeOverlaps is quite resource intense and a working save/load function will be necessary to get this to work (dunno if i can export the work space from the server R to my local R).

 

A little bit off the topic, summarizing the Overlaps from BAM files seems to be the bottleneck of this workflow. If you use STAR to map and align the fastq files to the genome, it already creates raw counts for each file. It would be probably a lot quicker if we just merge the raw-counts for each file and add some meta data, but I don't know how complicated it would be to feed that to DESeq.

Cheers for you quick answer anyway :)

ADD COMMENTlink written 3 months ago by Jakob Petereit0

Can you not do this using the standard R approaches, e.g.,

example(RangedSummarizedExperiment)  # generate 'rse', an example
fl = tempfile()
saveRDS(rse, fl)

and then later

rse1 = readRDS(fl)
ADD REPLYlink written 3 months ago by Martin Morgan ♦♦ 20k

re: merging counts for each file: you can either do this manually in R and use DESeqDataSetFromMatrix, or you can use DESeqDataSetFromHTSeqCount, which will read in count files in the htseq format (first column is gene name, second column is count, no header). See ?DESeqDataSet

ADD REPLYlink written 3 months ago by Michael Love14k
0
gravatar for Jakob Petereit
3 months ago by
UWA Perth
Jakob Petereit0 wrote:

sweet, saveRDS works indeed very nicely here.

I'll try the DESeqDataSetFromMatrix and DESeqDataSetFromHTSeqCount later on

 

Cheers for the help <3

 

 

ADD COMMENTlink written 3 months ago by Jakob Petereit0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 319 users visited in the last hour