I am trying to follow a vignette for finding enriched pathways. As part of that pipeline, I would like to calculate the differentially expressed genes using DESeq2. Initially, I am working with a Seurat Object, as such I run the following code to achieve what I would like to do:
exp_mtx <- seurat_exp_mtx@assays$RNA@counts # To generate the raw counts from the Seurat Object
exp_design <- as.data.frame(matrix(NA,length(colnames(exp_mtx)),1))
names(exp_design) <- c("cluster")
row.names(exp_design) <- seurat_exp_mtx@meta.data$cells.
exp_design$cluster <- seurat_exp_mtx@meta.data$clusters # To generate the colData
dds_cortex <- DESeqDataSetFromMatrix(countData = as.matrix(exp_mtx),
colData = exp_design, design = ~ cluster)
However, when trying to do so, I run into the following error:
"Error in `assays<-`(`*tmp*`, withDimnames = withDimnames, ..., value = `*vtmp*`) :
please use 'assay(x, withDimnames=FALSE)) <- value' or 'assays(x, withDimnames=FALSE)) <- value' when the dimnames on the supplied assay(s) are not identical to the
dimnames on RangedSummarizedExperiment object 'x'"
I checked the dimnames for both the exp_mtx
and exp_design
and noticed that one was named only 1 and 2, whereas the other was named cells and features and as such, I ran the following:
names(dimnames(exp_design)) <- c("cells", "cluster")
Unfortunately, it had no effect. I have checked and the CellIDs are the same. This is also to be expected considering that they were both extracted from the same initial file. I have also checked the documentation for the function to check that I've made the colData correctly, and it states "for matrix input: a DataFrame or data.frame with at least a single column. Rows of colData correspond to columns of countData" (https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/DESeqDataSet-class) which is what I have.
I really have no idea about what it is that is causing the issue. Any suggestions would be greatly appreciated.
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] fgsea_1.16.0 VennDiagram_1.6.20 futile.logger_1.4.3 tibble_3.1.2 dplyr_1.0.7 DESeq2_1.30.1
[7] SummarizedExperiment_1.20.0 Biobase_2.50.0 MatrixGenerics_1.2.1 matrixStats_0.59.0 GenomicRanges_1.42.0 GenomeInfoDb_1.26.7
[13] IRanges_2.24.1 S4Vectors_0.28.1 BiocGenerics_0.36.1 SeuratObject_4.0.2
loaded via a namespace (and not attached):
[1] httr_1.4.2 pkgload_1.2.1 bit64_4.0.5 splines_4.0.3 assertthat_0.2.1 BiocManager_1.30.16 blob_1.2.1 GenomeInfoDbData_1.2.4
[9] yaml_2.2.1 remotes_2.4.0 sessioninfo_1.1.1 pillar_1.6.1 RSQLite_2.2.7 lattice_0.20-44 glue_1.4.2 RColorBrewer_1.1-2
[17] XVector_0.30.0 colorspace_2.0-2 Matrix_1.3-4 XML_3.99-0.6 pkgconfig_2.0.3 devtools_2.4.2 genefilter_1.72.1 zlibbioc_1.36.0
[25] purrr_0.3.4 xtable_1.8-4 scales_1.1.1 processx_3.5.2 BiocParallel_1.24.1 annotate_1.68.0 generics_0.1.0 ggplot2_3.3.5
[33] usethis_2.0.1 ellipsis_0.3.2 cachem_1.0.5 withr_2.4.2 cli_3.0.1 survival_3.2-11 magrittr_2.0.1 crayon_1.4.1
[41] memoise_2.0.0 ps_1.6.0 fs_1.5.0 fansi_0.5.0 pkgbuild_1.2.0 data.table_1.14.0 tools_4.0.3 prettyunits_1.1.1
[49] formatR_1.11 lifecycle_1.0.0 munsell_0.5.0 locfit_1.5-9.4 DelayedArray_0.16.3 lambda.r_1.2.4 AnnotationDbi_1.52.0 callr_3.7.0
[57] compiler_4.0.3 rlang_0.4.11 RCurl_1.98-1.3 bitops_1.0-7 testthat_3.0.4 gtable_0.3.0 DBI_1.1.1 R6_2.5.0
[65] gridExtra_2.3 fastmap_1.1.0 bit_4.0.4 utf8_1.2.1 fastmatch_1.1-0 rprojroot_2.0.2 futile.options_1.0.1 desc_1.3.0
[73] Rcpp_1.0.7 vctrs_0.3.8 geneplotter_1.68.0 tidyselect_1.1.1
Why don't you show the heads of your counts and coldata?
I could do that, my dataset is quite large so it omits alot.
head(exp_design)
yields:Whereas
as.matrix(exp_mtx[1:5, 1:5]
yields:I can also add that I've tried
identical(row.names(exp_design), colnames(cortex_mtx))
and that returnedDo you have a plan for size normalization? DEseq's method is not going to be happy with so many genes containing a zero. Also, your exp_mtx doesn't look right. "features" and "cells" should not be in there. Also, sometimes R has a problem with things that start with numbers, you might try prepending an "X" to your cell IDs
See the 2nd, 3rd and 4th bullets here
https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#recommendations-for-single-cell-analysis
The 1st bullet concerns ZI modeling which is not always needed, I think with UMI it’s not often performed now.
Thanks for pointing out that there is something wrong with the appearance of the exp_mtx, it seemed to be the root cause of my issue. I'm well aware of the problem with the geometric means. In my case, I changed the sftype to 'poscount' to test it. I'm mainly interested in getting everything to work in an exploratory fashion so it's acceptable if everything isn't optimized from the start.