I'm using Coseq to cluster my RNAseq data processed with DESeq. There is a similar error mentioned in coseq plotting error: no method for coercing this S4 class to a vector but I tried to start to finish without loading .Rdata
. I am getting an error in the plot
function and can't seem to find a fix for it.
set.seed(1)
DE_CUF_runLogCLR <- coseq(DEseq_DE_CUF_normalized, K=2:25, transformation="logclr",norm="none",
meanFilterCutoff=10, model="kmeans",
nstart=1, iter.max=15)
###
****************************************
coseq analysis: kmeans approach & logclr transformation
K = 2 to 25
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running K = 2 ...
Running K = 3 ...
Running K = 4 ...
Running K = 5 ...
Running K = 6 ...
Running K = 7 ...
Running K = 8 ...
Running K = 9 ...
Running K = 10 ...
Running K = 11 ...
Running K = 12 ...
Running K = 13 ...
Running K = 14 ...
Running K = 15 ...
Running K = 16 ...
Running K = 17 ...
Running K = 18 ...
Running K = 19 ...
Running K = 20 ...
Running K = 21 ...
Running K = 22 ...
Running K = 23 ...
Running K = 24 ...
Running K = 25 ...
summary(DE_CUF_runLogCLR)
*************************************************
K-means algorithm
Transformation: logclr
*************************************************
Clusters fit: 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25
Clusters with errors: 4,6,7,8,9,10,12,13,14,15,16,17,18,19,20,21,22,23,24,25
Selected number of clusters via capushe: 10
*************************************************
Number of clusters = 10
*************************************************
Cluster sizes:
Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9 Cluster 10
132 2200 107 802 28 2361 275 11 74 24
plot(DE_CUF_runLogCLR, graphs="boxplots")
Error in as.vector(x, mode = "numeric") : no method for coercing this S4 class to a vector
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] circlize_0.4.13 gridExtra_2.3 sva_3.35.2 genefilter_1.68.0 mgcv_1.8-38
[6] nlme_3.1-155 DESeq2_1.26.0 ggvenn_0.1.9 GGally_2.1.2 ggrepel_0.9.1
[11] plotly_4.10.0 corrplot_0.92 ComplexHeatmap_2.2.0 forcats_0.5.1 tidyverse_1.3.1
[16] purrr_0.3.4 ggplot2_3.3.5 stringr_1.4.0 readr_2.1.1 tidyr_1.1.4
[21] dplyr_1.0.7 tibble_3.1.6 statmod_1.4.36 coseq_1.10.0 SummarizedExperiment_1.16.1
[26] DelayedArray_0.12.3 BiocParallel_1.20.1 matrixStats_0.61.0 Biobase_2.46.0 GenomicRanges_1.38.0
[31] GenomeInfoDb_1.22.1 IRanges_2.20.2 S4Vectors_0.24.4 BiocGenerics_0.32.0 edgeR_3.28.1
[36] limma_3.42.2
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.4.1 Hmisc_4.6-0 plyr_1.8.6 HTSCluster_2.0.8 lazyeval_0.2.2
[7] splines_3.6.0 digest_0.6.29 htmltools_0.5.2 fansi_1.0.2 magrittr_2.0.2 checkmate_2.0.0
[13] memoise_2.0.1 cluster_2.1.2 tzdb_0.2.0 annotate_1.64.0 modelr_0.1.8 bayesm_3.1-4
[19] jpeg_0.1-9 colorspace_2.0-2 blob_1.2.2 rvest_1.0.2 haven_2.4.3 xfun_0.29
[25] crayon_1.4.2 RCurl_1.98-1.5 jsonlite_1.7.3 survival_3.2-13 glue_1.6.1 gtable_0.3.0
[31] zlibbioc_1.32.0 XVector_0.26.0 compositions_2.0-4 GetoptLong_1.0.5 shape_1.4.6 DEoptimR_1.0-10
[37] scales_1.1.1 DESeq_1.38.0 mvtnorm_1.1-3 DBI_1.1.2 Rcpp_1.0.8 plotrix_3.8-2
[43] viridisLite_0.4.0 xtable_1.8-4 htmlTable_2.4.0 clue_0.3-60 foreign_0.8-71 bit_4.0.4
[49] proxy_0.4-26 Formula_1.2-4 htmlwidgets_1.5.4 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.2
[55] reshape_0.8.8 pkgconfig_2.0.3 XML_3.99-0.3 nnet_7.3-17 dbplyr_2.1.1 locfit_1.5-9.4
[61] utf8_1.2.2 tidyselect_1.1.1 rlang_0.4.12 AnnotationDbi_1.48.0 munsell_0.5.0 cellranger_1.1.0
[67] tools_3.6.0 cachem_1.0.6 cli_3.1.1 generics_0.1.1 RSQLite_2.2.9 broom_0.7.11
[73] evaluate_0.14 fastmap_1.1.0 yaml_2.2.2 knitr_1.37 bit64_4.0.5 fs_1.5.2
[79] robustbase_0.93-9 xml2_1.3.3 compiler_3.6.0 rstudioapi_0.13 png_0.1-7 e1071_1.7-9
[85] reprex_2.0.1 geneplotter_1.64.0 stringi_1.7.6 lattice_0.20-45 Matrix_1.4-0 tensorA_0.36.2
[91] vctrs_0.3.8 pillar_1.6.5 capushe_1.1.1 lifecycle_1.0.1 BiocManager_1.30.16 HTSFilter_1.26.0
[97] GlobalOptions_0.1.2 data.table_1.14.2 bitops_1.0-7 Rmixmod_2.1.6 R6_2.5.1 latticeExtra_0.6-29
[103] MASS_7.3-55 assertthat_0.2.1 rjson_0.2.20 withr_2.4.3 GenomeInfoDbData_1.2.2 hms_1.1.1
[109] rpart_4.1-15 class_7.3-20 rmarkdown_2.11 lubridate_1.8.0 base64enc_0.1-3
Hi @ajwijeratne1, thanks for your question. I'll need a little bit more input before I can answer your question since I can't reproduce your error on my side.
First, I'm surprised to see so many clusters with errors -- as shown in the output of
summary(DE_CUF_runLogCLR)
, very few clusters had no errors (it looks like only K = 2,3,5,11 completed without error). This point isn't very well documented in the vignette, but the models with different cluster numbers are reported as having an error if theifault
flag is raised inkmeans
. This makes me wonder whether the input you provided tocoseq
(DEseq_DE_CUF_normalized
) is well-adapted to using thelogclr
transformation.Could you confirm what the analysis pipeline was leading up to the creation of
DEseq_DE_CUF_normalized
? What do the data look like? How many genes and samples? Based on the name, I am assuming you used DESeq2 to perform normalization and differential analyses -- are these normalized counts or log-normalized counts?Also, I noticed that you're using an older version of coseq, the current version is 1.18.0. Do you have the possibility to update your version of R and use the latest Bioconductor version?
Best, AR
Thanks for your reply, Andrea! I have updated the package and the error for plotting has gone now - thank you very much! For the cluster number error, I have used EdgeR to normalized data (TMM). However, these are differentially expressed genes using DEseq2. I have 13 samples three treatments and two batches. I have used Combat_seq to remove batch effect prior to differential gene expression call. If it is easier, I can send you the dataset via email to take a look.
Great news for the plot!
Are you still getting a lot of clusters with errors even with the new version of the package? And if so, is the same thing happening with a different transformation (e.g.,
transformation = "clr"
?) If so, it would be great if you could forward the dataset via email so that I can do some debugging and try to understand what is going on.Yes, tried different clustering and also dropped one batch, but still the same with errors in clusters. I will send my code and dataset via email.
After an exchange by email, it seems that the K-means algorithm just needed to run for a larger number of iterations to converge (you used
iter.max = 10
, which I had used in the vignette simply to shorten run times in the illustration). By default, coseq caps the maximum number of iterations atiter.max = 50
, and the following code runs for me with far fewer cluster errors: