Hi,
I recently updated GSVA to 1.24.0 and I'm still experiencing a crash when I run GSVA with bootstrapping. This is the code I ran, including with urls to RNA-Seq test data and gene set test data.
library(qusage) library(GSVA) library(ggplot2) library(reshape2) ### Input files ### Skin Cutaneous Melanoma RNA Seq from TCGA (http://www.cbioportal.org/study?id=skcm_tcga) ### From https://github.com/cBioPortal/datahub/blob/master/public/skcm_tcga.tar.gz skcm_expr_file <- "/Users/sander/Desktop/test_gsva/data_RNA_Seq_v2_expression_median.txt" ### Hallmark gene sets from MSigDB ### From http://software.broadinstitute.org/gsea/downloads.jsp genesets_file <- "/Users/sander/Desktop/test_gsva/h.all.v5.2.entrez.gmt" ### Parameters n_bootstraps <- 2 ### Read gene sets hallmark_genesets <- read.gmt(genesets_file) ### Read expression skcm_inp <- read.table(skcm_expr_file, header = T, sep = "\t", quote = "", fill = T, check.names = F) skcm_inp$Entrez_Gene_Id <- as.factor(skcm_inp$Entrez_Gene_Id) ### Remove NA entrez IDs skcm_expr_not_na <- skcm_inp[complete.cases(skcm_inp),] ### Remove dup skcm_expr_not_dub <- skcm_expr_not_na[!duplicated(skcm_expr_not_na$Entrez_Gene_Id),] ### Create only expr data table skcm_expr <- as.matrix(skcm_expr_not_dub[,3:ncol(skcm_expr_not_dub)]) rownames(skcm_expr) <- skcm_expr_not_dub$Entrez_Gene_Id ### Plot distrubution of reads skcm_expr_m <- melt(skcm_expr) colnames(skcm_expr_m) <- c("Gene", "Sample", "Expression") ggplot(skcm_expr_m, aes(x = Expression)) + geom_histogram() ### Log2 normalize skcm_expr_norm <- log2(skcm_expr + 1) ### Plot distrubution of reads skcm_expr_norm_m <- melt(skcm_expr_norm) colnames(skcm_expr_norm_m) <- c("Gene", "Sample", "Log2_Expression") ggplot(skcm_expr_norm_m, aes(x = Log2_Expression)) + geom_histogram() ### Run GSVA gsva_result <- gsva(skcm_expr_norm, hallmark_genesets, method = "gsva") gsva_scores <- data.frame("geneset_id" = rownames(gsva_result$es.obs), gsva_result$es.obs, check.names = F) print(gsva_scores[1:5,1:5]) ### Run GSVA with bootstrapping (THIS CRASHES) gsva_result <- gsva(skcm_expr_norm, hallmark_genesets, method = "gsva", no.bootstraps = n_bootstraps)
SessionInfo():
R version 3.4.0 (2017-04-21) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Sierra 10.12.3 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] reshape2_1.4.2 ggplot2_2.2.1 GSVA_1.24.0 qusage_2.10.0 limma_3.32.1 loaded via a namespace (and not attached): [1] zoo_1.8-0 splines_3.4.0 lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6 stats4_3.4.0 yaml_2.1.14 [8] survival_2.41-3 XML_3.98-1.6 DBI_0.6-1 BiocGenerics_0.22.0 multcomp_1.4-6 plyr_1.8.4 stringr_1.2.0 [15] lsmeans_2.25-5 munsell_0.4.3 gtable_0.2.0 mvtnorm_1.0-6 codetools_0.2-15 coda_0.19-1 memoise_1.1.0 [22] evaluate_0.10 Biobase_2.36.0 knitr_1.15.1 IRanges_2.10.0 parallel_3.4.0 AnnotationDbi_1.38.0 TH.data_1.0-8 [29] GSEABase_1.38.0 Rcpp_0.12.10 xtable_1.8-2 scales_0.4.1 backports_1.0.5 S4Vectors_0.14.0 graph_1.54.0 [36] annotate_1.54.0 digest_0.6.12 stringi_1.1.5 grid_3.4.0 rprojroot_1.2 tools_3.4.0 bitops_1.0-6 [43] sandwich_2.3-4 magrittr_1.5 RCurl_1.95-4.8 lazyeval_0.2.0 RSQLite_1.1-2 tibble_1.3.0 MASS_7.3-47 [50] Matrix_1.2-10 estimability_1.2 rmarkdown_1.5 nlme_3.1-131 compiler_3.4.0
Thanks Robert for the excellent solution and advice. This also explains why I didn't experience the issue with a dataset that was based on a different kind of data.