GSVA Error in .mapGeneSetsToFeatures(mapped.gset.idx.list, rownames(expr))
Siyuan • 0
@siyuan-24740

Hi,

I have a problem with GSVA. I tried to run GSVA with the MSigDB KEGG gene sets on microarray data. This code worked several days ago, but when I ran it again today I got the error shown below:


 #head(exp,5)
              sample1       samples2   sample3  sample4    sample5      sample6
A1CF     2.589551   2.656472   2.524491   2.748733   2.423472   2.618552
A2M     10.299896   9.196994   8.912481   9.664004   9.301919   9.829284
A2ML1    2.870450   3.084727   3.044007   3.166133   3.211959   3.292066
A4GALT   4.173940   5.132295   4.348393   4.229899   4.569535   4.087214


library(GSEABase)
library(GSVA)

msigdb_GMTs <- "msigdb_v7.2_GMTs"
msigdb <- "c2.cp.kegg.v7.2.symbols.gmt"

geneset <- getGmt(file.path(msigdb_GMTs, msigdb))  

es.max <- gsva(exp, geneset, 
               mx.diff=FALSE, verbose=FALSE, 
               parallel.sz=1)

# Error in .mapGeneSetsToFeatures(mapped.gset.idx.list, rownames(expr)) : 
#  No identifiers in the gene sets could be matched to the identifiers in the expression data. 

sessionInfo( )
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] forcats_0.5.1        stringr_1.4.0        dplyr_1.0.4         
 [4] purrr_0.3.4          readr_1.4.0          tidyr_1.1.2         
 [7] tibble_3.0.6         ggplot2_3.3.3        tidyverse_1.3.0     
[10] devtools_2.3.2       usethis_2.0.0        GSVA_1.39.16        
[13] GSEABase_1.52.1      graph_1.68.0         annotate_1.68.0     
[16] XML_3.99-0.5         AnnotationDbi_1.52.0 IRanges_2.24.1      
[19] S4Vectors_0.28.1     Biobase_2.50.0       BiocGenerics_0.36.0 

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                matrixStats_0.58.0         
 [3] fs_1.5.0                    lubridate_1.7.9.2          
 [5] bit64_4.0.5                 httr_1.4.2                 
 [7] rprojroot_2.0.2             GenomeInfoDb_1.26.2        
 [9] tools_4.0.3                 backports_1.2.1            
[11] R6_2.5.0                    DBI_1.1.1                  
[13] colorspace_2.0-0            withr_2.4.1                
[15] tidyselect_1.1.0            prettyunits_1.1.1          
[17] processx_3.4.5              bit_4.0.4                  
[19] curl_4.3                    compiler_4.0.3             
[21] rvest_0.3.6                 cli_2.3.0                  
[23] xml2_1.3.2                  desc_1.2.0                 
[25] DelayedArray_0.16.1         scales_1.1.1               
[27] callr_3.5.1                 XVector_0.30.0             
[29] pkgconfig_2.0.3             sessioninfo_1.1.1          
[31] MatrixGenerics_1.2.1        dbplyr_2.1.0               
[33] fastmap_1.1.0               readxl_1.3.1               
[35] rlang_0.4.10                rstudioapi_0.13            
[37] RSQLite_2.2.3               generics_0.1.0             
[39] jsonlite_1.7.2              BiocParallel_1.24.1        
[41] RCurl_1.98-1.2              magrittr_2.0.1             
[43] GenomeInfoDbData_1.2.4      Matrix_1.3-2               
[45] Rcpp_1.0.6                  munsell_0.5.0              
[47] lifecycle_0.2.0             stringi_1.5.3              
[49] SummarizedExperiment_1.20.0 zlibbioc_1.36.0            
[51] pkgbuild_1.2.0              grid_4.0.3                 
[53] blob_1.2.1                  crayon_1.4.1               
[55] lattice_0.20-41             haven_2.3.1                
[57] hms_1.0.0                   ps_1.5.0                   
[59] pillar_1.4.7                GenomicRanges_1.42.0       
[61] pkgload_1.1.0               reprex_1.0.0               
[63] glue_1.4.2                  remotes_2.2.0              
[65] BiocManager_1.30.10         modelr_0.1.8               
[67] vctrs_0.3.6                 cellranger_1.1.0           
[69] testthat_3.0.1              gtable_0.3.0               
[71] assertthat_0.2.1            cachem_1.0.3               
[73] xtable_1.8-4                broom_0.7.4                
[75] memoise_2.0.0               ellipsis_0.3.1             
>

I would appreciate it if you could help me fix this problem! Thank you.

Siyuan

Robert Castelo ★ 3.4k
@rcastelo
Barcelona/Universitat Pompeu Fabra

hi,

thanks for reporting this problem; it has been fixed in the release version, GSVA 1.38.2, and in the devel version, 1.39.17. By the way, your session information shows that you are currently using the development version of GSVA. Beware that, in general, the development version of a Bioconductor package may not work as expected, because developers use it to add new features and refactor code, which can lead to unexpected behavior. So, unless end users want to beta-test new features, they should use the release version.
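A quick way to tell the two apart: Bioconductor numbers packages x.y.z, with an even y for release versions and an odd y for devel versions. The base-R sketch below applies that convention; `is_devel_version()` is a hypothetical helper name used only for illustration, not a function from GSVA or BiocManager.

```r
# Hypothetical helper (not part of any package): TRUE when the middle
# component of an x.y.z version is odd, i.e. a Bioconductor devel version.
is_devel_version <- function(v) {
  parts <- strsplit(as.character(v), ".", fixed = TRUE)[[1]]
  as.integer(parts[2]) %% 2L == 1L
}

is_devel_version("1.39.16")  # TRUE: the version in the question's sessionInfo()
is_devel_version("1.38.2")   # FALSE: the release version carrying the fix

# For the installed package (requires GSVA to be installed):
# is_devel_version(packageVersion("GSVA"))
```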

a comment on your code, when you use the function getGmt() to read a GMT file of gene sets defined using gene symbols, I'd set the argument geneIdType=SymbolIdentifier() so that the resulting GeneSetCollection object has the additional bit of metadata that tells the type of identifier being used:

geneset <- getGmt(file.path(msigdb_GMTs, msigdb), geneIdType=SymbolIdentifier())

this becomes useful if you need to map those identifiers to another type of identifier.

cheers,

robert.


Dear Robert,

Thanks for your reply! Unfortunately, I still get this "No identifiers" error after updating my package to GSVA 1.38.2. Is there a problem with my computer?

I appreciate your help!

Best wishes


hi,

the following code, which was reproducing the bug before, now runs fine:

library(GSEABase)
library(GSVA)
library(GSVAdata)
library(hgu95a.db)
library(annotate)

geneset <- getGmt("c2.cp.kegg.v7.2.symbols.gmt", geneIdType=SymbolIdentifier())

data(leukemia)
syms <- getSYMBOL(featureNames(leukemia_eset), "hgu95a.db")
exps <- exprs(leukemia_eset)
rownames(exps) <- syms
exps[1:5, 1:5]
        CL2001011101AA.CEL CL2001011102AA.CEL CL2001011104AA.CEL
MAPK3            11.354426          10.932543          11.185906
TIE1              9.185470           8.823661           8.687186
CYP2C19           7.806993           8.127591           7.842353
CXCR5            10.164370          10.048514          10.006014
CXCR5             9.642389           9.834265           9.750938
        CL2001011105AA.CEL CL2001011109AA.CEL
MAPK3            11.251631          11.540745
TIE1              8.958305           9.762877
CYP2C19           8.319227           8.334177
CXCR5            10.474046          10.115543
CXCR5            10.430205          10.066628
es <- gsva(exps, geneset)
Estimating GSVA scores for 186 gene sets.
Estimating ECDFs with Gaussian kernels
  |======================================================================| 100%
sessionInfo()                                                                           
R version 4.0.3 (2020-10-10)                                                              
Platform: x86_64-apple-darwin17.0 (64-bit)                                                
Running under: macOS Catalina 10.15.7                                                     

Matrix products: default                                                                  
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib         
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib       

locale:                                                                                   
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8                         

attached base packages:                                                                   
[1] stats4    parallel  stats     graphics  grDevices utils     datasets                  
[8] methods   base                                                                        

other attached packages:                                                                  
 [1] GSVAdata_1.26.0      hgu95a.db_3.2.3      org.Hs.eg.db_3.12.0 
 [4] GSVA_1.38.2          GSEABase_1.52.1      graph_1.68.0        
 [7] annotate_1.68.0      XML_3.99-0.5         AnnotationDbi_1.52.0
[10] IRanges_2.24.1       S4Vectors_0.28.1     Biobase_2.50.0      
[13] BiocGenerics_0.36.0  nvimcom_0.9-28       colorout_1.2-2      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  compiler_4.0.3             
 [3] GenomeInfoDb_1.26.2         XVector_0.30.0             
 [5] MatrixGenerics_1.2.1        bitops_1.0-6               
 [7] tools_4.0.3                 zlibbioc_1.36.0            
 [9] bit_4.0.4                   lattice_0.20-41            
[11] RSQLite_2.2.3               memoise_2.0.0              
[13] pkgconfig_2.0.3             rlang_0.4.10               
[15] Matrix_1.3-2                DelayedArray_0.16.1        
[17] DBI_1.1.1                   fastmap_1.1.0              
[19] GenomeInfoDbData_1.2.4      httr_1.4.2                 
[21] vctrs_0.3.6                 grid_4.0.3                 
[23] bit64_4.0.5                 R6_2.5.0                   
[25] BiocParallel_1.24.1         blob_1.2.1                 
[27] matrixStats_0.58.0          GenomicRanges_1.42.0       
[29] SummarizedExperiment_1.20.0 xtable_1.8-4               
[31] RCurl_1.98-1.2              cachem_1.0.3

so, for me to fix the problem, you would have to provide code and data reproducing it.
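When putting such a reproducible example together, it may help to first check whether the identifiers overlap at all, since that is what the error message complains about. The following is a minimal base-R sketch on toy data; `count_matched_ids()` is a hypothetical helper, not a GSVA function, and it does not reproduce GSVA's internal matching.

```r
# Hypothetical helper: for each gene set, count how many of its identifiers
# occur among the row names of the expression matrix.
count_matched_ids <- function(expr, gene_sets) {
  features <- rownames(expr)
  vapply(gene_sets, function(ids) sum(unique(ids) %in% features), integer(1))
}

# Toy expression matrix using row names from the question's head(exp) output.
expr <- matrix(rnorm(12), nrow = 4,
               dimnames = list(c("A1CF", "A2M", "A2ML1", "A4GALT"),
                               paste0("s", 1:3)))

# Toy gene sets: setB matches nothing because matching is case-sensitive.
gene_sets <- list(setA = c("A2M", "A4GALT", "TP53"),
                  setB = c("a2m", "BRCA1"))

matched <- count_matched_ids(expr, gene_sets)
print(matched)
```

For a GeneSetCollection read with getGmt(), the list of identifiers can be obtained with geneIds(geneset) from GSEABase. If every count is zero on the real data, the row names and the gene set identifiers are probably of different types or formats; if the counts are non-zero and gsva() still fails, the cause lies elsewhere.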

cheers,

robert.


Dear Robert,

Thank you for your help! I ran it again, and this time it worked!

Best wishes


Dear Robert,

I am getting the same error using R 4.1.0 and GSVA 1.42. The funny thing is, everything was working fine until today, when it suddenly started giving me this specific error. I tried troubleshooting by running the mapGeneSetsToFeatures() function manually, which works just fine.

Even this simple example, which I found as a test case in one of the GitHub issues:

p <- 10 ## number of genes
n <- 30 ## number of samples
nGrp1 <- 15 ## number of samples in group 1
nGrp2 <- n - nGrp1 ## number of samples in group 2

geneSets <- list(set1=paste("g", 1:3, sep=""),
                 set2=paste("g", 4:6, sep=""),
                 set3=paste("g", 7:10, sep=""))
y <- matrix(rnorm(n*p), nrow=p, ncol=n,
            dimnames=list(paste("g", 1:p, sep=""), paste("s", 1:n, sep="")))
y[geneSets$set1, (nGrp1+1):n] <- y[geneSets$set1, (nGrp1+1):n] + 2

library(limma)
design <- cbind(sampleGroup1=1, sampleGroup2vs1=c(rep(0, nGrp1), rep(1, nGrp2)))
fit <- lmFit(y, design)
fit <- eBayes(fit)
topTable(fit, coef="sampleGroup2vs1")

## estimate GSVA enrichment scores for the three sets
library(GSVA)
gsva_es <- gsva(y, geneSets, mx.diff=1)

gives me:

Error in .mapGeneSetsToFeatures(gset.idx.list, rownames(expr)) : No identifiers in the gene sets could be matched to the identifiers in the expression data.

I tried removing the GSVA package and re-installing but it didn't help. Any help is appreciated.

Best,
Selcan


Just some more details. When I restart the R session and run only the example above, it works. When I load any additional packages, I get the error again. The same thing happens on another machine with R 4.2.0 and GSVA 1.44. I am using RStudio, if that changes anything.


I really can't figure out why loading other packages and/or data is messing up GSVA, specifically the mapGeneSetsToFeatures function. I ended up saving the data objects needed to run GSVA and running it standalone in a new, empty R session. That works! So I can save the output and plug it in where I need the results in the Rmd files where I was originally running GSVA. This seems to be a very weird issue.
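The workaround described above (save the inputs, run in an empty session, load the result) can be sketched in base R roughly as follows. `run_in_fresh_session()` is a hypothetical helper written for illustration, and it assumes that Rscript is available under R.home("bin"); it only works around the problem and does not explain it.

```r
# Hypothetical helper: evaluate `code` (a string of R code that refers to the
# variable `input`) in a fresh, vanilla R process and return its value.
run_in_fresh_session <- function(input, code) {
  in_file  <- tempfile(fileext = ".rds")
  out_file <- tempfile(fileext = ".rds")
  script   <- tempfile(fileext = ".R")
  on.exit(unlink(c(in_file, out_file, script)))

  # Hand the input over through an RDS file and write a small driver script.
  saveRDS(input, in_file)
  writeLines(c(
    sprintf("input <- readRDS(%s)", deparse(in_file)),
    sprintf("result <- local({ %s })", code),
    sprintf("saveRDS(result, %s)", deparse(out_file))
  ), script)

  # --vanilla skips saved workspaces and user profiles in the child process.
  status <- system2(file.path(R.home("bin"), "Rscript"),
                    c("--vanilla", shQuote(script)))
  if (status != 0) stop("child R session failed with status ", status)
  readRDS(out_file)
}

# Intended use with the example above (needs GSVA in the child session):
# gsva_es <- run_in_fresh_session(
#   list(y = y, sets = geneSets),
#   "library(GSVA); gsva(input$y, input$sets, mx.diff = TRUE)")
```

Called from an Rmd chunk, this keeps the GSVA step isolated from whatever else is loaded in the knitting session.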


Hi, I cannot reproduce the error with the current release version of GSVA; your code works just fine. Here is my session information, in case it helps you track down the problem:

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GSVA_1.44.4    limma_3.52.3   colorout_1.2-2

loaded via a namespace (and not attached):
 [1] KEGGREST_1.36.3             SummarizedExperiment_1.26.1
 [3] beachmat_2.12.0             BiocSingular_1.12.0        
 [5] HDF5Array_1.24.2            lattice_0.20-45            
 [7] rhdf5_2.40.0                vctrs_0.4.1                
 [9] stats4_4.2.1                blob_1.2.3                 
[11] XML_3.99-0.10               rlang_1.0.5                
[13] DBI_1.1.3                   BiocParallel_1.30.3        
[15] SingleCellExperiment_1.18.0 BiocGenerics_0.42.0        
[17] bit64_4.0.5                 matrixStats_0.62.0         
[19] GenomeInfoDbData_1.2.8      zlibbioc_1.42.0            
[21] MatrixGenerics_1.8.1        Biostrings_2.64.1          
[23] rsvd_1.0.5                  ScaledMatrix_1.4.1         
[25] codetools_0.2-18            memoise_2.0.1              
[27] Biobase_2.56.0              IRanges_2.30.1             
[29] fastmap_1.1.0               GenomeInfoDb_1.32.4        
[31] irlba_2.3.5                 parallel_4.2.1             
[33] AnnotationDbi_1.58.0        GSEABase_1.58.0            
[35] Rcpp_1.0.9                  xtable_1.8-4               
[37] cachem_1.0.6                DelayedArray_0.22.0        
[39] S4Vectors_0.34.0            graph_1.74.0               
[41] annotate_1.74.0             XVector_0.36.0             
[43] bit_4.0.4                   png_0.1-7                  
[45] GenomicRanges_1.48.0        grid_4.2.1                 
[47] cli_3.4.0                   tools_4.2.1                
[49] bitops_1.0-7                rhdf5filters_1.8.0         
[51] RCurl_1.98-1.8              RSQLite_2.2.17             
[53] crayon_1.5.1                Matrix_1.5-1               
[55] DelayedMatrixStats_1.18.0   sparseMatrixStats_1.8.0    
[57] httr_1.4.4                  Rhdf5lib_1.18.2            
[59] R6_2.5.1                    compiler_4.2.1