Error in parameter setting using svmOptimisation for pRoloc analysis
@sjp221-15237

I am attempting proteome analysis in pRoloc, and am following the workflow of a recently published hyperLOPIT paper (https://www.nature.com/articles/nprot.2017.026/tables/3). I am currently on step 87, where I am trying to use supervised machine learning to predict the localisation of unlabelled proteins. I am attempting to set my parameters (as also seen in section 5.2.1 here: https://bioconductor.org/packages/3.7/bioc/vignettes/pRoloc/inst/doc/pRoloc-tutorial.html#52_supervised_ml), and keep getting an error message saying:

Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length

I assume this must be in relation to my list of markers, but am unsure why this is occurring. I have been successful in producing the Profile plots and PCA plots (in section 3.1 and 3.3 here: https://bioconductor.org/packages/3.7/bioc/vignettes/pRoloc/inst/doc/pRoloc-tutorial.html#52_supervised_ml), so I am pretty sure my MSnSet is working correctly.

I have attached my code below.

"msnsetmarkers" refers to my MSnSet, and within this "markers" is the marker list.

w <- table(fData(msnsetmarkers)[, "markers"])
w <- 1/w[names(w) != "unknown"]
# the part I get an error message for:
params <- svmOptimisation(msnsetmarkers, times = 100, xval = 5, class.weights = w)
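For anyone following along, here is a minimal sketch of what the weighting step above computes, using a made-up marker vector in place of fData(msnsetmarkers)[, "markers"]. The class names and counts are hypothetical and only illustrate the inverse-frequency idea.

```r
# Toy marker vector standing in for fData(msnsetmarkers)[, "markers"]
# (hypothetical data, not taken from the original post).
markers <- c(rep("CYTOSOL", 4), rep("GOLGI", 2), rep("unknown", 10))

w <- table(markers)
# Drop the unlabelled proteins and invert the remaining class counts.
w <- 1 / w[names(w) != "unknown"]

# Each weight is 1 / (number of markers in that class), so the smaller
# class (GOLGI) receives the larger weight.
print(w)
```

The smaller a class, the larger its weight, which is why classes with only a handful of markers end up dominating the weighting.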

Below is the traceback:
Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length
5. stop("all arguments must have the same length")
4. table(data, reference, dnn = dnn, ...)
3. confusionMatrix.default(ans, .test2$markers)
2. confusionMatrix(ans, .test2$markers)
1. svmOptimization(msnsetmarkers, times = 100, xval = 5, class.weights = w)

Here is the sessionInfo output:

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.28.0 tidyr_0.8.0          dplyr_0.7.4          pRoloc_1.18.0       
 [5] MLInterfaces_1.58.0  cluster_2.0.6        annotate_1.56.1      XML_3.98-1.10       
 [9] AnnotationDbi_1.40.0 IRanges_2.12.0       S4Vectors_0.16.0     MSnbase_2.4.2       
[13] ProtGenerics_1.10.0  BiocParallel_1.12.0  mzR_2.12.0           Rcpp_0.12.15        
[17] Biobase_2.38.0       BiocGenerics_0.24.0 

loaded via a namespace (and not attached):
  [1] plyr_1.8.4            igraph_1.1.2          lazyeval_0.2.1        splines_3.4.3        
  [5] ggvis_0.4.3           crosstalk_1.0.0       ggplot2_2.2.1         digest_0.6.15        
  [9] foreach_1.4.4         htmltools_0.3.6       viridis_0.5.0         gdata_2.18.0         
 [13] magrittr_1.5          memoise_1.1.0         doParallel_1.0.11     sfsmisc_1.1-2        
 [17] limma_3.34.9          recipes_0.1.2         gower_0.1.2           rda_1.0.2-2          
 [21] dimRed_0.1.0          lpSolve_5.6.13        prettyunits_1.0.2     colorspace_1.3-2     
 [25] blob_1.1.0            RCurl_1.95-4.10       hexbin_1.27.2         genefilter_1.60.0    
 [29] bindr_0.1.1           impute_1.52.0         survival_2.41-3       iterators_1.0.9      
 [33] glue_1.2.0            DRR_0.0.3             gtable_0.2.0          ipred_0.9-6          
 [37] zlibbioc_1.24.0       kernlab_0.9-25        ddalpha_1.3.1.1       prabclus_2.2-6       
 [41] DEoptimR_1.0-8        scales_0.5.0          vsn_3.46.0            mvtnorm_1.0-7        
 [45] DBI_0.8               viridisLite_0.3.0     xtable_1.8-2          progress_1.1.2       
 [49] foreign_0.8-69        bit_1.1-12            proxy_0.4-21          mclust_5.4           
 [53] preprocessCore_1.40.0 lava_1.6              prodlim_1.6.1         sampling_2.8         
 [57] htmlwidgets_1.0       httr_1.3.1            threejs_0.3.1         FNN_1.1              
 [61] RColorBrewer_1.1-2    fpc_2.1-11            modeltools_0.2-21     pkgconfig_2.0.1      
 [65] flexmix_2.3-14        nnet_7.3-12           caret_6.0-78          tidyselect_0.2.4     
 [69] rlang_0.2.0           reshape2_1.4.3        munsell_0.4.3         mlbench_2.1-1        
 [73] tools_3.4.3           RSQLite_2.0           pls_2.6-0             broom_0.4.3          
 [77] stringr_1.3.0         mzID_1.16.0           ModelMetrics_1.1.0    knitr_1.20           
 [81] bit64_0.9-7           robustbase_0.92-8     randomForest_4.6-12   purrr_0.2.4          
 [85] dendextend_1.7.0      bindrcpp_0.2          nlme_3.1-131.1        whisker_0.3-2        
 [89] mime_0.5              RcppRoll_0.2.2        biomaRt_2.34.2        compiler_3.4.3       
 [93] e1071_1.6-8           affyio_1.48.0         tibble_1.4.2          stringi_1.1.7        
 [97] lattice_0.20-35       trimcluster_0.1-2     Matrix_1.2-12         psych_1.7.8          
[101] gbm_2.1.3             pillar_1.2.1          MALDIquant_1.17       bitops_1.0-6         
[105] httpuv_1.3.6.2        R6_2.2.2              pcaMethods_1.70.0     affy_1.56.0          
[109] hwriter_1.3.2         gridExtra_2.3         codetools_0.2-15      MASS_7.3-49          
[113] gtools_3.5.0          assertthat_0.2.0      CVST_0.2-1            withr_2.1.1          
[117] mnormt_1.5-5          diptest_0.75-7        grid_3.4.3            rpart_4.1-13         
[121] timeDate_3043.102     class_7.3-14          Rtsne_0.13            shiny_1.0.5          
[125] lubridate_1.7.3       base64enc_0.1-3      

Thanks very much!!!

@laurent-gatto-5645

It is a bit difficult to provide an explanation at this stage. How many markers do you have for your sub-cellular classes? Could you paste the output of getMarkers - here's the output for the stem cell data:

> library("pRoloc")
> library("pRolocdata")
> data(hyperLOPIT2015)
> getMarkers(hyperLOPIT2015)
organelleMarkers
                         40S Ribosome
                                   27
                         60S Ribosome
                                   43
                   Actin cytoskeleton
                                   13
                              Cytosol
                                   43
Endoplasmic reticulum/Golgi apparatus
                                  107
                             Endosome
                                   13
                 Extracellular matrix
                                   13
                             Lysosome
                                   33
                        Mitochondrion
                                  383
                  Nucleus - Chromatin
                                   64
              Nucleus - Non-chromatin
                                   85
                           Peroxisome
                                   17
                      Plasma membrane
                                   51
                           Proteasome
                                   34
                              unknown
                                 4106

 

Could it be that you have too few markers? We typically recommend 13+ per class.
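A quick way to see which classes fall below that threshold is sketched below in base R. The marker vector is hypothetical; with a real MSnSet one would presumably use fData(msnsetmarkers)[, "markers"] instead, and the cut-off of 13 is simply the recommendation above.

```r
# Hypothetical marker vector; with a real MSnSet this would be something like
# fData(msnsetmarkers)[, "markers"] (an assumption about the column name).
markers <- c(rep("CYTOSOL", 25), rep("GOLGI", 3), rep("PEROXISOME", 2),
             rep("unknown", 40))

min_markers <- 13  # threshold recommended in this thread

# Count markers per class, ignoring unlabelled proteins.
counts <- table(markers[markers != "unknown"])

# Keep only the classes with fewer markers than the threshold.
too_few <- counts[counts < min_markers]

print(too_few)
```

Classes listed in too_few are the ones to curate further before tuning the classifier.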

Here are my markers. Several classes have fewer than 13 markers - will this make a big impact?

organelleMarkers
          CYTOSOL                ER             GOLGI          LYSOSOME 
               25                48                 3                 9 
     MITOCHONDRIA           NUCLEUS NUCLEUS-CHROMATIN        PEROXISOME 
               34                11                 2                 2 
               PM        PROTEASOME      RIBOSOME 40S      RIBOSOME 60S 
               12                24                25                41 
          unknown 
             1141 

code used: getMarkers(msnsetmarkers)

Thank you. My first suggestion would be to increase the number of markers, especially for the Golgi, chromatin and peroxisome. As I said, ideally, try to get 13+ for each class.

I can't say if this is the reason for the error you see (although I suspect it is). Even if it's not, two things follow from having too few markers to train your model: (1) you won't be able to get reliable model hyper-parameters (that's what the svmOptimisation function helps with), and (2) it is unlikely that more proteins will be assigned to these classes (and if they are, the assignments won't be very reliable).

To help you with identifying markers, you may want to have a look at those we propose (based on previous studies); here's what we have at the moment:

> pRolocmarkers()
7 marker lists available:
Arabidopsis thaliana [atha]:
 Ids: TAIR, 543 markers
Drosophila melanogaster [dmel]:
 Ids: Uniprot, 179 markers
Gallus gallus [ggal]:
 Ids: IPI, 102 markers
Homo sapiens [hsap]:
 Ids: Uniprot, 872 markers
Mus musculus [mmus]:
 Ids: Uniprot, 937 markers
Saccharomyces cerevisiae [scer_sgd]:
 Ids: SGD, 259 markers
Saccharomyces cerevisiae [scer_uniprot]:
 Ids: Uniprot Accession, 259 markers

See the documentation of the addMarkers function for help on how to add them.


I have tried this and am getting the same error 

Code for new markers and adding them:

mrk <- pRolocmarkers(species = "hsap")
msnsetmarkersnew <- addMarkers(msnsetnorm, mrk, verbose = FALSE)
msnsetmarkersnew <- fDataToUnknown(msnsetmarkersnew, fcol = "markers")

wnew <- table(fData(msnsetmarkersnew)[, "markers"])
wnew <- 1/wnew[names(wnew) != "unknown"]
# this line still gives the error
params <- svmOptimisation(msnsetmarkersnew, times = 100, xval = 5, class.weights = wnew)

Error:

Error in table(data, reference, dnn = dnn, ...) : all arguments must have the same length

Markers now look like this:

organelleMarkers
         40S Ribosome          60S Ribosome    actin cytoskeleton               Cytosol 
                   29                    43                    24                    25 
Endoplasmic Reticulum       Golgi Apparatus              Lysosome          Mitochondria 
                   44                     2                    11                    31 
              Nucleus            PEROXISOME            plasma mem            Proteasome 
                   19                     4                    23                    27 
              unknown 
                 1095 

The markers still look a bit weak (Golgi apparatus has 2 proteins, peroxisome has 4).

The error comes from somewhere within the code, where table expects two vectors of the same length, and somehow they aren't with your data:

> table(letters[1:10], letters[1:9])
Error in base::table(...) : all arguments must have the same length

I could look into it if you send me your data.
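To make the failure mode concrete, here is a small base R sketch with made-up predicted and reference labels of unequal length. The vector names are hypothetical and do not correspond to the internals of svmOptimisation; the point is only that comparing lengths up front is easier to interpret than the error raised by table.

```r
# Toy predicted and reference labels of different lengths (hypothetical data).
predicted <- c("ER", "ER", "GOLGI", "PM")
reference <- c("ER", "GOLGI", "GOLGI")

# An explicit length check is easier to interpret than the table() error.
same_length <- length(predicted) == length(reference)

# Capture the error message from table() instead of stopping the script.
msg <- tryCatch({
  table(predicted, reference)
  NA_character_
}, error = function(e) conditionMessage(e))

print(same_length)
print(msg)
```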

@laurent-gatto-5645

In the end, the problem seemed to be related to the installation: after reinstallation, and when I ran the code on the data myself, the error could not be reproduced.
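For readers who hit something similar, one possible sanity check before reinstalling is sketched below: it lists installed packages that were built under a different R major.minor version than the one currently running. The helper name built_mismatch is hypothetical, and this is only one symptom of a mixed installation, not a confirmed diagnosis of the problem in this thread.

```r
# Hypothetical helper: report packages whose "Built" R version (major.minor)
# differs from the running R. 'pkgs' is a character vector of package names.
built_mismatch <- function(pkgs) {
  ip <- installed.packages()
  ip <- ip[ip[, "Package"] %in% pkgs, c("Package", "Version", "Built"),
           drop = FALSE]
  # Running R version reduced to major.minor, e.g. "3.4".
  running <- paste(R.version$major, sub("\\..*$", "", R.version$minor),
                   sep = ".")
  # "Built" looks like "3.4.3"; keep only the major.minor part.
  built <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", ip[, "Built"])
  as.data.frame(ip[built != running, , drop = FALSE],
                stringsAsFactors = FALSE)
}

# Base packages ship with R itself, so they should not be flagged.
print(built_mismatch(c("stats", "utils")))
```

In the context of this thread, the same call could be made with names taken from the sessionInfo output above, for example built_mismatch(c("pRoloc", "MLInterfaces", "caret", "e1071")).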
