pRoloc - phenoDisco - issue with example
johnscrn • 0

I am having trouble defining my semi-supervised marker set for the phenoDisco function. I can reproduce the problem using the example from version 2 of the main paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6053703/

The following code works:

library(MSnbase)
library(pRoloc)
library(pRolocdata)
library(pRolocGUI)

extdatadir <- system.file("extdata", package = "pRolocdata")

csvfile <- dir(extdatadir, full.names = TRUE,
               pattern = "hyperLOPIT-SIData-ms3-rep12-intersect.csv")

hl <- readMSnSet2(csvfile, ecol = 8:27, fnames = 1, skip = 1)

fvarLabels(hl)[1:3] <- c("uniprot.accession", "uniprot.id", "description")
fvarLabels(hl)[4:6] <- paste0("peptides.expt", 1:3)

fData(hl)[1:4, c(1:2, 4:6)]
pData(hl)$Replicate <- rep(1:2, each = 10)
pData(hl)$Tag <- sub("\\.1$", "", sub("^X", "", sampleNames(hl)))

expinfo <- dir(extdatadir, full.names = TRUE,
               pattern = "hyperLOPIT-SIData-fraction-info.csv")

fracinfo <- read.csv(expinfo, row.names = 1, skip = 2,
                     header = FALSE, stringsAsFactors = FALSE)
pData(hl)$Gradient.Fraction <- c(fracinfo[, 1], fracinfo[, 2])
pData(hl)$Iodixonal.Density <- c(fracinfo[, 4], fracinfo[, 5])

hl <- normalise(hl, method = "sum")

hl <- impute(hl, method = "knn")

mrk <- pRolocmarkers(species = "mmus")
hl <- addMarkers(hl, mrk)

hl <- fDataToUnknown(hl, from = "Golgi apparatus", to = "unknown")

getMarkers(hl, fcol = "phenoDisco.Input")

> getMarkers(hl, fcol = "phenoDisco.Input")
organelleMarkers
                         40S Ribosome                          60S Ribosome 
                                   26                                    43 
Endoplasmic reticulum/Golgi apparatus                         Mitochondrion 
                                   76                                   261 
                      Plasma membrane                            Proteasome 
                                   50                                    34 
                              unknown 
                                 4542 

> fvarLabels(hl)
 [1] "uniprot.accession"                 "uniprot.id"                       
 [3] "description"                       "peptides.expt1"                   
 [5] "peptides.expt2"                    "peptides.expt3"                   
 [7] "Experiment.2.1"                    "phenoDisco.Input"                 
 [9] "phenoDisco.Output"                 "Curated.phenoDisco.Output"        
[11] "SVM.marker.set"                    "SVM.classification"               
[13] "SVM.score"                         "SVM.classification..top.quartile."
[15] "Final.Localization.Assignment"     "First.localization.evidence."     
[17] "Curated.Organelles"                "Cytoskeletal.Components"          
[19] "Trafficking.Proteins"              "Protein.Complexes"                
[21] "Signaling.Cascades"                "Oct4.Interactome"                 
[23] "Nanog.Interactome"                 "Sox2.Interactome"                 
[25] "Cell.Surface.Proteins"             "markers" 

hl <- phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)

However, if I remove the "markers" feature variable, I get the same error that I get with my own data:

> fData(hl) <- fData(hl)[-26]
> hl <- phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)
Error in `[.data.frame`(fData(x), , fcol) : undefined columns selected
>

This suggests to me that phenoDisco uses the "markers" feature variable regardless of what is requested with fcol.

'markers' was added earlier by addMarkers (from the pRolocmarkers set) and is not the same as phenoDisco.Input:

> getMarkers(hl,fcol="markers")
organelleMarkers
           40S Ribosome            60S Ribosome      Actin cytoskeleton 
                     27                      43                      13 
                Cytosol   Endoplasmic reticulum                Endosome 
                     43                      95                      12 
   Extracellular matrix                Lysosome           Mitochondrion 
                     10                      33                     383 
    Nucleus - Chromatin Nucleus - Non-chromatin              Peroxisome 
                     64                      85                      17 
        Plasma membrane              Proteasome                 unknown 
                     51                      34                    4122

Note: in the paper/vignette, the phenoDisco function is not actually called because it would take a long time to run; the precomputed results are loaded instead.

In my own data I can use the feature name 'markers' and keep going, but unless I am missing something, that isn't the intended use of fcol.
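The workaround mentioned above can be sketched in base R on a small stand-in for fData(hl). This is only an illustration under stated assumptions: the data frame `fd` and the column name `markers.orig` are hypothetical, and whether the workaround is sufficient for a given pRoloc version has not been checked here.

```r
# Toy stand-in for fData(hl). With a real MSnSet, the equivalent of the
# last assignment below would be:
#   fData(hl)$markers <- fData(hl)$phenoDisco.Input
fd <- data.frame(
  phenoDisco.Input = c("Mitochondrion", "unknown"),
  markers          = c("Cytosol", "unknown"),
  stringsAsFactors = FALSE
)

# Keep the annotation added by addMarkers under another (hypothetical) name ...
fd$markers.orig <- fd$markers

# ... and expose the intended marker set under the default name "markers".
fd$markers <- fd$phenoDisco.Input
```

Keeping a copy of the original column means the annotation can be restored once the default name is no longer needed.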

> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(fData(x), , fcol)
3: fData(x)[, fcol]
2: anyUnknown(object)
1: phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pRolocGUI_1.18.0     pRolocdata_1.22.0    pRoloc_1.24.0       
 [4] BiocParallel_1.17.18 MLInterfaces_1.64.0  cluster_2.1.0       
 [7] annotate_1.62.0      XML_3.98-1.20        AnnotationDbi_1.46.0
[10] IRanges_2.18.1       MSnbase_2.10.1       ProtGenerics_1.16.0 
[13] S4Vectors_0.22.0     mzR_2.18.0           Rcpp_1.0.2          
[16] Biobase_2.44.0       BiocGenerics_0.30.0 

loaded via a namespace (and not attached):
  [1] snow_0.4-3            backports_1.1.4       plyr_1.8.4           
  [4] igraph_1.2.4.1        lazyeval_0.2.2        splines_3.6.1        
  [7] ggvis_0.4.4           crosstalk_1.0.0       ggplot2_3.2.0        
 [10] digest_0.6.20         foreach_1.4.7         htmltools_0.3.6      
 [13] viridis_0.5.1         gdata_2.18.0          magrittr_1.5         
 [16] memoise_1.1.0         doParallel_1.0.14     mixtools_1.1.0       
 [19] sfsmisc_1.1-4         limma_3.40.6          recipes_0.1.6        
 [22] gower_0.2.1           rda_1.0.2-2.1         lpSolve_5.6.13.1     
 [25] prettyunits_1.0.2     colorspace_1.4-1      blob_1.2.0           
 [28] xfun_0.8              dplyr_0.8.3           crayon_1.3.4         
 [31] RCurl_1.95-4.12       hexbin_1.27.3         genefilter_1.66.0    
 [34] zeallot_0.1.0         impute_1.58.0         survival_2.44-1.1    
 [37] iterators_1.0.12      glue_1.3.1            gtable_0.3.0         
 [40] ipred_0.9-9           zlibbioc_1.30.0       kernlab_0.9-27       
 [43] prabclus_2.3-1        DEoptimR_1.0-8        scales_1.0.0         
 [46] vsn_3.52.0            mvtnorm_1.0-11        DBI_1.0.0            
 [49] viridisLite_0.3.0     xtable_1.8-4          progress_1.2.2       
 [52] bit_1.1-14            proxy_0.4-23          mclust_5.4.5         
 [55] preprocessCore_1.46.0 DT_0.7                lava_1.6.5           
 [58] prodlim_2018.04.18    sampling_2.8          htmlwidgets_1.3      
 [61] httr_1.4.0            threejs_0.3.1         FNN_1.1.3            
 [64] RColorBrewer_1.1-2    fpc_2.2-3             modeltools_0.2-22    
 [67] pkgconfig_2.0.2       flexmix_2.3-15        nnet_7.3-12          
 [70] caret_6.0-84          reshape2_1.4.3        tidyselect_0.2.5     
 [73] rlang_0.4.0           later_0.8.0           munsell_0.5.0        
 [76] mlbench_2.1-1         tools_3.6.1           LaplacesDemon_16.1.1 
 [79] generics_0.0.2        RSQLite_2.1.2         pls_2.7-1            
 [82] stringr_1.4.0         mzID_1.22.0           ModelMetrics_1.2.2   
 [85] knitr_1.23            bit64_0.9-7           robustbase_0.93-5    
 [88] randomForest_4.6-14   purrr_0.3.2           dendextend_1.12.0    
 [91] ncdf4_1.16.1          nlme_3.1-140          mime_0.7             
 [94] biomaRt_2.40.3        compiler_3.6.1        e1071_1.7-2          
 [97] affyio_1.54.0         tibble_2.1.3          stringi_1.4.3        
[100] lattice_0.20-38       Matrix_1.2-17         gbm_2.1.5            
[103] vctrs_0.2.0           pillar_1.4.2          BiocManager_1.30.4   
[106] MALDIquant_1.19.3     data.table_1.12.2     bitops_1.0-6         
[109] httpuv_1.5.1          R6_2.4.0              pcaMethods_1.76.0    
[112] affy_1.62.0           hwriter_1.3.2         promises_1.0.1       
[115] gridExtra_2.3         codetools_0.2-16      MASS_7.3-51.4        
[118] gtools_3.8.1          assertthat_0.2.1      withr_2.1.2          
[121] diptest_0.75-7        hms_0.5.0             grid_3.6.1           
[124] rpart_4.1-15          timeDate_3043.102     coda_0.19-3          
[127] class_7.3-15          segmented_1.0-0       shiny_1.3.2          
[130] lubridate_1.7.4       base64enc_0.1-3

@laurent-gatto-5645

Thank you very much for the bug report. The issue wasn't in phenoDisco itself (which used the appropriate feature variable), but in a helper function (anyUnknown) that performs a check and did not receive the user's fcol.
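The kind of bug described here can be reproduced with a toy sketch in base R. The functions below (`any_unknown_toy`, `pheno_disco_buggy`, `pheno_disco_fixed`) are hypothetical stand-ins written for illustration; they are not pRoloc's actual code, and the default `fcol = "markers"` is an assumption based on the traceback in the question.

```r
# Toy stand-in for fData(hl): the column we want, and no "markers" column.
fd <- data.frame(
  phenoDisco.Input = c("Mitochondrion", "unknown", "Proteasome"),
  stringsAsFactors = FALSE
)

# Hypothetical helper with a default feature column.
any_unknown_toy <- function(fd, fcol = "markers") {
  any(fd[, fcol] == "unknown")
}

# Buggy caller: accepts fcol but does not forward it, so the helper
# falls back to its own default ("markers").
pheno_disco_buggy <- function(fd, fcol = "markers") {
  any_unknown_toy(fd)
}

# Fixed caller: forwards the user's fcol to the helper.
pheno_disco_fixed <- function(fd, fcol = "markers") {
  any_unknown_toy(fd, fcol = fcol)
}

# The buggy version fails because "markers" does not exist in fd;
# the error message is captured instead of stopping the script.
buggy <- tryCatch(
  pheno_disco_buggy(fd, fcol = "phenoDisco.Input"),
  error = function(e) conditionMessage(e)
)

# The fixed version checks the requested column.
fixed <- pheno_disco_fixed(fd, fcol = "phenoDisco.Input")
```

When a "markers" column happens to exist, the buggy version runs without error, which would explain why the problem only surfaced after that column was removed.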

The issue is fixed in version 1.25.2, which is available immediately on GitHub (install it with BiocManager::install("lgatto/pRoloc")) and will be available from the Bioconductor server within 24 hours or so.

> fvarLabels(hl)
 [1] "uniprot.accession"                 "uniprot.id"                       
 [3] "description"                       "peptides.expt1"                   
 [5] "peptides.expt2"                    "peptides.expt3"                   
 [7] "Experiment.2.1"                    "phenoDisco.Input"                 
 [9] "phenoDisco.Output"                 "Curated.phenoDisco.Output"        
[11] "SVM.marker.set"                    "SVM.classification"               
[13] "SVM.score"                         "SVM.classification..top.quartile."
[15] "Final.Localization.Assignment"     "First.localization.evidence."     
[17] "Curated.Organelles"                "Cytoskeletal.Components"          
[19] "Trafficking.Proteins"              "Protein.Complexes"                
[21] "Signaling.Cascades"                "Oct4.Interactome"                 
[23] "Nanog.Interactome"                 "Sox2.Interactome"                 
[25] "Cell.Surface.Proteins"            
> phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)
Iteration 1
[...]
> packageVersion("pRoloc")
[1] ‘1.25.2’
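
A quick way to check whether an installation already includes the fix is to compare version numbers. This is a minimal sketch: the variable names are illustrative, and the version numbers come from this thread (1.25.2 for the fix, 1.24.0 from the sessionInfo() in the question).

```r
# Version that contains the fix, according to this thread.
fixed_in <- package_version("1.25.2")

# TRUE if the installed pRoloc is at least the fixed version,
# FALSE if it is older, NA if pRoloc is not installed.
has_fix <- if (requireNamespace("pRoloc", quietly = TRUE)) {
  utils::packageVersion("pRoloc") >= fixed_in
} else {
  NA
}

# The version reported in the question's sessionInfo() predates the fix.
reported <- package_version("1.24.0")
reported_has_fix <- reported >= fixed_in
```

Comparing `package_version` objects avoids the pitfalls of comparing version strings character by character.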