I am having trouble defining my semi-supervised marker set in the phenoDisco function. I can reproduce the problem using the example from version 2 of the main paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6053703/
The following code works:
library(MSnbase)
library(pRoloc)
library(pRolocdata)
library(pRolocGUI)
extdatadir <- system.file("extdata", package = "pRolocdata")
csvfile <- dir(extdatadir, full.names = TRUE,
               pattern = "hyperLOPIT-SIData-ms3-rep12-intersect.csv")
hl <- readMSnSet2(csvfile, ecol = 8:27, fnames = 1, skip = 1)
fvarLabels(hl)[1:3] <- c("uniprot.accession", "uniprot.id", "description")
fvarLabels(hl)[4:6] <- paste0("peptides.expt", 1:3)
fData(hl)[1:4, c(1:2, 4:6)]
pData(hl)$Replicate <- rep(1:2, each = 10)
pData(hl)$Tag <- sub("\\.1$", "", sub("^X", "", sampleNames(hl)))
expinfo <- dir(extdatadir, full.names = TRUE,
               pattern = "hyperLOPIT-SIData-fraction-info.csv")
fracinfo <- read.csv(expinfo, row.names = 1, skip = 2,
                     header = FALSE, stringsAsFactors = FALSE)
pData(hl)$Gradient.Fraction <- c(fracinfo[, 1], fracinfo[, 2])
pData(hl)$Iodixonal.Density <- c(fracinfo[, 4], fracinfo[, 5])
hl <- normalise(hl, method = "sum")
hl <- impute(hl, method = "knn")
mrk <- pRolocmarkers(species = "mmus")
hl <- addMarkers(hl, mrk)
hl <- fDataToUnknown(hl, from = "Golgi apparatus", to = "unknown")
> getMarkers(hl, fcol = "phenoDisco.Input")
organelleMarkers
                         40S Ribosome                          60S Ribosome
                                   26                                    43
Endoplasmic reticulum/Golgi apparatus                         Mitochondrion
                                   76                                   261
                      Plasma membrane                            Proteasome
                                   50                                    34
                              unknown
                                 4542
> fvarLabels(hl)
[1] "uniprot.accession" "uniprot.id"
[3] "description" "peptides.expt1"
[5] "peptides.expt2" "peptides.expt3"
[7] "Experiment.2.1" "phenoDisco.Input"
[9] "phenoDisco.Output" "Curated.phenoDisco.Output"
[11] "SVM.marker.set" "SVM.classification"
[13] "SVM.score" "SVM.classification..top.quartile."
[15] "Final.Localization.Assignment" "First.localization.evidence."
[17] "Curated.Organelles" "Cytoskeletal.Components"
[19] "Trafficking.Proteins" "Protein.Complexes"
[21] "Signaling.Cascades" "Oct4.Interactome"
[23] "Nanog.Interactome" "Sox2.Interactome"
[25] "Cell.Surface.Proteins" "markers"
hl <- phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)
However, if I remove the "markers" feature variable, I get the same error that I see with my own data:
> fData(hl) <- fData(hl)[-26]
> hl <- phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)
Error in `[.data.frame`(fData(x), , fcol) : undefined columns selected
This suggests to me that phenoDisco uses the "markers" feature variable no matter what is requested with fcol.
"markers" is added earlier by addMarkers (from the pRolocmarkers list) and is not the same as phenoDisco.Input:
> getMarkers(hl, fcol = "markers")
organelleMarkers
           40S Ribosome            60S Ribosome      Actin cytoskeleton
                     27                      43                      13
                Cytosol   Endoplasmic reticulum                Endosome
                     43                      95                      12
   Extracellular matrix                Lysosome           Mitochondrion
                     10                      33                     383
    Nucleus - Chromatin Nucleus - Non-chromatin              Peroxisome
                     64                      85                      17
        Plasma membrane              Proteasome                 unknown
                     51                      34                    4122
Note: in the paper/vignette, the phenoDisco function is not actually called because it takes a long time to run; the precomputed results are loaded instead.
With my own data I can name the feature variable "markers" and keep going, but unless I am missing something, that isn't the intended use of fcol.
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(fData(x), , fcol)
3: fData(x)[, fcol]
2: anyUnknown(object)
1: phenoDisco(hl, fcol = "phenoDisco.Input", times = 200, GS = 60)
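The traceback above seems consistent with that reading: anyUnknown(object) is called without an fcol argument, so it presumably falls back to its own default column name. Below is a minimal base-R sketch of that call pattern. It is not the pRoloc source: any_unknown, pheno_disco_like, and the toy data frame are hypothetical stand-ins, and the default of "markers" is an assumption inferred from the behaviour described here.

```r
# Hypothetical stand-ins that only mimic the call pattern seen in the
# traceback; they are NOT the pRoloc implementation.
any_unknown <- function(fd, fcol = "markers") {
  # Assumed default: look in the "markers" column unless told otherwise
  any(fd[, fcol] == "unknown")
}

pheno_disco_like <- function(fd, fcol = "markers") {
  # fcol is not forwarded here, so any_unknown() uses its own default
  any_unknown(fd)
}

# Toy feature data with both columns present
fd <- data.frame(phenoDisco.Input = c("Mitochondrion", "unknown"),
                 markers = c("Mitochondrion", "unknown"),
                 stringsAsFactors = FALSE)

# Runs without error, but only because a "markers" column happens to exist
ok_with_markers <- pheno_disco_like(fd, fcol = "phenoDisco.Input")

# Drop "markers": the requested column is still there, yet the call fails
fd$markers <- NULL
err_without_markers <- tryCatch(
  pheno_disco_like(fd, fcol = "phenoDisco.Input"),
  error = function(e) conditionMessage(e)
)
print(err_without_markers)
```

If something like this is what happens inside phenoDisco, forwarding fcol to the helper would avoid the error, but I have not confirmed that against the package source.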
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] pRolocGUI_1.18.0 pRolocdata_1.22.0 pRoloc_1.24.0
[4] BiocParallel_1.17.18 MLInterfaces_1.64.0 cluster_2.1.0
[7] annotate_1.62.0 XML_3.98-1.20 AnnotationDbi_1.46.0
[10] IRanges_2.18.1 MSnbase_2.10.1 ProtGenerics_1.16.0
[13] S4Vectors_0.22.0 mzR_2.18.0 Rcpp_1.0.2
[16] Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] snow_0.4-3 backports_1.1.4 plyr_1.8.4
[4] igraph_1.2.4.1 lazyeval_0.2.2 splines_3.6.1
[7] ggvis_0.4.4 crosstalk_1.0.0 ggplot2_3.2.0
[10] digest_0.6.20 foreach_1.4.7 htmltools_0.3.6
[13] viridis_0.5.1 gdata_2.18.0 magrittr_1.5
[16] memoise_1.1.0 doParallel_1.0.14 mixtools_1.1.0
[19] sfsmisc_1.1-4 limma_3.40.6 recipes_0.1.6
[22] gower_0.2.1 rda_1.0.2-2.1 lpSolve_5.6.13.1
[25] prettyunits_1.0.2 colorspace_1.4-1 blob_1.2.0
[28] xfun_0.8 dplyr_0.8.3 crayon_1.3.4
[31] RCurl_1.95-4.12 hexbin_1.27.3 genefilter_1.66.0
[34] zeallot_0.1.0 impute_1.58.0 survival_2.44-1.1
[37] iterators_1.0.12 glue_1.3.1 gtable_0.3.0
[40] ipred_0.9-9 zlibbioc_1.30.0 kernlab_0.9-27
[43] prabclus_2.3-1 DEoptimR_1.0-8 scales_1.0.0
[46] vsn_3.52.0 mvtnorm_1.0-11 DBI_1.0.0
[49] viridisLite_0.3.0 xtable_1.8-4 progress_1.2.2
[52] bit_1.1-14 proxy_0.4-23 mclust_5.4.5
[55] preprocessCore_1.46.0 DT_0.7 lava_1.6.5
[58] prodlim_2018.04.18 sampling_2.8 htmlwidgets_1.3
[61] httr_1.4.0 threejs_0.3.1 FNN_1.1.3
[64] RColorBrewer_1.1-2 fpc_2.2-3 modeltools_0.2-22
[67] pkgconfig_2.0.2 flexmix_2.3-15 nnet_7.3-12
[70] caret_6.0-84 reshape2_1.4.3 tidyselect_0.2.5
[73] rlang_0.4.0 later_0.8.0 munsell_0.5.0
[76] mlbench_2.1-1 tools_3.6.1 LaplacesDemon_16.1.1
[79] generics_0.0.2 RSQLite_2.1.2 pls_2.7-1
[82] stringr_1.4.0 mzID_1.22.0 ModelMetrics_1.2.2
[85] knitr_1.23 bit64_0.9-7 robustbase_0.93-5
[88] randomForest_4.6-14 purrr_0.3.2 dendextend_1.12.0
[91] ncdf4_1.16.1 nlme_3.1-140 mime_0.7
[94] biomaRt_2.40.3 compiler_3.6.1 e1071_1.7-2
[97] affyio_1.54.0 tibble_2.1.3 stringi_1.4.3
[100] lattice_0.20-38 Matrix_1.2-17 gbm_2.1.5
[103] vctrs_0.2.0 pillar_1.4.2 BiocManager_1.30.4
[106] MALDIquant_1.19.3 data.table_1.12.2 bitops_1.0-6
[109] httpuv_1.5.1 R6_2.4.0 pcaMethods_1.76.0
[112] affy_1.62.0 hwriter_1.3.2 promises_1.0.1
[115] gridExtra_2.3 codetools_0.2-16 MASS_7.3-51.4
[118] gtools_3.8.1 assertthat_0.2.1 withr_2.1.2
[121] diptest_0.75-7 hms_0.5.0 grid_3.6.1
[124] rpart_4.1-15 timeDate_3043.102 coda_0.19-3
[127] class_7.3-15 segmented_1.0-0 shiny_1.3.2
[130] lubridate_1.7.4 base64enc_0.1-3