Question

possible problem with mzR? cannot load Msnset for proteomics analysis

0

Entering edit mode

yaare • 0

@user-25000

Last seen 3.1 years ago

United States

Hi -- I am having trouble running a code that was working 6 months ago. I am getting an error message with mzR and wondering if this is the root of the problem. I tried updating mzR and bioconductor as stated in https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue . I'm not sure why the code is not running correctly anymore. I'm seeing multiple errors downstream of loading the data file when using Limma that I was not seeing before. Any help is much appreciated. Thank you!

Loading required package: MSnbase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    rowSums, sapply, setdiff, sort, table, tapply, union, unique,
    unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: mzR
Loading required package: Rcpp
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: ProtGenerics

This is MSnbase version 2.8.3 
  Visit https://lgatto.github.io/MSnbase/ to get started.


Attaching package: ‘MSnbase’

The following object is masked from ‘package:stats’:

    smooth

The following object is masked from ‘package:base’:

    trimws

> setwd("C:/Users/xxx/Desktop/other_data")
Warning message:
In fun(libname, pkgname) :
  mzR has been built against a different Rcpp version (1.0.0)
than is installed on your system (1.0.6). This might lead to errors
when loading mzR. If you encounter such issues, please send a report,
including the output of sessionInfo() to the Bioc support forum at 
https://support.bioconductor.org/. For details see also
https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue.
> 
> # ___________________________  FUNCTIONS TO LOAD THE DATA _________________________________
> 
> # The following functions load the protein tables exported from ProteomeDiscoverer 
> # and transform them into MSnbase objects.
> 
> # Load plotting packages
> library(ggplot2)
> library(RColorBrewer)
> library(reshape2)
> library(cowplot)
> 
> org.options <- options()
> 
> # create a nicer ggplot theme
> my_theme_bw <- theme_bw() +
+   theme(strip.background = element_blank(),
+         panel.border = element_rect(colour = "black"))
> 
> my_theme <- function(base_size = 12, base_family = "", flip_axis = F) {
+   flip_axis_theme <- theme(axis.text.x = element_text(angle = 45, hjust = 1))
+   
+   res <- theme_bw(base_size=base_size, base_family=base_family) %+replace% 
+     theme(strip.background = element_blank(),
+           panel.grid.major = element_blank(),
+           panel.grid.minor = element_blank(),
+           panel.border = element_blank(), 
+           axis.line = element_line(colour = "#333333")
+     )
+   
+   if (flip_axis) {
+     res <- res + flip_axis_theme
+   }
+   
+   return(res)
+ }
> 
> # add some convenience functions
> setPlotSize <- function(height, width) {
+   options(repr.plot.height=height, repr.plot.width = width)
+ }
> 
> resetPlotSize <- function() {
+   options(org.options)
+ }
> 
> showPlot <- function(plot.obj, width, height, filename = NULL) {
+   if (!is.null(filename) && dir.exists("Figures")) {
+     svg(paste0("/Users/xxx/from R/", filename, ".svg"), 
+         width=width, height=height)
+     print(plot.obj)
+     dev.off()
+   }
+   
+   setPlotSize(height, width)
+   print(plot.obj)
+ }
> 
> 
> # _______________________________  LOAD DATA ______________________________________
> 
> library(MSnbase)
> 
> loadPdResult <- function(quan.data.file) {
+   complete.data <- read.table(quan.data.file, header = T, sep = "\t", stringsAsFactors = F)
+   uses.clustering <- length(grep("-clustering-", quan.data.file)) > 0
+   unique.name <- gsub(".*/(.*)_Proteins\\.txt", "\\1", quan.data.file)[[1]]
+   
+   # only keep reliable ids
+   org.ids <- nrow(complete.data)
+   if ("Protein.FDR.Confidence.MS.Amanda.20" %in% colnames(complete.data)) {
+     # complete.data <- complete.data[complete.data$Protein.FDR.Confidence.MS.Amanda.20 == "High", ]    
+   } else  {
+     # complete.data <- complete.data[complete.data$Protein.FDR.Confidence == "High", ]
+   }
+   message("Retained ", nrow(complete.data), " / ", org.ids)
+   
+  
+   # only master protein
+   complete.data <- complete.data[complete.data$Master == "IsMasterProtein", ]
+   
+   if (length(grep("Peakjuggler", colnames(complete.data)) > 0)) {
+     quan.data <- complete.data[, grep("Peakjuggler\\.Area", colnames(complete.data))]
+     rownames(quan.data) <- complete.data[, "Accession"]
+     colnames(quan.data) <- gsub("Peakjuggler\\.Area\\.site[^.]*", unique.name, colnames(quan.data))
+   } else {
+     quan.data <- complete.data[, grep("apQuant\\.Area", colnames(complete.data))]
+     rownames(quan.data) <- complete.data[, "Accession"]
+     colnames(quan.data) <- gsub("apQuant\\.Area\\.site[^.]*", unique.name, colnames(quan.data))
+   }
+   
+   # create the metadata
+   samples <- data.frame(
+     row.names = colnames(quan.data),
+     engine = "MSAmanda",
+     dataset = gsub(".*-(site[^-.]*).*", "\\1", colnames(quan.data)),
+     run = gsub(".*\\.(6[A-E])\\..*", "\\1", colnames(quan.data)),
+     original_search_result = !uses.clustering,
+     replicate = as.numeric(gsub(".*\\.rep([1-3])", "\\1", colnames(quan.data)))
+   )
+   
+   # create the MSNSet object
+   qual <- data.frame()
+   
+   featureDataDf <- data.frame(
+     row.names = rownames(quan.data),
+     peptides.amanda = complete.data[, "Number.of.Peptides.MS.Amanda.20"],
+     pj.confidence = complete.data[, "apQuant.Confidence"],
+     n.psms = complete.data[, "Number.of.PSMs"],
+     stringsAsFactors = F)
+   
+   if ("Number.of.Peptides.Spectral.Cluster.Search" %in% colnames(complete.data)) {
+     featureDataDf$peptides.clustering <- complete.data[, "Number.of.Peptides.Spectral.Cluster.Search"]
+   } else {
+     featureDataDf$peptides.clustering <- 0
+   }
+   
+   colnames(featureDataDf) <- paste0(unique.name, colnames(featureDataDf))
+   
+   featureData <- new("AnnotatedDataFrame", data = featureDataDf)
+   
+   phenoData <- new("AnnotatedDataFrame", data = samples)
+   
+   pd.data <- new("MSnSet",
+                  qual = qual,
+                  exprs = as.matrix(quan.data),
+                  featureData = featureData,
+                  annotation = "No annotation",
+                  phenoData = phenoData)
+   
+   return(pd.data)
+ }
> 
> mergeMsnsets <- function(data1, data2) {
+   exp.cmb <- merge(exprs(data1), exprs(data2), by.x = 0, by.y = 0, all.x = T, all.y = T)
+   rownames(exp.cmb) <- exp.cmb[, 1]
+   exp.cmb[, 1] <- NULL
+   
+   fData.cmb <- merge(fData(data1), fData(data2), by.x = 0, by.y = 0, all.x = T, all.y = T)
+   rownames(fData.cmb) <- fData.cmb[, 1]
+   fData.cmb[, 1] <- NULL
+   colnames(fData.cmb)[colnames(fData.cmb) == "uniprot.x"] <- "uniprot"
+   fData.cmb[, grep("uniprot\\.", colnames(fData.cmb))] <- NULL
+   
+   pData.cmb <- rbind(pData(data1), pData(data2))
+   
+   msnset.cmb <- new("MSnSet",
+                     qual = data.frame(),
+                     exprs = as.matrix(exp.cmb),
+                     featureData = new("AnnotatedDataFrame", data = fData.cmb),
+                     annotation = "No annotation",
+                     phenoData = new("AnnotatedDataFrame", data = pData.cmb))
+   
+   return(msnset.cmb)
+ }
> 
> 
>  filenames <- list.files("other_data/", pattern = ".*_Proteins-AV\\.txt", full.names = T)
>  alvarez.data <- NA
>  for (f in filenames) {
+   message("Processing ", f, "...")
+   if (f == filenames[1]) {
+     alvarez.data <- loadPdResult(f)
+   } else {
+     tmp <- loadPdResult(f)
+     alvarez.data <- mergeMsnsets(alvarez.data, tmp)
+   }
+  }
> 
> alvarez.data$dataset <- as.data.frame("Alvarez")
Warning message:
In alvarez.data$dataset <- as.data.frame("Alvarez") :
  Coercing LHS to a list
> 
> 
> 
> 
> 
> 
> if (!requireNamespace("BiocManager", quietly = TRUE))
+     install.packages("BiocManager")
> BiocManager::install("mzR")
Bioconductor version 3.8 (BiocManager 1.30.10), R 3.5.3 (2019-03-11)
Installing package(s) 'mzR'
Warning: package ‘mzR’ is in use and will not be installed
Installation path not writeable, unable to update packages: class, cluster, KernSmooth, nlme, nnet,
  spatial, survival
Old packages: 'bit', 'bit64', 'data.table', 'e1071', 'gbm', 'gmm', 'gower', 'hexbin', 'igraph',
  'ipred', 'matrixStats', 'mclust', 'mlbench', 'pROC', 'proxy', 'RcppArmadillo', 'robustbase',
  'RSQLite', 'sampling'
Update all/some/none? [a/s/n]: 
n
> 
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] limma_3.38.3        cowplot_1.1.1       reshape2_1.4.4      RColorBrewer_1.1-2 
 [5] ggplot2_3.3.3       MSnbase_2.8.3       ProtGenerics_1.14.0 S4Vectors_0.20.1   
 [9] mzR_2.16.2          Rcpp_1.0.6          Biobase_2.42.0      BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0      purrr_0.3.4           lattice_0.20-41       colorspace_2.0-0     
 [5] vctrs_0.3.6           generics_0.1.0        vsn_3.50.0            utf8_1.1.4           
 [9] XML_3.99-0.3          rlang_0.4.10          pillar_1.5.1          glue_1.4.2           
[13] withr_2.4.1           DBI_1.1.1             BiocParallel_1.16.6   affy_1.60.0          
[17] affyio_1.52.0         foreach_1.5.1         lifecycle_1.0.0       plyr_1.8.6           
[21] stringr_1.4.0         mzID_1.20.1           zlibbioc_1.28.0       munsell_0.5.0        
[25] pcaMethods_1.74.0     gtable_0.3.0          codetools_0.2-18      IRanges_2.16.0       
[29] doParallel_1.0.16     fansi_0.4.2           preprocessCore_1.44.0 scales_1.1.1         
[33] BiocManager_1.30.10   impute_1.56.0         digest_0.6.27         stringi_1.5.3        
[37] dplyr_1.0.5           ncdf4_1.17            grid_3.5.3            cli_2.3.1            
[41] tools_3.5.3           magrittr_2.0.1        tibble_3.1.0          crayon_1.4.1         
[45] pkgconfig_2.0.3       MASS_7.3-53.1         ellipsis_0.3.1        assertthat_0.2.1     
[49] iterators_1.0.13      R6_2.5.0              MALDIquant_1.19.3     compiler_3.5.3

MSnbase mzR Bioconductor limma • 854 views

ADD COMMENT • link updated 3.2 years ago by Laurent Gatto 1.6k • written 3.2 years ago by yaare • 0

score 0 · Answer 1 · 2021-03-12

0

Entering edit mode

Laurent Gatto 1.6k

@laurent-gatto-5645

Last seen 2 days ago

Belgium

The mzR message that you quote in your question isn't an error and can safely be ignored. As far as I can see, there is no other error in your script.

ADD COMMENT • link 3.2 years ago Laurent Gatto 1.6k