Hi Laurent,
Thanks a lot for your interest! We are developing an R package and a Shiny app for monitoring longitudinal QC data. We are creating control charts and other statistical methods to analyze instrument performance over time. We were able to generate csv files through Skyline that include Acquired Time, Peptide seq, Annotations and any corresponding metrics of interest, such as retention time and peak area, for each peptide and time point. Details are available in our paper and the Shiny app. Trying the Shiny app with the sample data will give a better idea of the input data we need.
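To make the input layout described above concrete, here is a minimal base R sketch of such a table, written to csv and read back. The column names AcquiredTime, Precursor, Annotations and RetentionTime follow the converter's own check list; PeakArea is a hypothetical column name, and the peptide sequences, timestamps and values are all invented for illustration.

```r
# Illustrative QC table: one row per acquisition time point and peptide,
# one column per metric. All values below are invented for the example.
qc_table <- data.frame(
  AcquiredTime  = rep(c("2016-01-05 10:15:00", "2016-01-06 09:40:00"), each = 2),
  Precursor     = rep(c("PEPTIDEA", "PEPTIDEB"), times = 2),  # hypothetical sequences
  Annotations   = NA,                                         # no annotations yet
  RetentionTime = c(12.4, 25.1, 12.6, 25.0),                  # minutes (assumed unit)
  PeakArea      = c(1.8e6, 9.2e5, 1.7e6, 9.5e5),              # hypothetical metric column
  stringsAsFactors = FALSE
)

# Write the table to a temporary csv file and read it back to confirm the layout.
csv_path <- tempfile(fileext = ".csv")
write.csv(qc_table, csv_path, row.names = FALSE)
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
```

Each additional metric would simply become one more column, so the table stays at one row per time point and peptide.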
1. Initially, I tried to replicate raw MS data in the qcmetrics package. But the ultimate goal is to automatically create QcMetrics objects to be used in a converter function.
2. We want to create a converter for users of the MSnbase and/or qcmetrics packages. The converter will generate a csv file from qcmetrics objects. I think compatibility with the qcmetrics package is the better approach, but please let me know if you have any better ideas. The converter will be quite similar to the one the MSstats team wrote previously, but this time specifically for QC data. https://github.com/MeenaChoi/MSstats/blob/master/R/TransformMSnSet.R
3. Rather than quality reports, I need a data table (csv) including each time point and peptide per metric.
Here is my quick code chunk as the initial converter...
#' A function to convert MSnbase files to MSstatsQC format
#'
#' @param msfile data file to be converted
#' @return A data frame that can be used with MSstatsQC
#' @keywords MSnbase, qcmetrics, input
#' @export
#' @import MSnbase
#' @import qcmetrics
#' @examples
#' \dontrun{MSstatsQCdata <- MSnbaseToMSstatsQC(msfile)}
MSnbaseToMSstatsQC <- function(msfile) {
  data <- readMSData(msfile, verbose = FALSE)
  if (!inherits(data, "MSnExp")) {
    stop("Only MSnExp class can be converted to input format for MSstatsQC.")
  }
  qc <- QcMetric(name = "NULL")
  # Examples of metrics that can be monitored ##############################
  RetentionTime <- rtime(data)
  PrecursorIntensity <- precursorIntensity(data)
  ##########################################################################
  qcdata(qc, "RetentionTime") <- RetentionTime
  qcdata(qc, "PrecursorIntensity") <- PrecursorIntensity
  MSstatsQCdata <- c()
  # Collect every metric stored in the QcMetric environment as one column each
  MSstatsQCdata <- data.frame(setNames(lapply(ls(qc@qcdata), get, envir = qc@qcdata), ls(qc@qcdata)))
  MSstatsQCdata <- data.frame(AcquiredTime = seq_along(RetentionTime), Precursor = NA, Annotations = NA, MSstatsQCdata)
  ## if any variable name is missing, report it and stop
  check.name <- c("AcquiredTime", "Precursor", "Annotations", "RetentionTime", "PrecursorIntensity")
  diff.name <- setdiff(check.name, colnames(MSstatsQCdata))
  if (length(diff.name) > 0) {
    stop(paste("Please check the variable name. The provided variable name", paste(diff.name, collapse = ","), "is not present in the data set.", sep = " "))
  }
  return(MSstatsQCdata)
}
Many thanks in advance,
Eralp
Hi Eralp,
The function above looks sensible. I would suggest you use readMSData2, as it will be much faster (it won't read the raw data into memory), but you'll need to make sure you also define the MS level that you wish to read (by default, it reads all levels). Also, you can probably drop the c() initialisation.
If you have your code in a repo, it would probably be easier to discuss the code there, rather than here, on the forum.
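As a rough sketch of the on-disk reading suggested above: the wrapper name read_qc_spectra and the default of MS level 2 are assumptions made for illustration. It also assumes an MSnbase version in which on-disk access is requested through readMSData(..., mode = "onDisk"); with older versions, readMSData2 would take its place.

```r
# Sketch only: needs the Bioconductor package MSnbase installed to be called.
# read_qc_spectra is a hypothetical helper name, not part of any package.
read_qc_spectra <- function(msfile, ms_level = 2L) {
  # mode = "onDisk" leaves the spectra on disk instead of loading them into memory;
  # msLevel. restricts reading to the requested MS level (assumed here to be 2).
  MSnbase::readMSData(msfile, msLevel. = ms_level, mode = "onDisk", verbose = FALSE)
}

# Usage, with a real mzML/mzXML file path in place of the placeholder:
# qc_raw <- read_qc_spectra("path/to/qc_run.mzML")
```

Restricting the MS level matters here because metrics such as precursor intensity are only meaningful for MS2 spectra.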
Thanks Laurent! We can discuss the code here.
https://github.com/eralpdogu/MSstatsQC/blob/master/R/MSnbaseToMSstatsQC.R
Back to my original question, do you know a way to pull out peptide sequences?
Peptide sequences are not available when one only calls readMSData[2], as this function only accesses data from the raw files. But it is possible to add the identification data to the raw MSnExp objects using addIdentificationData. Then, the identification results, including the peptide sequences, will be available in the feature data.
I will modify the converter and let you know. Is it possible to get info for other metrics such as peak area or FWHM?
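For reference, the addIdentificationData step described in the answer above could look roughly like the following sketch. The helper name add_peptide_sequences and the file paths are hypothetical, and the name of the feature data column that holds the peptide sequence depends on the MSnbase version, so it is best checked with colnames() rather than assumed.

```r
# Sketch only: needs the Bioconductor packages MSnbase and Biobase to be called.
# add_peptide_sequences is a hypothetical helper name, not part of any package.
add_peptide_sequences <- function(raw, idfile) {
  # Attach identification results (e.g. from an mzIdentML file) to the raw object.
  with_ids <- MSnbase::addIdentificationData(raw, id = idfile)
  # The identification columns, including the peptide sequence, are now part
  # of the feature data; return it as a data frame for inspection.
  Biobase::fData(with_ids)
}

# Usage, with real paths in place of the placeholders:
# raw <- MSnbase::readMSData("path/to/qc_run.mzML", verbose = FALSE)
# fd  <- add_peptide_sequences(raw, "path/to/qc_run.mzid")
# colnames(fd)  # look for the column holding the peptide sequence
```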