Question: qcmetrics package and longitudinal qc data
11 months ago by
TURKEY/Mugla/Mugla Sitki Kocman University
Eralp Dogu wrote:

I am developing an R package (MSstatsQC) for longitudinal assessment of QC performance. I have been trying to convert MS files, via qcmetrics objects, into a csv file that is compatible with my input format, so that our package can be used by MSnbase users. My data format is "time, peptide sequence, annotations, metric1, metric2, ...". Is there a way to extract all of this information? I was able to create a csv file with retention time and some other metrics, but could not retrieve the other variables.

11 months ago by
Laurent Gatto
United Kingdom
Laurent Gatto wrote:

Your question is not clear to me. Could you clarify what you are trying to do and/or answer the following questions?

• Are you replicating the example for raw MS data shown in the qcmetrics vignette, or are you creating your own QcMetric objects?
• What is your input format, and what operability with MSnbase are you referring to?
• Are you trying to use the qcmetrics package to generate quality reports, or are you interested in creating a table with these variables to analyse independently?

11 months ago by
TURKEY/Mugla/Mugla Sitki Kocman University
Eralp Dogu wrote:

Hi Laurent,

Thanks a lot for your interest! We are developing an R package and a Shiny app for monitoring longitudinal QC data. We create control charts and apply other statistical methods to analyze instrument performance over time. We were able to generate csv files through Skyline that include Acquired Time, Peptide seq, Annotations and any corresponding metrics of interest, such as retention time and peak area, for each peptide and time point. Details are available in our paper and the Shiny app. Trying the Shiny app with the sample data will give a better idea of the input data we need.

1. Initially, I tried to replicate the raw MS data example from the qcmetrics package. But the ultimate goal is to automatically create QcMetric objects to be used in a converter function.

2. We want to create a converter for users of the MSnbase and/or qcmetrics packages. The converter will generate a csv file from qcmetrics objects. I think compatibility with the qcmetrics package is the better approach, but please let me know if you have any better ideas. The converter will be quite similar to what the MSstats team did previously to convert MSnSet objects, but this time specifically for QC data. https://github.com/MeenaChoi/MSstats/blob/master/R/TransformMSnSet.R

3. Rather than quality reports, I need a data table (csv) with one row per time point and peptide, and one column per metric.
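The table layout described in the three points above can be sketched in base R as follows. This is only an illustration under stated assumptions: `build_qc_table` is a hypothetical helper (not part of MSstatsQC, MSnbase or qcmetrics), the column names are the ones used in the converter posted in this thread, and the times, peptide labels and metric values are made-up placeholders.

```r
# Hypothetical helper (not an MSstatsQC function): assemble one row per
# (time point, peptide) and one column per metric.
build_qc_table <- function(acquired_time, precursor, annotations = NA, metrics) {
  # `metrics` must be a named list of equal-length vectors, one per metric
  stopifnot(is.list(metrics), length(metrics) > 0, !is.null(names(metrics)))
  data.frame(
    AcquiredTime = acquired_time,
    Precursor    = precursor,
    Annotations  = annotations,  # a single NA is recycled over all rows
    metrics,                     # the named list expands into metric columns
    stringsAsFactors = FALSE
  )
}

# Placeholder values, for illustration only
qc_table <- build_qc_table(
  acquired_time = c("2017-01-01 10:00", "2017-01-01 10:00", "2017-01-02 10:00"),
  precursor     = c("PEPTIDEA", "PEPTIDEB", "PEPTIDEA"),
  metrics       = list(RetentionTime      = c(12.1, 25.4, 12.3),
                       PrecursorIntensity = c(1.0e6, 2.5e6, 0.9e6))
)

# Write the table as csv, the file type MSstatsQC takes as input
csv_file <- tempfile(fileext = ".csv")
write.csv(qc_table, csv_file, row.names = FALSE)
```

Keeping the metrics in a named list means further metrics can be added without changing the helper.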

Here is my quick code chunk as the initial converter...

#' A function to convert MSnbase files to MSstatsQC format
#'
#' @param msfile data file to be converted
#' @return A data frame that can be used with MSstatsQC
#' @keywords MSnbase, qcmetrics, input
#' @export
#' @import MSnbase
#' @import qcmetrics
#' @examples
#' \dontrun{MSstatsQCdata <- MSnbaseToMSstatsQC(msfile)}

MSnbaseToMSstatsQC <- function(msfile) {

  ## read the raw file into an MSnExp object
  data <- readMSData(msfile, verbose = FALSE)

  if (!inherits(data, "MSnExp")) {
    stop("Only MSnExp objects can be converted to the input format for MSstatsQC.")
  }

  qc <- QcMetric(name = "NULL")

  ## examples of metrics that can be monitored
  RetentionTime <- rtime(data)
  PrecursorIntensity <- precursorIntensity(data)

  ## store the metrics in the QcMetric object
  qcdata(qc, "RetentionTime") <- RetentionTime
  qcdata(qc, "PrecursorIntensity") <- PrecursorIntensity

  ## one column per stored metric, preceded by the MSstatsQC annotation columns
  MSstatsQCdata <- data.frame(setNames(lapply(ls(qc@qcdata), get, envir = qc@qcdata), ls(qc@qcdata)))
  MSstatsQCdata <- data.frame(AcquiredTime = seq_along(RetentionTime), Precursor = NA, Annotations = NA, MSstatsQCdata)

  ## if any variable name is missing, stop with an error
  check.name <- c("AcquiredTime", "Precursor", "Annotations", "RetentionTime", "PrecursorIntensity")

  diff.name <- setdiff(check.name, colnames(MSstatsQCdata))
  if (length(diff.name) > 0) {
    stop(paste("Please check the variable name. The provided variable name", paste(diff.name, collapse = ","), "is not present in the data set.", sep = " "))
  }
  return(MSstatsQCdata)
}

Eralp

Hi Eralp,

The function above looks sensible. I would suggest you use readMSData2 as it will be much faster (it won't read the raw data into memory), but you'll need to make sure you also define the MS level that you wish to read (by default, it reads all levels). Also, you can probably drop the c() initialisation.
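As a rough sketch of that suggestion: in later MSnbase releases, the on-disk reader is reached through `readMSData(mode = "onDisk")`, which superseded `readMSData2`. The helper name `read_ms2_on_disk` is hypothetical, and the choice of MS level 2 is an assumption made here because precursor intensities are only defined for MS2 spectra. The example call uses the small mzXML file shipped with MSnbase and only runs if the package is installed.

```r
# Hypothetical wrapper: read a single MS level without loading the raw
# peak data into memory.
read_ms2_on_disk <- function(msfile) {
  # `msLevel.` (note the trailing dot) restricts the import to one MS level
  MSnbase::readMSData(msfile, msLevel. = 2, mode = "onDisk", verbose = FALSE)
}

# Example on the test file distributed with MSnbase (skipped if not installed)
if (requireNamespace("MSnbase", quietly = TRUE)) {
  library(MSnbase)
  msfile <- dir(system.file("extdata", package = "MSnbase"),
                pattern = "mzXML$", full.names = TRUE)
  ms2 <- read_ms2_on_disk(msfile)
  head(rtime(ms2))  # retention times are available without loading the peaks
}
```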

If you have your code in a repo, it would probably be easier to discuss the code there, rather than here, on the forum.

Thanks Laurent! We can discuss the code here.

Back to my original question, do you know a way to pull out peptide sequences?

Peptide sequences are not available when one only calls readMSData[2], as this function only accesses data from the raw files. But it is possible to add the identification data to the raw MSnExp objects using addIdentificationData. Then, the identification results, including the peptide sequences, will be available in the feature data.
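A minimal sketch of that workflow, assuming the raw file has a matching mzIdentML file: `sequences_with_rtime` is a hypothetical helper, and the feature data column name `sequence` is taken from the MSnbase vignette example, so it may differ between versions or identification sources. The example uses the dummy files shipped with MSnbase and only runs if the package is installed.

```r
# Hypothetical helper; expects MSnbase to be attached when it is called.
sequences_with_rtime <- function(msexp, idfile) {
  # merge the identification results into the feature data
  msexp <- addIdentificationData(msexp, id = idfile)
  fd <- fData(msexp)
  data.frame(
    RetentionTime = rtime(msexp),
    Precursor     = fd$sequence,  # NA for spectra without an identification
    stringsAsFactors = FALSE
  )
}

# Example on the test files distributed with MSnbase (skipped if not installed)
if (requireNamespace("MSnbase", quietly = TRUE)) {
  library(MSnbase)
  extdata <- system.file("extdata", package = "MSnbase")
  rawfile <- dir(extdata, pattern = "mzXML$", full.names = TRUE)
  idfile  <- dir(extdata, pattern = "dummyiTRAQ.mzid", full.names = TRUE)
  msexp   <- readMSData(rawfile, verbose = FALSE)
  head(sequences_with_rtime(msexp, idfile))
}
```

The resulting `Precursor` column could then replace the `Precursor = NA` placeholder in the converter above.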