Question

mzTab-M in XCMS


Abrianna • 0


Hello,

I have a dataset of ~400 files (~300 samples; the rest are QCs and blanks) that I am trying to analyze. In my previous lab, we developed a pipeline using XCMS and Python to align and group raw features into compounds and to match MS/MS fragmentation patterns against GNPS spectral libraries (GitHub repository with code here). However, it seems that the Feature-Based Molecular Networking (FBMN) workflow offered by GNPS can now handle many of these steps more efficiently.

Option B of FBMN requires an mzTab-M file and the associated mzML files. I would like to export my XCMS-processed data in the mzTab-M format, but I'm not sure how to go about this, particularly given how I currently organize and process the mzML files. I iterate hierarchically (site > clone) over the files so that all files from one clone are analyzed together (the QCs and blanks are therefore processed multiple times, once with each associated set of samples). See example code (only for MS level 1) below:

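library(xcms)    # also provides readMSData() via MSnbase
library(CAMERA)

# mzml_location (path to the top-level mzML directory, ending in "/") is defined elsewhere

# Vector with all sites
# BM = Butte, MT
# GM = Glacier, MT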
# PI = Pocatello, ID
# SL = Salt Lake, UT
# WU = Widtsoe, UT
# FA = Flagstaff, AZ
# TA = Tucson, AZ
sites <- c("BM","GM","PI","SL","WU","FA","TA")

####### Loop through directories containing .mzML files. Samples and their associated blanks should be in separate folders named "Sample" and "Blank". Depending on how files are arranged, you may or may not need to use the outer loop for site #######
for (i in seq_along(sites)) {
  # create list of clones based on the folders present for each site in the mzml directory
  clones <- list.dirs(paste(mzml_location,sites[i],sep=""), recursive=FALSE, full.names=FALSE)

for (j in seq_along(clones)) {
  print(clones[j])

  # make vector with relative filepaths (used by XCMS to find files)
  files_1 <- list.files(paste(mzml_location, sites[i], "/",clones[j],sep=""),
                        recursive = TRUE, full.names = TRUE)

  # make sure you are only using mzml files, in case there are files of other types in your directory
  files <- files_1[endsWith(files_1, ".mzML")]

  # make vector with sample groups (used by XCMS to group files into blanks and samples)
  # (the group is the name of the folder that directly contains each file)
  s_groups <- basename(dirname(files))

  # make vector with sample names (used by XCMS in peak list output)
  s_names <- tools::file_path_sans_ext(basename(files))

  ##### set parameters for XCMS functions #####
  cwparam <- CentWaveParam(ppm=15, peakwidth=c(4,36), snthresh=2, prefilter=c(3,500))

  dparam1 <- PeakDensityParam(sampleGroups = s_groups, bw=10, binSize=0.05, minSamples=1, minFraction = 0.01)

  oparam <- ObiwarpParam(binSize=0.5)

  dparam2 <- PeakDensityParam(sampleGroups = s_groups, bw = 3, binSize = 0.025, minSamples = 1, minFraction = 0.01)

  fpparam <- FillChromPeaksParam(expandMz = 0.25, expandRt = 0.5)

  ## Read in mzML files
  raw_data <- readMSData(files, msLevel. = 1, mode="onDisk")

  ## Run XCMS functions with parameters set above
  xcmsexp <- findChromPeaks(object = raw_data, param = cwparam)
  xcmsexp <- groupChromPeaks(object = xcmsexp, param = dparam1)
  xcmsexp <- adjustRtime(object = xcmsexp, param = oparam)
  xcmsexp <- groupChromPeaks(object = xcmsexp, param = dparam2)
  xcmsexp <- fillChromPeaks(xcmsexp, param = fpparam)

  ## convert from XCMSnExp object to xcmsSet object so that CAMERA can be used
  ## to group adducts and isotopes
  xset <- as(xcmsexp, "xcmsSet")

  # set sample class to match sample grouping
  sampclass(xset) <- s_groups

  ## peak grouping and annotation
  xset1 <- xsAnnotate(xs=xset, polarity="positive")

  xset2 <- groupFWHM(xset1, perfwhm=0.75)

  xset3 <- findIsotopes(xset2, ppm=20, mzabs=0.015,intval="intb")

  xset4 <- groupCorr(xset3, cor_eic_th=0.5, pval=0.05, graphMethod="lpc", calcIso = TRUE, calcCiS = TRUE,
                     calcCaS = sum(sampclass(xset) == "Sample") > 3)

  xsetFA <- findAdducts(xset4, polarity="positive")

  xset5 <- getPeaklist(xsetFA)

  #### END OF XCMS AND CAMERA CODE ####

  # ... create output table, remove internal standards and other unwanted data
  }
}
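
The site > clone iteration above can also be expressed as a flat job list, which may be easier to hand to a cluster scheduler (one task per clone). The following is a minimal sketch in base R only, assuming the directory layout described in the code comments (`<root>/<site>/<clone>/<Sample|Blank>/<file>.mzML`). `build_sample_table` and the example paths are hypothetical and not part of XCMS.

```r
# Hypothetical helper (not part of xcms): derive sample metadata from mzML paths
# laid out as <root>/<site>/<clone>/<Sample|Blank>/<file>.mzML
build_sample_table <- function(files) {
  group_dir <- dirname(files)      # .../<site>/<clone>/<Sample|Blank>
  clone_dir <- dirname(group_dir)  # .../<site>/<clone>
  data.frame(
    file         = files,
    sample_name  = tools::file_path_sans_ext(basename(files)),
    sample_group = basename(group_dir),
    clone        = basename(clone_dir),
    site         = basename(dirname(clone_dir)),
    stringsAsFactors = FALSE
  )
}

# Made-up example paths; real code would collect them with
# list.files(mzml_location, pattern = "\\.mzML$", recursive = TRUE, full.names = TRUE)
files <- c("mzml/BM/clone1/Sample/s1.mzML",
           "mzml/BM/clone1/Blank/b1.mzML",
           "mzml/GM/clone7/Sample/s2.mzML")
tab <- build_sample_table(files)

# One data frame per site/clone combination; each element is one unit of work
jobs <- split(tab, paste(tab$site, tab$clone, sep = "/"))
print(names(jobs))
```

Each element of `jobs` carries the file paths, sample names, and sample groups for one clone, so the body of the inner loop could take a single element as its input instead of relying on the loop indices.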

The current workflow avoids memory issues in R, but I am interested in alternative ideas for processing all files simultaneously using a supercomputer (which I have access to). My specific questions are as follows:

Could anyone provide example code or an explanation for how to create an mzTab-M file through the iterative process outlined above?
What other, non-iterative options are there for processing this many files at once (with associated blanks and QCs accounted for)?
Can mzTab-M files be exported using either method (iterative or non-iterative)?
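
To make the first question more concrete, here is a rough sketch of what one part of such an export might look like: formatting a feature table as the small molecule feature (SFH/SMF) section of an mzTab-M file. It is an illustration under several assumptions, not a validated exporter. `make_smf_lines` is a hypothetical helper, the column names are written as I understand the mzTab-M 2.0 specification and should be checked against it, the charge of 1 is a placeholder, and the example values are made up. A complete mzTab-M file also needs the metadata (MTD) and small molecule summary (SML) sections, so this output alone would not be accepted by FBMN. With XCMS, the inputs would presumably come from `featureDefinitions(xcmsexp)` (columns `mzmed`, `rtmed`, `rtmin`, `rtmax`) and `featureValues(xcmsexp, value = "into")`.

```r
# Hypothetical helper (not an xcms function): format feature definitions and
# per-sample abundances as tab-separated SFH/SMF lines.
# feat_def:   data.frame with columns mzmed, rtmed, rtmin, rtmax (rt in seconds)
# abundances: numeric matrix, one row per feature, one column per assay/sample
# charge:     placeholder value (assumption); mzTab-M expects a charge per feature
make_smf_lines <- function(feat_def, abundances, charge = 1L) {
  stopifnot(nrow(feat_def) == nrow(abundances), nrow(feat_def) > 0)
  n_assay <- ncol(abundances)
  # Missing values are written as the string "null"
  fmt <- function(x) ifelse(is.na(x), "null", as.character(x))

  header <- c("SFH", "SMF_ID", "SME_ID_REFS", "SME_ID_REF_ambiguity_code",
              "adduct_ion", "isotopomer", "exp_mass_to_charge", "charge",
              "retention_time_in_seconds", "retention_time_in_seconds_start",
              "retention_time_in_seconds_end",
              paste0("abundance_assay[", seq_len(n_assay), "]"))

  cols <- c(
    list("SMF", seq_len(nrow(feat_def)),
         "null", "null", "null", "null",   # no identification/adduct information
         fmt(feat_def$mzmed), charge,
         fmt(feat_def$rtmed), fmt(feat_def$rtmin), fmt(feat_def$rtmax)),
    lapply(seq_len(n_assay), function(i) fmt(abundances[, i]))
  )
  rows <- do.call(paste, c(cols, sep = "\t"))
  c(paste(header, collapse = "\t"), rows)
}

# Made-up example values standing in for xcms output
feat_def <- data.frame(mzmed = c(301.1412, 455.2901), rtmed = c(120.5, 300.2),
                       rtmin = c(118.0, 297.9), rtmax = c(123.1, 303.0))
abund <- matrix(c(1500, NA, 2300, 800), nrow = 2,
                dimnames = list(NULL, c("s1", "b1")))
smf_lines <- make_smf_lines(feat_def, abund)
writeLines(smf_lines)
```

Because each clone is processed separately in the iterative workflow, a sketch like this would yield one feature section per clone; whether those can be merged into a single file is part of what I am asking.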

Tags: xcms, mzTab-M, iterativehierarchicalprocessing, FBMN
