Question: Intensity discrepancy between raw data and XCMS output
0
gravatar for cpataki
8 months ago by
cpataki0
cpataki0 wrote:

I am using xcms for metabolomic work, and I'm coming across a bug that I can not seem to track down. I've scoured the online forums and am trying a grid search of various parameters, but I'd love to get your input on this problem. 

My Goal:

To identify features in an experiment with various samples that are independently injected into the mass spec and correct for retention time. 

The commands I am using (in the order that I am using them):

Reading in the data: readMSData

Identify chromatographic peaks: findChromPeaks() with a CentWaveParam object

Retention time correction: adjustRtime() with a ObiWarpParam object

Grouping peaks: groupChromPeaks() with a PeakDensityParam object

Output: featureValues() to list all features with each column representing the intensity of the features in one sample. 

My Problem:

All the features identified in the one sample have the very similar intensities (+/- 10%) in the featureValues() output. said differently: Each feature has different intensities in all samples, but all features in the same sample differ by approximately 10%. 

This trend is not consistent with the raw traces of the data. I see much more variability between peak intensities in each sample. 

I've been reading the xcms documentation to track down what could be changing/controlling peak intensities, but I can not seem to find any parameter in the commands I'm using that could affect peak intensities. I'm very much working in the dark here. Any insight on your part would be HIGHLY appreciated. 

xcms intensity xcmsset • 188 views
ADD COMMENTlink written 8 months ago by cpataki0

Without having a good understanding of your experimental setup and without looking into the actual data or seeing all commands with all parameters that you used it is pretty hard to tell what happens or where the problem is. Some suggestions:

1) do never use the default parameters, especially not for the peak detection part. This very much depends on your LCMS setup (i.e. how broad the peaks are etc.). Have eventually a look at https://jotsetung.github.io/metabolomics2018/xcms-preprocessing.html for some details how to determine data set specific settings.

2) Check again the parameters that you used, especially make sure that you used featureValues with the parameter value = "into". By default the function returns just the index of the corresponding peak in the peak matrix but not the integrated peak intensity (which you get with "into").

3) Check how many peaks you have and how many features. i.e. do table(chromPeaks(your_data)[, "sample"]) to get an overview how many peaks were detected per sample and compare that number to nrow(featureValues(your_data)). Eventually only few peaks were grouped into a feature. Alternatively it could be that multiple peaks in each sample were added to the feature.

4) It is always good to have a look at how known compounds or internal standards look like, i.e. use the chromatogram function to extract the ion chromatogram for some and plot them, eventually highlighting the identified chromatographic peaks in that region using the highlightChromPeaks function (check the corresponding help page for details). This can help you understand what might have happened and you get a better feeling for your data.

5) If you think this is indeed a bug in xcms you could also make an issue at https://github.com/sneumann/xcms

For future posts it would be very helpful to post also the output of the sessionInfo() and to show the actual commands that you used (with all parameters).

ADD REPLYlink modified 8 months ago • written 8 months ago by Johannes Rainer1.4k

Thank you so much, Johannes. This was very helpful. I do think I might be implementing my code sub-optimally, which is producing these weird results. Looking at the raw data, I don't think xcms is changing the intensity of the features, I think I'm not identifying the peaks with the right parameters or I'm not grouping the peaks properly, which is why I'm identifying VERY similar features through out the entirety of the experiment (but concentrated within a range of retention times).

I'd love to give you a little more detail about my code and maybe you can tell me if I'm 1) not grouping peaks enough times (I see that some people group peaks 2-3 times in their code), or 2) I'm using suboptimal parameters for identifying peaks. Please note that I am using a mixture of parameters suggested by xcms-online for my experimental set-up and parameters determined by using "IPO: a tool for automated optimization of XCMS parameters" (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0562-8). I'm limited by the character number, so I will add another comment

ADD REPLYlink written 8 months ago by cpataki0

code (note that I am running this through the command line and not through RStudio):

library(xcms)

fols <- commandArgs(trailingOnly=TRUE)

setwd(fols[1])

#defines the .mzxml files to be processed and which samples are technical replicates 

filegroups = read.csv("filegroups.csv", row.names = 1, stringsAsFactors = F)

mzxml_pos <- filegroups$File

fileaccess <- readMSData(mzxml_pos, pdata = NULL, verbose = isMSnbaseVerbose(), msLevel. = 1, centroided. = T, smoothed. = NA,

                         mode = "onDisk")

cparam <- CentWaveParam(ppm = 10, peakwidth = c(5,30),  snthresh = 3., mzCenterFun = "wMean", firstBaselineCheck = FALSE, prefilter = c(3,100), mzdiff = -0.01, noise = 0, fitgauss = FALSE, verboseColumns = FALSE, integrate=1) 

bparam <- SnowParam(workers = 16)

xset <- findChromPeaks(fileaccess, cparam, BPPARAM = bparam, return.type = "XCMSnExp")

fgroups <- as.integer(as.factor(filegroups$Group))

gparam <- PeakDensityParam(sampleGroups = fgroups, minFraction = .5, bw = 5, binSize = 0.025, maxFeatures = 100, minSamples = 1)

rparam <- ObiwarpParam(binSize = 1,  #suggested by xcms online

                         response = 1, # not specified in xcms online

                         distFun = "cor_opt", # not specified in xcms online -- using default 

                         gapInit = 0.3,                 # not specified in xcms online - default 

                         gapExtend = 2.4,     # not specified in xcms online

                         factorDiag = 2, # not specified in xcms online  - penalty/weight for diagonal moves

                         factorGap = 1, # not specified in xcms online  - penalty/weigh for gap moves in alignment

                         localAlignment = F, # not specified in xcms online  - Default = False

                         initPenalty = 0) # not specified in xcms online  - penalty for initiating alignment

xset <- adjustRtime(xset, param = rparam)

xset <- groupChromPeaks(xset, gparam)

xset2 <- featureValues(xset, method = "maxint", value="index", intensity = "into", filled = TRUE)

xset3 <- featureDefinitions(xset)

write.csv(xset2, file = "xcms_output_features_members.csv")

write.csv(xset3, file = "xcms_output_features.csv")

ADD REPLYlink written 8 months ago by cpataki0

Thanks for providing the commands you're using for the analysis. I can not tell if the parameters are correct for your data but using IPO is definitely a good starting point. However, as mentioned in my previous reply, you should use featureValues with value = "into" and not value = "index". As it is also stated in the help of the featureValues function (which you can see by typing ?featureValues in the R terminal), value = "index" returns the index of the peaks in the peak matrix but not the integrated peak area that you want to extract. These can be returned with value = "into". I'll update the help page of the function to make this more explicit.

ADD REPLYlink written 8 months ago by Johannes Rainer1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 196 users visited in the last hour