Dear all,
I was trying various things with metabolomics data in xcms. In particular, I wanted to look at the total ion count (TIC), which, following http://www.ncbi.nlm.nih.gov/pubmed/25078324, is the "sum of all signals across all m/z" for a given retention time (RT). A TIC can be generated using the plotTIC function in xcms, but to get a feeling for the data I wanted to generate the plot from the data myself. So I extracted the raw data matrix and summed up the intensity values per time point, but to my surprise the plots look different, with plotTIC resulting in higher intensities.
The code to generate the plots was:
> library(xcms)
> cdfpath <- system.file("cdf", package="faahKO")
> cdffiles <- list.files(cdfpath, recursive=TRUE, full.names=TRUE)
> xraw <- xcmsRaw(cdffiles[1], profmethod="bin", profstep=0.1)
> ## get the raw matrix and sum up the intensities per time point
> rawmat <- rawMat(xraw)
> aggr <- aggregate(rawmat, by=list(rawmat[, 1]), FUN=sum)
> ## plot the TIC
> plotTIC(xraw)
> points(aggr[, 1], aggr[, 4], col="red", type="l")
I can to some extent understand that plotTIC and plotChrom generate different plots, as plotTIC is based on the raw data and plotChrom on the profile data, but it puzzles me why there is a difference between plotTIC and the sum of intensities as I calculated it.
I think I am missing something here...
Any help is very much appreciated.
My session info:
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-apple-darwin14.3.0/x86_64 (64-bit)
Running under: OS X 10.10.3 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] lattice_0.20-31     xcms_1.45.0         ProtGenerics_1.0.0
[4] mzR_2.2.0           Rcpp_0.11.5         ascii_2.1
[7] RColorBrewer_1.1-2  Biobase_2.28.0      BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] compiler_3.2.0   tools_3.2.0      codetools_0.2-11
Maybe I should have added some more information. I first came across this with one of my own files, an mzML file in centroid mode; the test file used above is also in centroid mode. I briefly looked at the code in the xcms package (the C code, actually), and as far as I understood it, it also just sums the signal. It's puzzling...
Just had a look at the source, and it is indeed reading hard-coded values if they are present. You can check whether these values are there by looking at object@tic. If your object has content in the tic slot, then that's the answer to your question...
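To make the fallback idea concrete, here is a minimal base-R sketch. It does not use xcms itself: rawmat is a toy matrix laid out like the matrix returned by rawMat (columns time, mz, intensity), stored stands in for the content of the tic slot, and sum_per_scan and choose_tic are hypothetical helper names. The rule "use the stored values unless they are missing or all zero" is my assumption about the behaviour described above, not a copy of the xcms code.

```r
## Toy stand-in for rawMat(xraw): one row per data point,
## columns time, mz, intensity (all values are made up).
rawmat <- cbind(
  time      = c(1, 1, 1, 2, 2, 3, 3, 3),
  mz        = c(100.1, 200.2, 300.3, 100.1, 250.0, 100.1, 200.2, 300.3),
  intensity = c(10, 20, 30, 5, 15, 40, 10, 50)
)

## Sum the intensities of all data points belonging to the same scan time.
sum_per_scan <- function(rawmat) {
  sums <- tapply(rawmat[, "intensity"], rawmat[, "time"], sum)
  data.frame(time = as.numeric(names(sums)), tic = as.numeric(sums))
}

## Hypothetical helper: prefer the stored (header) values when there is
## one non-zero-containing value set per scan, otherwise recompute.
choose_tic <- function(stored_tic, rawmat) {
  recomputed <- sum_per_scan(rawmat)
  if (length(stored_tic) == nrow(recomputed) && any(stored_tic != 0)) {
    data.frame(time = recomputed$time, tic = stored_tic, source = "stored")
  } else {
    recomputed$source <- "recomputed"
    recomputed
  }
}

stored <- c(90, 35, 150)              # made-up header values, one per scan
print(choose_tic(stored, rawmat))     # uses the stored values
print(choose_tic(numeric(0), rawmat)) # empty slot: falls back to the sums
```

With a real xcmsRaw object, the same comparison would presumably use the tic slot and the output of rawMat in place of the toy inputs.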
Out of curiosity, what C code? plotTIC is pure R, and all internal parsing of raw data is handled by mzR...
plotTIC calls rawEIC, which uses .Call to invoke the getEIC C function in mzROI.c. I'll try to find time next week to investigate that further. It really bugs me when I don't understand what's going on...
But only if the tic slot of your xcmsRaw object is empty, which it shouldn't be in the case of mzML/mzXML (again, I have never used netCDF, so I wouldn't know about that). Have you checked the content of the tic slot?
Yes, you're right. That's indeed the case: the @tic slot contains values that are quite different from the per-scan sum over all intensities that I get from the rawMat matrix. I guess that has something to do with the centroiding? As far as I understand, the values in the @tic slot are extracted from the scan header parameters in the mzML file.
I'm just wondering now what is more representative... the total ion current reported in the mzML file or the sum of all intensities per scan across all m/z values calculated for the actual raw data that is available in the xcmsRaw object.
As I said, yes, these values represent the true unprocessed total ion count as reported by the instrument, that is, before any processing of the spectra is done. Which one to use depends: if you've made substantial changes to the spectra, I would recalculate. Otherwise the differences between the two are mostly in scale, and the general pattern should be the same. In that case it's simply faster to extract the hard-coded values...
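One way to check the "same pattern, different scale" point on your own data is sketched below in base R. The two vectors are made-up numbers standing in for the stored TIC and the recomputed per-scan sums of the same scans, and to_unit_max is a hypothetical helper. Treat it as a rough sanity check, not a formal test.

```r
## Made-up values for five scans: stored header TIC versus the sum of
## the intensities actually present in the (processed) spectra.
stored_tic     <- c(1200, 3400, 9100, 4300, 1500)
recomputed_tic <- c( 610, 1650, 4700, 2100,  800)

## Pattern: a Pearson correlation close to 1 means the two traces
## rise and fall together.
pattern_similarity <- cor(stored_tic, recomputed_tic)

## Scale: the scan-by-scan ratio shows how much larger the stored values are.
scale_ratio <- stored_tic / recomputed_tic

## Rescale both traces to a maximum of 1 so their shapes can be
## compared (or overlaid in a plot) independently of the absolute scale.
to_unit_max <- function(x) x / max(x)
max_shape_diff <- max(abs(to_unit_max(stored_tic) - to_unit_max(recomputed_tic)))

print(pattern_similarity)
print(range(scale_ratio))
print(max_shape_diff)
```

If the correlation is high and the rescaled traces nearly coincide, the stored values are a reasonable shortcut; a large shape difference would be a reason to recalculate from the spectra.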
Nice; now I feel quite comfortable with the data. Thanks a lot for your explanations and thoughts.