Question

New to mass spec data analysis

Dan_reichert • 0

I am new to mass spec data analysis and am looking for a good starting point for analyzing a fairly large data set. I would like to use XCMS, as I think it is my best option at this point. I have looked at and used XCMSonline but would like more freedom than it allows. Does anybody know of links to good entry-level data analysis tutorials? Or maybe you have a script that you commonly use for untargeted metabolomics analysis and are willing to share? I can read and use most of the xcms R code I have encountered, but I am having some trouble putting it all together.

Thank you

data analysis xcms Tutorial • 1.5k views

updated 7.0 years ago by Gordon Smyth • written 7.0 years ago by Dan_reichert

score 0 · Answer 1 · 2017-04-04

If you're starting with xcms, I suggest you use the new xcms user interface, which will be officially available with the upcoming Bioconductor release (version 3.5, to be released at the end of April). If you want to use it right away, you can install it from GitHub (https://github.com/sneumann/xcms).

library(devtools)
install_github("sneumann/xcms")

Installing development versions directly from GitHub is usually not recommended, but it is safe here, since xcms is pretty stable and we don't plan to change anything prior to the release.

One big advantage of this version of xcms is that the help pages have been extended: all parameters and functions are now described. A good starting point might be the "new_functionality" vignette. Play with that, try to understand your data, and look at the raw signal you've got.

Finding the correct settings for e.g. the chromatographic peak detection is, however, tricky and really depends on your setup, so there won't be any default settings that work for all methods. The best approach is to define them based on what you expect (e.g. what retention time width your peaks will most likely have), run the peak detection, and investigate (ideally using internal controls, if you've got them) what the peaks look like, i.e. whether the peak detection succeeded in identifying the peaks (thinking it over, I might add an example to the "new_functionality" vignette). A tool that helps to determine good initial settings for your data is the IPO Bioconductor package.
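A minimal sketch of that "define, run, inspect" loop is shown here. It assumes a recent xcms/MSnbase installation (where readMSData() supports mode = "onDisk") and uses the example files from the faahKO data package so that it can be run as-is. The peak width, noise level, and the m/z and retention time window are illustrative assumptions for that example data, not recommended settings for your own data.

```r
library(xcms)
library(MSnbase)

# Assumption: the faahKO data package is installed; it only provides example
# files here. Replace `files` with the paths to your own mzML/mzXML/CDF files.
files <- dir(system.file("cdf", package = "faahKO"),
             full.names = TRUE, recursive = TRUE)[1:2]

# Read the raw data in on-disk mode so spectra are only loaded when needed.
raw <- readMSData(files, mode = "onDisk")

# Settings based on what we expect: chromatographic peaks between 20 and 80
# seconds wide (assumed values for this example data set).
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000)

# Run the centWave peak detection.
xdata <- findChromPeaks(raw, param = cwp)

# Inspect the detected peaks (m/z, retention time, intensity per peak).
head(chromPeaks(xdata))

# Look at the signal in a region where a peak is expected, e.g. an internal
# standard. The m/z and rt window below are illustrative only.
chr <- chromatogram(xdata, mz = c(334.9, 335.1), rt = c(2700, 2900))
plot(chr)
```

If the plotted signal shows a clear peak that is missing from chromPeaks(xdata), or a peak that was split into several pieces, that is a hint to adjust peakwidth (and possibly ppm or noise) and run the detection again.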

Also, since your data set is large, I suggest that you start your investigation on a subset, ideally on pooled samples if you have them (also, run IPO on these to determine the best initial settings for peak detection).
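Picking such a subset can be done with a few lines of base R before any data is loaded. The sketch below assumes, purely for illustration, that pooled samples can be recognized by "pool" in their file name; the file names are made up, and in practice the vector would come from list.files() on your data directory.

```r
# Hypothetical file names; in practice use something like
# files <- list.files("path/to/data", pattern = "\\.mzML$", full.names = TRUE)
files <- c("batch1_pool_01.mzML", "batch1_sample_A.mzML",
           "batch1_sample_B.mzML", "batch1_pool_02.mzML",
           "batch2_sample_C.mzML", "batch2_pool_03.mzML")

# Assumption: pooled samples carry "pool" in their file name.
is_pool <- grepl("pool", basename(files), ignore.case = TRUE)
pooled_files <- files[is_pool]

# Use the pooled samples if there are any; otherwise fall back to a small
# random subset of all files.
set.seed(1)
subset_files <- if (length(pooled_files) > 0) {
    pooled_files
} else {
    sample(files, min(3, length(files)))
}

print(subset_files)
```

The resulting subset_files vector can then be used for the initial exploration and for tuning the peak detection settings before processing the full data set.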

Hope this helps a little.

cheers, jo