Question: How to get from xcmsRaw to a xcmsSet
4.5 years ago by
Johannes Rainer1.5k
Italy
Johannes Rainer1.5k wrote:

dear all!

I just started using the xcms package for metabolomics data analysis. However, I was puzzled by, or rather did not understand, the logic behind it. Intuitively, I expected the xcmsRaw class to represent the raw data and the xcmsSet class the preprocessed data. While that indeed seems to be the case, there seems to be no way to get from a list of xcmsRaw objects to an xcmsSet object. I expected that a function like findPeaks would somehow return an xcmsSet, but that was not the case. What I would actually like to do (as I was used to from microarray and sequencing data) is to first look at the raw data, perform some quality controls, and then process that raw data into the final data (which I thought might be the xcmsSet).

Is there a simple way to get from the raw data to the peak list data? I find it quite cumbersome to first load the raw data, do quality controls, and then basically re-load and process the raw data again (using the xcmsSet function) to generate the xcmsSet object.

Thanks in advance for any help, suggestions, etc.

jo

metabolomics xcms • 1.3k views
modified 4.5 years ago by Thomas Lin Pedersen70 • written 4.5 years ago by Johannes Rainer1.5k
Answer: How to get from xcmsRaw to a xcmsSet
4.5 years ago by
Copenhagen, Denmark
Thomas Lin Pedersen70 wrote:

Hi Johannes

What you're describing is the logical workflow, but it is not something that is readily possible in xcms. I guess one of the reasons is memory constraints: having all raw data read into memory at once is not really feasible in many scenarios.

A workaround is to use the write.mzdata function to save your changes to the raw data into new files, and then pass these files to the xcmsSet function.
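The workaround above might look roughly like the following sketch. The file names and the output directory are hypothetical placeholders, and the quality-control step is left as a comment because it depends on your data; the peak detection settings passed to xcmsSet are left at their defaults here and will usually need tuning.

```r
library(xcms)

# Hypothetical input files and output directory; adjust to your own data.
raw_files <- c("sample1.mzXML", "sample2.mzXML")
out_dir <- "processed"
dir.create(out_dir, showWarnings = FALSE)

# Process one file at a time so that only a single xcmsRaw object
# is held in memory at any point.
processed_files <- vapply(raw_files, function(f) {
  raw <- xcmsRaw(f)

  # ... inspect `raw`, run quality controls, modify it if needed ...

  out <- file.path(
    out_dir,
    paste0(tools::file_path_sans_ext(basename(f)), ".mzData")
  )
  write.mzdata(raw, out)  # save the (possibly modified) raw data
  out
}, character(1))

# Build the xcmsSet from the newly written files.
xset <- xcmsSet(files = processed_files)
```

Note that this still reads each file twice (once for inspection, once inside xcmsSet), so it does not remove the re-loading step, it only lets changes made to the raw data carry over.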

My main gripe with xcms is exactly the lack of a direct link between raw and derived data, which is why I started working on MSsary. That project is unfortunately on hold at the moment but will be taken up again as soon as possible.

I hope this helps you with your current work...

Your package sounds promising; do you have a timeline for when you expect it to be more or less usable? I was about to implement some stuff for the xcms package, but eventually I should do that for MSsary ;)

Is the data really that big? I wonder if it shouldn't be possible to reduce the size of the data using special data types like Rle or the like...

With regard to data size, it really depends on your field. Proteomics, where I come from, uses instruments with ridiculous resolution and therefore huge files, whereas metabolomics studies sometimes use instruments with unit resolution, resulting in much smaller files. Anyway, it is not so much how you encode the data that is the culprit of the data size. It’s just the immense amount of raw numbers…
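A small illustration of why run-length encoding only helps in some cases. This sketch uses base R's rle() as a stand-in for the Rle class mentioned above, and the two intensity vectors are made-up examples, not real spectra:

```r
set.seed(1)

# Made-up intensities: long runs of zeros around a small peak ...
sparse <- c(rep(0, 1000), 5, 9, 5, rep(0, 1000))
# ... versus a dense signal in which neighbouring values all differ.
noisy <- runif(length(sparse))

sparse_rle <- rle(sparse)
noisy_rle <- rle(noisy)

# Number of runs that have to be stored for each vector.
n_runs_sparse <- length(sparse_rle$lengths)  # 5 runs for 2003 values
n_runs_noisy <- length(noisy_rle$lengths)    # roughly one run per value

# The encoding is lossless: decoding gives back the original vector.
round_trip_ok <- identical(inverse.rle(sparse_rle), sparse)
```

The sparse vector collapses to a handful of runs, while the dense one gains nothing because each run has length one, so the encoding cannot shrink data that is simply a large amount of distinct numbers.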

What MSsary does is write changes to the underlying raw data into an SQLite database and automatically figure out where to look for them. Thus it never really has everything in memory at the same time, only the relevant pieces. The idea, though, is that the user shouldn’t really have to care about all these underlying details… As for the timeline of the package, I’m afraid my development has been stifled a bit by changes to the scope of my PhD, so it is currently on hold. I’m quite invested in it though, so it will be taken up again (and contributions are very welcome), but until then you would be better off sticking to xcms.