Question

How to get from xcmsRaw to a xcmsSet

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 8 months ago

Italy

Dear all!

I just started using the xcms package for metabolomics data analysis, but I was puzzled by the logic behind it. Intuitively, I expected the xcmsRaw class to represent the raw data and the xcmsSet class the preprocessed data. While that does seem to be the case, there seems to be no way to get from a list of xcmsRaw objects to an xcmsSet object. I expected that a function like findPeaks would return an xcmsSet, but it does not. What I would actually like to do (as I was used to from microarray and sequencing data) is first look at the raw data, perform some quality controls, and then process that raw data into the final data (which I thought might be the xcmsSet).

Is there a simple way to get from the raw data to the peak list data? I find it quite cumbersome to first load the raw data, do quality controls, and then basically re-load and process the raw data again (using the xcmsSet function) to generate the xcmsSet object.

Thanks in advance for any help, suggestions, etc.

jo

xcms metabolomics • 2.6k views

ADD COMMENT • link updated 10.2 years ago by Thomas Lin Pedersen ▴ 70 • written 10.2 years ago by Johannes Rainer ★ 2.1k

score 1 · Accepted Answer · 2015-04-21

Thomas Lin Pedersen ▴ 70

@thomas-lin-pedersen-5941

Last seen 9.5 years ago

Copenhagen, Denmark

Hi Johannes

What you're describing is the logical workflow, but it is not something that is readily possible in xcms. I guess one of the reasons is memory constraints: having all raw data read into memory at once is not really feasible in many scenarios.

A workaround is to use write.mzdata to save your changes to the raw data into new files, and then pass these files to the xcmsSet function.
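A rough sketch of that workaround is shown here. The file names, the output directory, and the choice of plotTIC as the quality-control step are assumptions for illustration only; files are handled one at a time so that only a single xcmsRaw object is in memory at any point.

```r
library(xcms)

# Hypothetical input files; replace with your own raw data files.
raw_files <- c("sample1.mzXML", "sample2.mzXML")

# Hypothetical directory for the re-written files.
out_dir <- "checked"
dir.create(out_dir, showWarnings = FALSE)

checked_files <- character(length(raw_files))

for (i in seq_along(raw_files)) {
  # Load one raw file at a time to keep memory use low.
  xraw <- xcmsRaw(raw_files[i])

  # Quality control / inspection step (here just the total ion chromatogram).
  plotTIC(xraw)

  # ... apply any changes to xraw here ...

  # Save the (possibly modified) raw data to a new mzData file.
  base <- tools::file_path_sans_ext(basename(raw_files[i]))
  checked_files[i] <- file.path(out_dir, paste0(base, ".mzData"))
  write.mzdata(xraw, checked_files[i])

  rm(xraw)
}

# Peak detection then reads the re-written files.
xset <- xcmsSet(checked_files)
```

Note that this still reads each file twice (once for inspection, once inside xcmsSet); it only ensures that the changes made to the raw data carry over into the xcmsSet.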

My main gripe with xcms is exactly the lack of direct link between raw and derived data, which is why I started working on MSsary. That project is unfortunately on hold at the moment but will be taken up as soon as possible.

I hope this helps you with your current work...

ADD COMMENT • link 10.2 years ago Thomas Lin Pedersen ▴ 70

Your package sounds promising; do you have a timeline for when you expect it to be more or less usable? I was about to implement some stuff for the xcms package, but perhaps I should do that for MSsary instead ;)

Is the data really that big? I wonder whether it would be possible to reduce the size of the data using special data types like Rle or the like...

ADD REPLY • link 10.2 years ago Johannes Rainer ★ 2.1k

With regard to data size, it really depends on your field. Proteomics, where I come from, uses instruments with ridiculous resolution and therefore produces huge files, while metabolomics studies sometimes use instruments with unit resolution, resulting in much smaller files. Anyway, it is not so much the encoding that is the culprit for the data size; it's just the immense amount of raw numbers…

What MSsary does is write changes to the underlying raw data into an SQLite database and automatically figure out where to look for them. Thus it never really has everything in memory at the same time, only the relevant pieces. The idea, though, is that the user shouldn't really have to care about all these underlying details… As for the timeline of the package, I'm afraid my development has been stifled a bit by changes to the scope of my PhD, so it is currently on hold. I'm quite invested in it though, so it will be taken up again (and contributions are very welcome), but until then you would be better off sticking to xcms.
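To illustrate the general idea of keeping modified scan data on disk and pulling in only the relevant piece, here is a minimal sketch using the DBI and RSQLite packages. It is not MSsary's actual schema or API; the table name, column names, and values are made up for the example.

```r
library(DBI)

# In-memory database for the example; a real setup would use a file on disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up modified scan data: three scans with two peaks each.
scans <- data.frame(
  scan      = rep(1:3, each = 2),
  mz        = c(100.1, 200.2, 100.1, 200.3, 100.0, 200.2),
  intensity = c(10, 20, 11, 0, 12, 25)
)
dbWriteTable(con, "modified_scans", scans)

# Fetch only the scan that is needed, rather than loading everything.
one_scan <- dbGetQuery(
  con,
  "SELECT mz, intensity FROM modified_scans WHERE scan = ?",
  params = list(2)
)

dbDisconnect(con)
```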

ADD REPLY • link 10.2 years ago Thomas Lin Pedersen ▴ 70

Thanks for the explanation. I'll stick with xcms for now and will watch MSsary on GitHub.

ADD REPLY • link 10.2 years ago Johannes Rainer ★ 2.1k