Data conversion to ANDI-MS netCDF format

ray.bacala • 0

Hello,

I am trying to create a netCDF file compatible with ANDI-MS. The history is as follows:

I am trying to convert mass spec data from Waters .raw to netCDF in order to import it into Bruker Compass/Data Analysis.

Waters has an export tool (DataBridge) that does this AND the file can be read by the Bruker software, HOWEVER, DataBridge either centroids the data or INDICATES it is centroid in the cdf output. This is unacceptable as too much information is lost for the deconvolution of intact protein masses. Waters is aware of the problem and is working on it.

I was steered towards ProteoWizard to convert to mzXML/mzML, and then to use Bioconductor XCMS to open the file and save it as netCDF. Unfortunately this was unsuccessful, and the XCMS manual explicitly states that XCMS-created CDF files can only be opened by XCMS and are expressly incompatible with ANDI-MS conventions.

I have been able to use the R package RNetCDF to successfully read the Waters output CDF and create a new, empty CDF file with the same variables and attributes.

I can use the mzR package to read mzXML output from ProteoWizard.

The goal:

As a first step, I am attempting to (using RNetCDF):

Open a DataBridge-exported cdf file
Create a new CDF file with the same variables and attributes
Read the data from the cdf file into R and then put it to the variables in the new cdf file
Save the file and verify it can be opened by the target software
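The four steps above can be sketched with RNetCDF roughly as follows. This is a minimal sketch, not a verified ANDI-MS writer: the tiny source file built at the top is an invented stand-in for a DataBridge export, and the names `src_path`, `dst_path`, and the `title` attribute are illustrative assumptions. All variables are defined before any data is written, so the destination header does not have to be rewritten partway through.

```r
library(RNetCDF)

# Stand-in for the DataBridge export: a tiny file with invented content.
src_path <- tempfile(fileext = ".cdf")
dst_path <- tempfile(fileext = ".cdf")
src <- create.nc(src_path)
dim.def.nc(src, "scan_number", 4)
var.def.nc(src, "a_d_sampling_rate", "NC_DOUBLE", "scan_number")
att.put.nc(src, "NC_GLOBAL", "title", "NC_CHAR", "illustrative file")
var.put.nc(src, "a_d_sampling_rate", rep(-9999, 4))
close.nc(src)

# Steps 1 and 2: open the source and create an empty destination.
src <- open.nc(src_path)
dst <- create.nc(dst_path)
info <- file.inq.nc(src)

# Recreate the dimensions in their original order (IDs are zero-based).
for (d in seq_len(info$ndims) - 1) {
  di <- dim.inq.nc(src, d)
  dim.def.nc(dst, di$name, di$length, unlim = di$unlim)
}

# Copy the global attributes by name.
for (a in seq_len(info$ngatts) - 1) {
  att_name <- att.inq.nc(src, "NC_GLOBAL", a)$name
  att.copy.nc(src, "NC_GLOBAL", att_name, dst, "NC_GLOBAL")
}

# Define every variable and copy its attributes before writing any data.
for (v in seq_len(info$nvars) - 1) {
  vi <- var.inq.nc(src, v)
  dims <- if (vi$ndims > 0) {
    vapply(vi$dimids, function(d) dim.inq.nc(src, d)$name, character(1))
  } else {
    NA
  }
  var.def.nc(dst, vi$name, vi$type, dims)
  for (a in seq_len(vi$natts) - 1) {
    att_name <- att.inq.nc(src, v, a)$name
    att.copy.nc(src, v, att_name, dst, vi$name)
  }
}

# Step 3: copy the data. na.mode = 3 requests no missing-value conversion,
# and rawchar = TRUE returns NC_CHAR data as raw bytes.
for (v in seq_len(info$nvars) - 1) {
  vi <- var.inq.nc(src, v)
  values <- var.get.nc(src, v, na.mode = 3, collapse = FALSE, rawchar = TRUE)
  var.put.nc(dst, vi$name, values, na.mode = 3)
}

# Step 4: close both files so the destination is flushed to disk.
close.nc(src)
close.nc(dst)
```

Whether the target software accepts the result still has to be checked separately; the sketch only covers the copy itself.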

The problem:

I can read data from the source file into an R data element, verify that the data is numeric, but when I put it to the destination variable/file, I get this error:

Error: R character data can only be written to NC_CHAR variable

I have appended the command lines I have used. I think that my problem is with my lack of knowledge of how to handle data in R, so any help would be appreciated. Also, if anyone knows an alternate/better way to get data from an mzXML file to an ANDI-MS compatible netCDF file, please inform me. I still have to verify that the cdf file I create with these tools will be openable by the target software...

Regards,

Ray

APPENDED COMMAND LINES:

CDF1 refers to the SOURCE cdf file

test1 refers to the DESTINATION cdf file

> var.inq.nc(CDF1,"a_d_sampling_rate")
$id
[1] 1

$name
[1] "a_d_sampling_rate"

$type
[1] "NC_DOUBLE"

$ndims
[1] 1

$dimids
[1] 11

$natts
[1] 0

> dim.def.nc(test1,"scan_number",2064)

> var.def.nc(test1,"a_d_sampling_rate","NC_DOUBLE","scan_number")

> var.inq.nc(test1,"a_d_sampling_rate")
$id
[1] 1

$name
[1] "a_d_sampling_rate"

$type
[1] "NC_DOUBLE"

$ndims
[1] 1

$dimids
[1] 11

$natts
[1] 0

> data2 <-var.get.nc(CDF1,"a_d_sampling_rate",start=NA,count=NA,na.mode=0,collapse=TRUE,unpack=TRUE,rawchar=TRUE)
> str(data2)
num [1:2064(1d)] -9999 -9999 -9999 -9999 -9999 ...

> var.put.nc(test1,"a_d_sampling_rate",data1,start=NA,count=NA,na.mode=0,pack=TRUE)
Error: R character data can only be written to NC_CHAR variable
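One thing visible in the transcript: the values are read into `data2`, but the `var.put.nc` call writes `data1`. If `data1` happens to hold character data left over from an earlier step, that alone could produce this error. A minimal sketch of a guard that checks the R type before writing; `put_numeric` is a hypothetical helper, and the contents of `data1` here are an assumption made for illustration:

```r
library(RNetCDF)

path <- tempfile(fileext = ".cdf")
nc <- create.nc(path)
dim.def.nc(nc, "scan_number", 3)
var.def.nc(nc, "a_d_sampling_rate", "NC_DOUBLE", "scan_number")

data1 <- c("-9999", "-9999", "-9999")  # assumed: a leftover character vector
data2 <- c(-9999, -9999, -9999)        # numeric, like the values read from CDF1

# Hypothetical helper: refuse to write anything that is not numeric.
put_numeric <- function(ncfile, varname, x) {
  if (!is.numeric(x)) {
    stop(sprintf("data for '%s' is of class '%s', not numeric",
                 varname, class(x)[1]))
  }
  var.put.nc(ncfile, varname, x)
}

# The guard reports the wrong object before RNetCDF is called.
guard_msg <- tryCatch(
  put_numeric(nc, "a_d_sampling_rate", data1),
  error = function(e) conditionMessage(e)
)

# Writing the object that was actually read succeeds.
put_numeric(nc, "a_d_sampling_rate", data2)
written <- var.get.nc(nc, "a_d_sampling_rate")
close.nc(nc)
```

Running `str()` on the exact object passed to `var.put.nc` is a quick way to confirm which case applies.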

mzr RNetCDF bioconductor • 3.5k views

ADD COMMENT • link 8.8 years ago ray.bacala • 0

Hello,

Sorry, I also read this in the manual for RNetCDF but am unsure how to apply it:

However, text represented by R types raw and character can only be written to NetCDF type NC_CHAR. The dimensions of R raw variables map directly to NetCDF dimensions, but character variables have an implied dimension corresponding to the string length. This implied dimension must be defined explicitly as the fastest-varying dimension of the NC_CHAR variable, and it must be included as the first element of arguments start and count taken by this function.
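To make the quoted passage concrete, here is a minimal sketch of writing R character data to an NC_CHAR variable. The dimension and variable names (`max_string`, `station`, `name`) and the strings are invented for illustration. The string-length dimension is listed first when defining the variable and appears as the first element of `start` and `count`:

```r
library(RNetCDF)

path <- tempfile(fileext = ".cdf")
nc <- create.nc(path)

# The string-length dimension comes first (fastest-varying).
dim.def.nc(nc, "max_string", 32)
dim.def.nc(nc, "station", 5)
var.def.nc(nc, "name", "NC_CHAR", c("max_string", "station"))

station_names <- c("alfa", "bravo", "charlie", "delta", "echo")

# start and count include the implied string dimension as their first element.
var.put.nc(nc, "name", station_names, start = c(1, 1), count = c(32, 5))

names_back <- var.get.nc(nc, "name")
close.nc(nc)
```

None of this applies to an NC_DOUBLE variable such as `a_d_sampling_rate`; it only matters when the R object being written is of type character or raw.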

ADD REPLY • link 8.8 years ago ray.bacala • 0

score 0 · Answer 1 · 2017-01-25

Hello,

It seems I have solved my problem with copying the data from File A to File B. Unfortunately, the target software will not load the result and does not recognize it as a netCDF (ANDI-MS) file...

The source file is 1,615,541 KB in size and the target file (essentially an R-translated clone) is 1,615,540 KB, so it would appear that the right volume of data was copied over, and everything looks good when I sift through it variable by variable...

I am using R x64 3.2.2 along with RNetCDF package v76 and BioConductor package v3.3.

I am hoping someone out there can help me understand whether this is because I did not set an option needed for backwards compatibility (netCDF developed significantly after ANDI-MS was no longer actively supported), or whether this is something intrinsic to the R environment and packages that cannot be fixed this way...
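One check that may help narrow this down is to compare the on-disk format of the two files. A netCDF file starts with a short signature: `CDF` followed by byte 1 for the classic format, byte 2 for the 64-bit offset format, or byte 5 for CDF-5, while netCDF-4 files start with the HDF5 signature. The sketch below uses only base R; `nc_format_from_magic` is a hypothetical helper, and the two temporary files stand in for the source and destination files. It is an assumption, not something established in this thread, that a format mismatch is why the target software rejects the file.

```r
# Hypothetical helper: classify a file by its first four bytes.
nc_format_from_magic <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4)
  if (length(magic) < 4) {
    return("unknown")
  }
  if (identical(magic[1:3], charToRaw("CDF"))) {
    # The fourth byte is the version number of the CDF format.
    return(switch(as.character(as.integer(magic[4])),
                  "1" = "classic",
                  "2" = "64-bit offset",
                  "5" = "CDF-5",
                  "unknown"))
  }
  if (identical(magic, as.raw(c(0x89, 0x48, 0x44, 0x46)))) {
    return("netCDF-4/HDF5")
  }
  "unknown"
}

# Stand-ins for the two files: only the signatures are written here.
source_like <- tempfile(fileext = ".cdf")
clone_like <- tempfile(fileext = ".cdf")
writeBin(c(charToRaw("CDF"), as.raw(1)), source_like)
writeBin(c(charToRaw("CDF"), as.raw(2)), clone_like)

formats <- c(source = nc_format_from_magic(source_like),
             clone = nc_format_from_magic(clone_like))
print(formats)
```

If the real files report different formats, the arguments of `create.nc` that control the output format in the installed RNetCDF version would be the first thing to look at.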

Ray