Question: mZR: access "parentFile" tag
cclark42 wrote (19 months ago):

Is it possible to access the "parentFile" tag (shown below) of an mzXML file with mzR?

<?xml version="1.0" encoding="ISO-8859-1"?>
<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2 http://sashimi.sourceforge.net/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd">
  <msRun scanCount="6" startTime="PT0S" endTime="PT0S">
    <parentFile fileName="file://C:\Users\chase\Documents\MALDI\9-7-17\sm\0_A18\1\1SRef/fid" fileType="RAWData" fileSha1="5965ae8e78bef821459f64d903b498068c092679"/>
Answer: mZR: access "parentFile" tag
Laurent Gatto wrote (19 months ago):

No, I don't think so. In addition, that tag looks different in mzML files, which is the format we tend to develop for at the moment.

This is something that we could consider adding, and it shouldn't be too difficult using xml2. A PR on GitHub is welcome.
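
As a rough illustration of the xml2 route mentioned above (a minimal sketch, not an existing mzR or MSnbase function): the helper name `parent_file_attrs` and the demo file are hypothetical, and the XPath matches on `local-name()` so the mzXML default namespace does not get in the way. Note that `read_xml()` parses the whole file into memory.

```r
library(xml2)

# Hypothetical helper (not part of mzR): returns the attributes of every
# <parentFile> element in an mzXML file as a list of named character vectors.
parent_file_attrs <- function(path) {
  doc <- read_xml(path)
  # Match on the local name so the default mzXML namespace is ignored
  nodes <- xml_find_all(doc, "//*[local-name()='parentFile']")
  xml_attrs(nodes)
}

# Small demo file modelled on the excerpt in the question (path shortened)
demo <- tempfile(fileext = ".mzXML")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1"?>',
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '  <msRun scanCount="6" startTime="PT0S" endTime="PT0S">',
  '    <parentFile fileName="file://C:/data/example/fid" fileType="RAWData" fileSha1="5965ae8e78bef821459f64d903b498068c092679"/>',
  '  </msRun>',
  '</mzXML>'
), demo)

attrs <- parent_file_attrs(demo)
attrs[[1]][["fileName"]]
```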

 


Thanks Dr. Gatto, I figured as much. I have a Shiny app that, after converting raw -> mzXML, allows creating spectral databases. I'm interfacing an RSQLite database component now, but I lose the provenance of the original raw files. (Maybe there's a way to serialize the mzXML directly into a SQL blob from R?)
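
One way the blob idea could look, as a minimal sketch under assumptions: the table name `spectra`, the in-memory database and the tiny stand-in file are all hypothetical. The file is read as raw bytes, gzip-compressed, and stored in a list column, which RSQLite writes as a BLOB.

```r
library(DBI)

# In-memory database for illustration only
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Stand-in for a real mzXML file
path <- tempfile(fileext = ".mzXML")
writeLines("<mzXML><msRun/></mzXML>", path)

# Read the file as raw bytes and compress them
raw_bytes <- readBin(path, what = "raw", n = file.size(path))

row <- data.frame(Strain_ID = "example", stringsAsFactors = FALSE)
row$mzXML <- list(memCompress(raw_bytes, type = "gzip"))  # list of raw -> BLOB
dbWriteTable(con, "spectra", row)

# Read the blob back and restore the original bytes
back <- dbGetQuery(con, "SELECT mzXML FROM spectra WHERE Strain_ID = 'example'")
restored <- memDecompress(back$mzXML[[1]], type = "gzip")
round_trip_ok <- identical(restored, raw_bytes)

dbDisconnect(con)
```

Storing the file bytes themselves keeps the header (including any `parentFile` element) intact, at the cost of having to parse the XML again after retrieval.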

As for a PR: if you could point me to the appropriate GitHub file(s), I can see whether it's something I could help with.

Thanks!

RSQLite Code:

Access a previously created RSQLite database

library(dplyr)  # provides the %>% pipe used below

newDatabase <- DBI::dbConnect(RSQLite::SQLite(), paste0("SpectraLibrary/", "example", ".sqlite"))
db          <- dplyr::tbl(newDatabase, "IDBacDatabase")

db %>% dplyr::select(c(1,11:19)) %>% dplyr::select(-c(6,7)) # remove user-input metadata columns
## # Source:   lazy query [?? x 8]
## # Database: sqlite 3.22.0
## #   [C:\Users\chase\Documents\GitHub\IDBac_App\inst\app\SpectraLibrary\example.sqlite]
##    Strain_ID    manufacturer  model  ionisation  analyzer Small_Molecule_~
##    <chr>        <chr>         <chr>  <chr>       <chr>               <int>
##  1 114A-2       Bruker Dalto~ Bruke~ matrix-ass~ time-of~               NA
## # ... with more rows, and 2 more variables: mzXML <blob>, rds <blob>
db %>% dplyr::select(c(1,18,19))
## # Source:   lazy query [?? x 3]
## # Database: sqlite 3.22.0
## #   [C:\Users\chase\Documents\GitHub\IDBac_App\inst\app\SpectraLibrary\example.sqlite]
##    Strain_ID                                               mzXML       rds
##    <chr>                                                  <blob>    <blob>
##  1 114A-2                                         <raw  7.14 MB> <raw 1 B>
## # ... with more rows

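Display what used to be an mzXML file (that held 6 spectra)

# Fetch the compressed blob for one strain
oneSpec <- db %>% dplyr::filter(Strain_ID == "114A-2") %>% dplyr::select(mzXML) %>% dplyr::collect()
# Decompress each blob, then restore the serialized R object (a list of spectra)
mzAccess <- lapply(oneSpec[[1]], function(x) memDecompress(x, type = "gzip"))
mzAccess <- lapply(mzAccess, function(x) unserialize(x, NULL))

# Show each spectrum of the first (only) record as a tibble
lapply(mzAccess[[1]], function(x) dplyr::as_tibble(data.frame(x)))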
## [[1]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 36161
##  2 1920. 36226
##  3 1920. 36373
##  4 1921. 36579
##  5 1921. 36741
##  6 1921. 36738
##  7 1921. 36741
##  8 1922. 36878
##  9 1922. 37068
## 10 1922. 37028
## # ... with 42,144 more rows
## 
## [[2]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6     9
##  2  59.6     7
##  3  59.6     7
##  4  59.6    16
##  5  59.6    13
##  6  59.6    10
##  7  59.6    18
##  8  59.6    14
##  9  59.6     4
## 10  59.7    12
## # ... with 253,771 more rows
## 
## [[3]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 18932
##  2 1920. 18886
##  3 1920. 18839
##  4 1921. 18796
##  5 1921. 18869
##  6 1921. 19063
##  7 1921. 18915
##  8 1922. 19000
##  9 1922. 19058
## 10 1922. 19000
## # ... with 42,144 more rows
## 
## [[4]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6    15
##  2  59.6    11
##  3  59.6    11
##  4  59.6    26
##  5  59.6    19
##  6  59.6    18
##  7  59.6     7
##  8  59.6    18
##  9  59.6    12
## 10  59.7    11
## # ... with 253,771 more rows
## 
## [[5]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6    10
##  2  59.6    22
##  3  59.6    18
##  4  59.6    12
##  5  59.6     6
##  6  59.6     8
##  7  59.6     7
##  8  59.6    15
##  9  59.6    13
## 10  59.7    10
## # ... with 253,771 more rows
## 
## [[6]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 32558
##  2 1920. 32717
##  3 1920. 32893
##  4 1921. 33004
##  5 1921. 33049
##  6 1921. 33296
##  7 1921. 33363
##  8 1922. 33543
##  9 1922. 33828
## 10 1922. 33820
## # ... with 42,144 more rows
written 19 months ago by cclark42
I was rather thinking about accessing the relevant information by extracting it from the XML file directly (using the `XML` or `xml2` packages), rather than converting it to a database. The latter is an interesting approach which has advantages, but is a bit too heavy-handed to extract a single tag. I will give it a go and get back to you.
written 19 months ago by Laurent Gatto

Sorry about that. It was just the context for why I needed to access that element. The database code is for a current project.

It looks like the xml2 package isn't a dependency of mzR, but XML is in the 'Suggests', so I played with that. I really like that mzR doesn't pull files into memory until explicitly told to, and the code in this Gist stays with that premise.

I'm not sure what kind of API would be desired: one for just this 'parentFile' element, or one that is more generic and can handle elements that aren't predefined by the API.
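
To make the "more generic" option concrete, here is a rough sketch (not the code from the Gist, and not an mzR API): the name `header_element_attrs` and its `n_lines` parameter are hypothetical. It reads only the first lines of the file instead of parsing the whole document, and it assumes the element of interest is self-closing and sits on a single line near the top, as `parentFile` does in the excerpt above.

```r
library(xml2)

# Hypothetical generic helper: scan the first `n_lines` lines of a file for
# self-closing <element .../> tags and return their attributes.
header_element_attrs <- function(path, element, n_lines = 200L) {
  header <- readLines(path, n = n_lines, warn = FALSE)
  pattern <- paste0("<", element, "\\b[^>]*/>")
  # regmatches() keeps only the lines that actually contain a match
  hits <- regmatches(header, regexpr(pattern, header, perl = TRUE))
  # Each hit is a well-formed one-element XML fragment, so it can be parsed alone
  lapply(hits, function(tag) xml_attrs(read_xml(tag)))
}

# Small demo file modelled on the excerpt in the question (path shortened)
demo <- tempfile(fileext = ".mzXML")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1"?>',
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '  <msRun scanCount="6" startTime="PT0S" endTime="PT0S">',
  '    <parentFile fileName="file://C:/data/example/fid" fileType="RAWData" fileSha1="5965ae8e78bef821459f64d903b498068c092679"/>',
  '  </msRun>',
  '</mzXML>'
), demo)

res <- header_element_attrs(demo, "parentFile")
res[[1]][["fileType"]]
```

The trade-off is robustness: an element that spans several lines or appears beyond `n_lines` would be missed, whereas a full XML parse would find it.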

GIST:

written 19 months ago by cclark42

Yes, that's exactly the kind of thing I was thinking of. Note that there's the fileNames function to extract the file name of the mz[X]ML file, but that can differ from the name of the original raw file.

> library("MSnbase")
> ms <- readMSData("file.mzML", mode = "onDisk")
> (f <- fileNames(ms))
[1] "/home/lg390/tmp/file.mzML"
> library("XML")
> x <- xmlInternalTreeParse(basename(f))
> srcfile <- xmlRoot(x)[["mzML"]][["fileDescription"]][["sourceFileList"]][["sourceFile"]]
> xmlAttrs(srcfile)
                                       id
                                   "RAW1"
                                     name
                 "Thermo_Hela_PRTC_1.raw"
                                 location
"file://C:/Users/CCPAdmin/Desktop/lgatto"

It would be useful to have a function that extracts that source file. I'm happy to add it as soon as I have time. (Or you can send a PR, of course.)
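
Such a function might look roughly like the following for mzML, as a sketch only: `source_files` is a hypothetical name, not an existing MSnbase or mzR function, and it uses xml2 rather than the XML package shown in the session above. Matching on `local-name()` keeps the XPath independent of the mzML namespace and of whether the file has an indexedmzML wrapper.

```r
library(xml2)

# Hypothetical helper: one row per <sourceFile> element of an mzML file.
source_files <- function(path) {
  doc <- read_xml(path)
  nodes <- xml_find_all(doc, "//*[local-name()='sourceFile']")
  data.frame(
    id       = xml_attr(nodes, "id"),
    name     = xml_attr(nodes, "name"),
    location = xml_attr(nodes, "location"),
    stringsAsFactors = FALSE
  )
}

# Minimal demo file using the attribute values from the session above
demo <- tempfile(fileext = ".mzML")
writeLines(c(
  '<?xml version="1.0" encoding="utf-8"?>',
  '<mzML xmlns="http://psi.hupo.org/ms/mzml">',
  '  <fileDescription>',
  '    <sourceFileList count="1">',
  '      <sourceFile id="RAW1" name="Thermo_Hela_PRTC_1.raw" location="file://C:/Users/CCPAdmin/Desktop/lgatto"/>',
  '    </sourceFileList>',
  '  </fileDescription>',
  '</mzML>'
), demo)

sf <- source_files(demo)
sf$name
```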

modified 19 months ago • written 19 months ago by Laurent Gatto