mzR: access "parentFile" tag
cclark42 ▴ 10
@cclark42-15988

Is it possible to access the "parentFile" tag (as seen below) of an mzXML file with mzR?

<?xml version="1.0" encoding="ISO-8859-1"?>
<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2 http://sashimi.sourceforge.net/schema_revision/mzXML_3.2/mzXML_idx_3.2.xsd">
  <msRun scanCount="6" startTime="PT0S" endTime="PT0S">
    <parentFile fileName="file://C:\Users\chase\Documents\MALDI\9-7-17\sm\0_A18\1\1SRef/fid" fileType="RAWData" fileSha1="5965ae8e78bef821459f64d903b498068c092679"/>
@laurent-gatto-5645

No, I don't think so. In addition, it looks like that tag is different in mzML files, which is the format we mainly develop for now.

This is something we could consider adding, and it shouldn't be too difficult using xml2. A PR on GitHub is welcome.
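To make the xml2 suggestion concrete, here is a minimal sketch, not an mzR feature. The helper name `parent_file_attrs` and the stand-in file contents are assumptions for illustration; only `read_xml`, `xml_ns_strip`, `xml_find_all` and `xml_attrs` come from xml2.

```r
library(xml2)

# Hypothetical helper (not part of mzR): return the attributes of every
# <parentFile> element in an mzXML file as a list of named character vectors.
parent_file_attrs <- function(path) {
  doc <- read_xml(path)
  # mzXML declares a default namespace; stripping it keeps the XPath simple.
  xml_ns_strip(doc)
  nodes <- xml_find_all(doc, ".//parentFile")
  xml_attrs(nodes)
}

# Tiny stand-in file modelled on the header in the question (placeholder values).
tmp <- tempfile(fileext = ".mzXML")
writeLines(c(
  '<?xml version="1.0" encoding="ISO-8859-1"?>',
  '<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.2">',
  '  <msRun scanCount="6" startTime="PT0S" endTime="PT0S">',
  '    <parentFile fileName="file://C:/data/example/fid" fileType="RAWData" fileSha1="0000"/>',
  '  </msRun>',
  '</mzXML>'
), tmp)

res <- parent_file_attrs(tmp)
res[[1]][["fileName"]]
```

Note that this parses the whole XML document, so for very large files a streaming parser may be a better fit.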

 


Thanks Dr. Gatto, I figured as much. I have a Shiny app that, after converting raw -> mzXML, allows creating spectral databases. I'm interfacing with an RSQLite database component now, but I lose the provenance of the original raw files. (Maybe there's a way to serialize the mzXML file directly into a SQL blob from R?)

As for a PR: if you could point me to the appropriate GitHub file(s), I can take a look and see whether it's something I could help with.
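One possible way to keep the whole mzXML file, and with it the parentFile tag, is to store the compressed file bytes themselves as a blob. The following is only a sketch under assumptions: the table name `spectra_files`, the in-memory database and the stand-in file contents are made up for illustration.

```r
library(DBI)

# Stand-in for a converted mzXML file (placeholder content).
mzxml_path <- tempfile(fileext = ".mzXML")
writeLines(
  '<mzXML><msRun><parentFile fileName="file://example/fid"/></msRun></mzXML>',
  mzxml_path
)

# Read the file as raw bytes and gzip-compress it in memory.
raw_bytes  <- readBin(mzxml_path, what = "raw", n = file.size(mzxml_path))
compressed <- memCompress(raw_bytes, type = "gzip")

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# RSQLite stores a list column of raw vectors as a BLOB column.
dbWriteTable(
  con, "spectra_files",  # hypothetical table name
  data.frame(Strain_ID = "114A-2", mzXML = I(list(compressed)),
             stringsAsFactors = FALSE)
)

# Round trip: fetch the blob and decompress it back to the original bytes.
stored   <- dbGetQuery(con, "SELECT mzXML FROM spectra_files WHERE Strain_ID = '114A-2'")
restored <- memDecompress(stored$mzXML[[1]], type = "gzip")
identical(restored, raw_bytes)

dbDisconnect(con)
```

Because the bytes are stored unchanged, `rawToChar(restored)` (or writing them to a temporary file) gives back the original XML, header included.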

Thanks!

RSQLite Code:

Access previously created RSQLite database

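library(magrittr) # provides the %>% pipe used below

# Connect to the existing SQLite file and reference its main table lazily
newDatabase <- DBI::dbConnect(RSQLite::SQLite(), paste0("SpectraLibrary/", "example", ".sqlite"))
db          <- dplyr::tbl(newDatabase, "IDBacDatabase")

db %>% dplyr::select(c(1,11:19)) %>% dplyr::select(-c(6,7)) #remove user-input-metadata columns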
## # Source:   lazy query [?? x 8]
## # Database: sqlite 3.22.0
## #   [C:\Users\chase\Documents\GitHub\IDBac_App\inst\app\SpectraLibrary\example.sqlite]
##    Strain_ID    manufacturer  model  ionisation  analyzer Small_Molecule_~
##    <chr>        <chr>         <chr>  <chr>       <chr>               <int>
##  1 114A-2       Bruker Dalto~ Bruke~ matrix-ass~ time-of~               NA
## # ... with more rows, and 2 more variables: mzXML <blob>, rds <blob>
db %>% dplyr::select(c(1,18,19))
## # Source:   lazy query [?? x 3]
## # Database: sqlite 3.22.0
## #   [C:\Users\chase\Documents\GitHub\IDBac_App\inst\app\SpectraLibrary\example.sqlite]
##    Strain_ID                                               mzXML       rds
##    <chr>                                                  <blob>    <blob>
##  1 114A-2                                         <raw  7.14 MB> <raw 1 B>
## # ... with more rows

Display what used to be a mzXML file (that held 6 spectra)

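# Pull the stored blob for one strain into memory
oneSpec <- db %>% dplyr::filter(Strain_ID == "114A-2") %>% dplyr::select(mzXML) %>% dplyr::collect()
# Decompress each blob, then unserialize it back into an R object
mzAccess <- lapply(oneSpec[[1]], function(x) memDecompress(x, type = "gzip"))
mzAccess <- lapply(mzAccess, unserialize)

# mzAccess[[1]] is assumed to be the list of spectra (the post called this object `q3`)
lapply(mzAccess[[1]], function(x) dplyr::as_tibble(data.frame(x)))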
## [[1]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 36161
##  2 1920. 36226
##  3 1920. 36373
##  4 1921. 36579
##  5 1921. 36741
##  6 1921. 36738
##  7 1921. 36741
##  8 1922. 36878
##  9 1922. 37068
## 10 1922. 37028
## # ... with 42,144 more rows
## 
## [[2]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6     9
##  2  59.6     7
##  3  59.6     7
##  4  59.6    16
##  5  59.6    13
##  6  59.6    10
##  7  59.6    18
##  8  59.6    14
##  9  59.6     4
## 10  59.7    12
## # ... with 253,771 more rows
## 
## [[3]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 18932
##  2 1920. 18886
##  3 1920. 18839
##  4 1921. 18796
##  5 1921. 18869
##  6 1921. 19063
##  7 1921. 18915
##  8 1922. 19000
##  9 1922. 19058
## 10 1922. 19000
## # ... with 42,144 more rows
## 
## [[4]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6    15
##  2  59.6    11
##  3  59.6    11
##  4  59.6    26
##  5  59.6    19
##  6  59.6    18
##  7  59.6     7
##  8  59.6    18
##  9  59.6    12
## 10  59.7    11
## # ... with 253,771 more rows
## 
## [[5]]
## # A tibble: 253,781 x 2
##       X1    X2
##    <dbl> <dbl>
##  1  59.6    10
##  2  59.6    22
##  3  59.6    18
##  4  59.6    12
##  5  59.6     6
##  6  59.6     8
##  7  59.6     7
##  8  59.6    15
##  9  59.6    13
## 10  59.7    10
## # ... with 253,771 more rows
## 
## [[6]]
## # A tibble: 42,154 x 2
##       X1    X2
##    <dbl> <dbl>
##  1 1920. 32558
##  2 1920. 32717
##  3 1920. 32893
##  4 1921. 33004
##  5 1921. 33049
##  6 1921. 33296
##  7 1921. 33363
##  8 1922. 33543
##  9 1922. 33828
## 10 1922. 33820
## # ... with 42,144 more rows
I was rather thinking about accessing the relevant information by extracting it from the XML file directly (using the `XML` or `xml2` packages), rather than converting it to a database. The latter is an interesting approach which has advantages, but is a bit too heavy-handed to extract a single tag. I will give it a go and get back to you.

Sorry about that. The database code was just context for why I need to access that element; it is for a current project.

It looks like the xml2 package isn't a dependency of mzR, but XML is in its 'Suggests', so I played with that. I really like that mzR doesn't pull files into memory until explicitly asked to, and the code in this Gist sticks to that premise.

I'm not sure what kind of API would be preferred: one for just this 'parentFile' element, or a more generic one that can handle elements that aren't predefined by the API.

GIST:


Yes, that's exactly the kind of thing I was thinking of. Note that there is the fileNames function to extract the file name of the mz[X]ML file, but that can differ from the original raw file.

> library("MSnbase")
> ms <- readMSData("file.mzML", mode = "onDisk")
> (f <- fileNames(ms))
[1] "/home/lg390/tmp/file.mzML"
> library("XML")
> x <- xmlInternalTreeParse(basename(f))
> srcfile <- xmlRoot(x)[["mzML"]][["fileDescription"]][["sourceFileList"]][["sourceFile"]]
> xmlAttrs(srcfile)
                                       id
                                   "RAW1"
                                     name
                 "Thermo_Hela_PRTC_1.raw"
                                 location
"file://C:/Users/CCPAdmin/Desktop/lgatto"

It would be useful to have a function that extracts that source file. I'm happy to add it as soon as I have time. (Or you can send a PR, of course.)
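A function along those lines could look like the sketch below. It is not an existing mzR or MSnbase function: the name `source_file_attrs` and the stand-in mzML content are assumptions, and it simply covers both cases discussed in this thread, `sourceFile` in mzML and `parentFile` in mzXML.

```r
library(XML)

# Hypothetical helper (not an mzR or MSnbase function): collect the attributes
# of the elements that record the original raw file, i.e. <sourceFile> in mzML
# and <parentFile> in mzXML.
source_file_attrs <- function(path) {
  doc <- xmlParse(path)
  # local-name() matches the element regardless of the default namespace.
  nodes <- getNodeSet(
    doc,
    "//*[local-name()='sourceFile' or local-name()='parentFile']"
  )
  lapply(nodes, xmlAttrs)
}

# Tiny stand-in mzML file (placeholder values).
tmp <- tempfile(fileext = ".mzML")
writeLines(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<mzML xmlns="http://psi.hupo.org/ms/mzml">',
  '  <fileDescription>',
  '    <sourceFileList count="1">',
  '      <sourceFile id="RAW1" name="example.raw" location="file:///example"/>',
  '    </sourceFileList>',
  '  </fileDescription>',
  '</mzML>'
), tmp)

src <- source_file_attrs(tmp)
src[[1]][["name"]]
```

Returning a list with one entry per element keeps files with several source files intact; a real implementation would probably want to tidy that into a data frame.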
