Isobar - generating IBSpectra object
queso012 • 0
@queso012-7191
Last seen 8.4 years ago
United States

Hi,

I'm having trouble with the first step: reading in data and generating an IBSpectra object with the readIBSpectra() function. According to the documentation, the id.file parameter can be an mzIdentML or .csv file. I have both formats, exported from a Mascot search. With either the mzid or the csv file, I receive the following error messages:

ib <- readIBSpectra("iTRAQ8plexSpectra",id.file=list.files(pattern=".mzid"), peaklist.file=list.files(pattern=".mgf"))

reading id file F004684_merged.mzid [type: mzid] ...Error in t.default(do.call(cbind, xpathApply(doc, paste0(root,"/x:AnalysisProtocolCollection/x:SpectrumIdentificationProtocol/x:ModificationParams/x:SearchModification"), : argument is not a matrix

ib <- readIBSpectra("iTRAQ8plexSpectra",id.file=list.files(pattern=".csv"), peaklist.file=list.files(pattern=".mgf"))

reading id file Mascot_search_results.csv [type: ibspectra] ... done
Error in [.data.frame(id.data, , .SPECTRUM.COLS["SPECTRUM"]) : 
undefined columns selected

I would like some help understanding what these errors mean. Are there formatting issues? If so, what should the format be?

I have mgf files from ProteoWizard (one per sample) as well as a single consolidated .mgf file from Mascot Distiller, which I assume contains the information for all samples. Which is best to use?

I appreciate your help, thanks!

sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base 

other attached packages:
[1] XML_3.98-1.1 isobar_1.10.0 plyr_1.8.1 Biobase_2.24.0 RColorBrewer_1.1-2
[6] DESeq2_1.4.5 RcppArmadillo_0.4.550.1.0 Rcpp_0.11.3 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2
[11] IRanges_1.22.10 BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 DBI_0.3.1 distr_2.5.3 genefilter_1.46.1
[6] geneplotter_1.42.0 grid_3.1.0 lattice_0.20-29 locfit_1.5-9.1 RSQLite_1.0.0
[11] sfsmisc_1.0-26 splines_3.1.0 startupmsg_0.9 stats4_3.1.0 survival_2.37-7
[16] SweaveListingUtils_0.6.2 tools_3.1.0 xtable_1.7-4 XVector_0.4.0
@thomas-lin-pedersen-5941
Last seen 8.3 years ago
Copenhagen, Denmark

I do not have experience with isobar itself, so keep that in mind for the following...

 

The errors are different in nature but could very well be related. The first one (reading an mzIdentML file) indicates that the information found at the XML path '/x:AnalysisProtocolCollection/x:SpectrumIdentificationProtocol/x:ModificationParams/x:SearchModification' is in a different form than expected and cannot simply be combined with cbind(). This propagates to the transpose function t(), which expects a matrix but gets something else. The other error, as I read it, indicates that the function expects a column that is not present in your data.
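To make the two messages less opaque, here is a small base R sketch that reproduces both of them without isobar. It is only an illustration of how such errors can arise: the empty list stands in for an XPath query that matched no nodes (one possible cause, not a confirmed diagnosis of this file), and the data frame columns are made-up names.

```r
# Illustration only: base R stand-ins for what the two tracebacks suggest.
# Nothing here calls isobar, and all inputs are invented.

# 1) mzid case: the failing call has the shape t(do.call(cbind, <list>)).
#    Assumption: the XPath query matched no nodes, so the list is empty.
no_matches <- list()
combined <- do.call(cbind, no_matches)   # cbind() with no arguments returns NULL
msg_mzid <- tryCatch(t(combined), error = function(e) conditionMessage(e))
msg_mzid                                 # "argument is not a matrix"

# 2) csv case: selecting a column by a name the data frame does not have.
id_data <- data.frame(pep_seq = "PEPTIDEK", pep_exp_z = 2)  # hypothetical columns
msg_csv <- tryCatch(id_data[, "spectrum"], error = function(e) conditionMessage(e))
msg_csv                                  # "undefined columns selected"
```

If the first case applies, the mzid file may simply contain no SearchModification entries at that path; if the second applies, the csv header lacks a column that the reader looks up by name.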

Before you jump to the conclusion that your data is malformed, keep in mind that mzIdentML is a very open format with many possibilities that are difficult to anticipate. To check the consistency of your mzIdentML file, I would recommend installing mzID or mzR and using their parsing functionality to try to read the file in question. If this fails, it would indicate that the file itself is the problem.

If that is not the case, then I hope the developer of isobar can chime in and tell about the specific expectations for id files in isobar.

Hope this helps.

Hi Thomas,

Thanks for your suggestions! I was able to parse my mzIdentML file successfully using the mzID package, and to extract the following fields:

> names(flatResults)
 [1] "spectrumid"                          "acquisitionnum"                      "calculatedmasstocharge"             
 [4] "chargestate"                         "experimentalmasstocharge"            "rank"                               
 [7] "passthreshold"                       "mascot:expectation value"            "mascot:score"                       
[10] "peptide shared in multiple proteins" "peptide unique to one protein"       "pepseq"                             
[13] "modified"                            "modification"                        "start"                              
[16] "end"                                 "pre"                                 "post"                               
[19] "isdecoy"                             "accession"                           "description"                        
[22] "databaseFile"                       

I'm waiting to hear back from the author, but perhaps isobar is looking for fields not found in the file. Some of the elements mentioned in the error message (AnalysisProtocolCollection, SearchModification) do not seem to be present among the fields above.

Anyway, I just wanted to follow up with your suggestion. Any other ideas are welcome.

Thanks again!

 

 

 

@laurent-gatto-5645
Last seen 6 hours ago
Belgium

As Thomas pointed out, the expectations of isobar's readers (for csv and mzid files) are not met. The csv file, for instance, does not contain the expected column named spectrum. The expected column names (I'm not sure all are mandatory, probably not) are:

.SPECTRUM.COLS <- c(PEPTIDE="peptide",MODIFSTRING="modif",
            CHARGE="charge",
            THEOMASS="theo.mass",EXPMASS="exp.mass",
            EXPMOZ="exp.moz",
            PRECURSOR.ERROR="precursor.error",
            PARENTINTENS="parent.intens",RT="retention.time",
            SPECTRUM="spectrum",SPECTRUM.QUANT="spectrum.quant",
            .ID.COLS,USEFORQUANT="use.for.quant",
            .PTM.COLS,
            DISSOCMETHOD="dissoc.method",
            PRECURSOR.PURITY="precursor.purity",
            SCANS="scans",SCANS.FROM="scan.from",SCANS.TO="scan.to",
            RAWFILE="raw.file",NMC="nmc",
            MASSDELTA.ABS="massdelta.abs",
            MASSDELTA.PPM="massdelta.ppm",
            SAMPLE="sample",FILE="file",NOTES="notes")

(from the IBSpectra-class.R file)

What are the column names of the csv files? Maybe you could fix these so that they match the expected .SPECTRUM.COLS above? This is not ideal but might get you somewhere.
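A minimal base R sketch of that renaming idea follows. The Mascot-style names on the left-hand side of the mapping (pep_seq, pep_var_mod, pep_exp_z, pep_scan_title) are assumptions for illustration, not taken from the poster's file, and only a handful of the expected names from .SPECTRUM.COLS above are used.

```r
# Hypothetical identification table; replace with the real csv contents.
id_data <- data.frame(
  pep_seq        = c("PEPTIDEK", "SAMPLER"),
  pep_var_mod    = c("", "Oxidation (M)"),
  pep_exp_z      = c(2, 3),
  pep_scan_title = c("scan=101", "scan=202"),
  stringsAsFactors = FALSE
)

# A few of the names listed in .SPECTRUM.COLS.
expected <- c("peptide", "modif", "charge", "spectrum")

# Which expected columns are absent before renaming?
missing_before <- setdiff(expected, names(id_data))

# Assumed mapping from existing names to expected names.
rename_map <- c(pep_seq = "peptide", pep_var_mod = "modif",
                pep_exp_z = "charge", pep_scan_title = "spectrum")

# Rename only the columns that appear in the mapping.
hits <- names(id_data) %in% names(rename_map)
names(id_data)[hits] <- unname(rename_map[names(id_data)[hits]])

# Should now be empty if every expected name was covered by the mapping.
missing_after <- setdiff(expected, names(id_data))
```

Renaming changes only the header. Whether the values themselves (for example the modification strings) are in the form isobar expects is a separate question that the renaming does not answer.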
