Hi,
I have a problem with the RforProteomics workflow and would like help solving it.
I am working through section 6, "A comprehensive example", of the manual "Using R for proteomics data analysis", and got this error when performing the "peptide identification" step:
The errors:
> output.files <- lapply(sub("\\.gz", "", files$fileName),
+ function(x)
+ {
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)
+ }
+ )
Loading spectra
Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)
loaded.
No input spectra met the acceptance criteria.
Thanks,
YP
Do you have the files listed in files$fileName in your working directory? Have you run the step that gunzips them?
Not gunzipping the files would actually make sense in light of your error message.
And finally, as already asked by email, please state the output of sessionInfo().
Hi, I am still having the same problem:
> files$fileName
[1] "c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz" "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz"
[3] "c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz"
[5] "c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz" "c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz"
[7] "c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz" "c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz"
[9] "c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz"
> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra
Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)
Any clues to solve this?
Thanks
Have you gunzipped the files?
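One quick way to check this is sketched below in base R. The helper name check_spectrum_files is hypothetical (it is not part of RforProteomics or rTANDEM), and it assumes the file names in files$fileName are relative to the directory passed as dir. It reports, for each expected spectrum file, whether the compressed and the uncompressed version are present.

```r
# Hypothetical helper (not from any package): for each expected spectrum
# file, report whether the .gz and the uncompressed version exist in 'dir'.
check_spectrum_files <- function(gz_names, dir = getwd()) {
  # Strip a trailing ".gz"; the backslash must be doubled inside an R string.
  mzml_names <- sub("\\.gz$", "", gz_names)
  data.frame(
    file = mzml_names,
    gz_present = file.exists(file.path(dir, gz_names)),
    mzml_present = file.exists(file.path(dir, mzml_names)),
    stringsAsFactors = FALSE
  )
}
```

Calling check_spectrum_files(files$fileName) should list every file; rows where mzml_present is FALSE would still need to be gunzipped before tandem() can read them.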
Hi,
I have run the following script, but the result is still the same.
> if (!allfiles) {
+ library("R.utils")
+ sapply(list.files(pattern = "mzML.gz"), gunzip)
+ }
> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra
Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)
loaded.
No input spectra met the acceptance criteria.
Loading spectra
Using
and extracting the R code from the vignette with
knitr::purl("RforProteomics.Rnw")
and executing the code from line 672 to line 757 works fine for me. In other words, I can't reproduce your issue.
Before following up, please make sure you repeat all the steps above and you have an up-to-date installation of the packages.
Hi,
Thanks very much for the guidance. I managed to run through the demo data.
Now I have a problem with real public data (PXD002081). I have no idea why R is forced to shut down when performing this step:
The error message:
Fatal error:non-standard CODEC used for mzML peak data (CODEC type=zlib compression).
File cannot be intepreted.
Please let me know if you need additional info.
Thanks,
YP
How would I know, you don't provide any information.
The script is shown below:
library("rpx")
id <- "PXD002081"
px <- PXDataset(id)
try(setInternet2(FALSE),silent=TRUE)
library("jsonlite")
addr <- "http://www.ebi.ac.uk:80/pride/ws/archive/%s/list/project/%s"
files <- fromJSON(sprintf(addr, "file", id))$list
assays <- fromJSON(sprintf(addr, "assay", id))$list
files <- subset(files, fileType == 'PEAK',
select = c("assayAccession","fileName"))
assays <- assays[,c("assayAccession",
"experimentalFactor",
"proteinCount",
"peptideCount",
"uniquePeptideCount",
"identifiedSpectrumCount",
"totalSpectrumCount")]
group <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1", assays$experimentalFactor)
splnm <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1_\\2", assays$experimentalFactor)
assays <- with(assays, {data.frame(assayAccession,
phenotype=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
"\\1", experimentalFactor),
sampleName=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
"\\1_\\2", experimentalFactor),
stringsAsFactors=FALSE)})
files <- subset(files, assayAccession %in% assays$assayAccession)
files$datasetName <- sub('.mzML.gz','', files$fileName, fixed=TRUE)
meta <- merge(files[,c("assayAccession","datasetName")], assays)
rownames(meta) <- meta$datasetName
meta <- meta[order(meta$sampleName),]
rownames(meta) <- NULL
if (!allfiles) {
library("R.utils")
sapply(list.files(pattern = "mzML.gz"), gunzip)
}
library("Biostrings")
fasta_location <- "F:/ANALYSIS/PXD002081_mzML/Homo_sapiens.GRCh38.pep.all" # downloaded from Ensembl
fwd.seqs <- readAAStringSet(fasta_location, format="fasta",
nrec=-1L, skip=0L, use.names=TRUE)
rev.seqs <- reverse(fwd.seqs)
names(rev.seqs) <- paste("XXX", names(rev.seqs), sep='_')
fwd.rev.seqs <- append( fwd.seqs, rev.seqs)
writeXStringSet(x=fwd.rev.seqs, filepath="h_sapiens_fwd_rev.fasta", format="fasta")
Script, part 2:
library("rTANDEM")
param <- setParamOrbitrap()
taxonomy <- rTTaxo(taxon="hsapiens",
format="peptide",
URL= "h_sapiens_fwd_rev.fasta")
param <- setParamValue(param, 'list path', 'taxonomy information', taxonomy)
param <- setParamValue(param, 'protein', 'taxon', value='hsapiens')
def.input.path <- system.file("extdata/default_input.xml", package="rTANDEM")
param <- setParamValue(param, 'list path', 'default parameters',
value=def.input.path)
param <- setParamValue(param, "output", "message", "r-for-proteomics ")
param <- setParamValue(param, "refine", value="no")
library("parallel")
param <- setParamValue(param, "spectrum", "threads", detectCores())
output.files <- lapply(sub("\\.gz","",files$fileName),
function(x){
param <- setParamValue(param, 'spectrum', 'path', value=x)
output.file <- tandem(param)})
# The error message popped up and R was forced to shut down after the step: output.file <- tandem(param)
Thanks,
YP
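The fatal error quoted earlier names zlib compression as the CODEC the reader cannot handle. The sketch below, in base R, checks whether a given mzML file declares zlib-compressed binary data before it is passed to tandem(). It relies on MS:1000574 being the PSI-MS controlled-vocabulary accession for "zlib compression"; the helper name uses_zlib and the choice to scan only the first n lines are assumptions made for illustration, not part of rTANDEM.

```r
# Hypothetical helper: TRUE if the first 'n' lines of an mzML file mention
# the PSI-MS controlled-vocabulary term for zlib compression (MS:1000574).
uses_zlib <- function(path, n = 5000L) {
  # Read only the start of the file; mzML files can be very large.
  header <- readLines(path, n = n, warn = FALSE)
  # fixed = TRUE matches the accession literally rather than as a regex.
  any(grepl("MS:1000574", header, fixed = TRUE))
}
```

If something like sapply(sub("\\.gz", "", files$fileName), uses_zlib) returns TRUE for the PXD002081 files, they would probably need to be re-exported without zlib compression before the search can read them.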