Question: RforProteomics workflow troubleshooting
17 months ago, yockpingchow wrote:

Hi,

I have a problem with the RforProteomics workflow and would like help solving it.

I am working through section 6, "A comprehensive example", of the manual "Using R for proteomics data analysis", and got the errors below at the "peptide identification" step:

> output.files <- lapply(sub("\\.gz", " ", files$fileName),
+ function(x)
+ {
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)
+ }
+ )

Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
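
For reference, a small base-R sketch of what the sub() call in the command above does to the file names. Only two of the ten names are used, and the variable names are illustrative. Note that a replacement of " " (a single space) leaves a trailing blank in each name, whereas "" removes the suffix entirely.

```r
# File names as listed in files$fileName (two shown for brevity).
gz_names <- c("c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz",
              "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz")

# Replacement " " (a space) leaves a trailing blank in each name.
with_space <- sub("\\.gz", " ", gz_names)

# Replacement "" drops the suffix; "$" anchors the match to the end of the name.
stripped <- sub("\\.gz$", "", gz_names)

print(with_space)
print(stripped)
```

Either way, the resulting path only helps if a decompressed file with that exact name exists in the working directory.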

Thanks,

YP

written 17 months ago by yockpingchow

Do you have the files in files$fileName in your working directory?

    assayAccession                                       fileName
8            52425 c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
44           52437 c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
50           52439 c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
56           52441 c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
62           52443 c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
68           52445 c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
77           52448 c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
83           52450 c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
95           52454 c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
101          52456 c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz

Have you run

if (!allfiles) {
    library("R.utils")
    sapply(list.files(pattern = "mzML.gz"), gunzip)
}

Not gunzipping the files would actually make sense in the light of your error message.

And finally, as already asked by email, please state the output of sessionInfo().
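
One way to answer the first two questions at a glance is sketched below in base R. It assumes the file names come from files$fileName; check_spectrum_files is a hypothetical helper written for this illustration, not a function from RforProteomics or rTANDEM, and the demonstration uses empty dummy files in a temporary directory.

```r
# Sketch: report, for each expected file, whether the compressed and the
# decompressed version exist in a directory. `check_spectrum_files` is a
# hypothetical helper, not part of RforProteomics or rTANDEM.
check_spectrum_files <- function(gz_names, dir = ".") {
  mzml_names <- sub("\\.gz$", "", gz_names)
  data.frame(
    fileName     = gz_names,
    gz_present   = file.exists(file.path(dir, gz_names)),
    mzML_present = file.exists(file.path(dir, mzml_names)),
    stringsAsFactors = FALSE
  )
}

# Self-contained demonstration in a temporary directory with dummy files.
demo_dir <- tempfile("spectra_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "a.mzML"))      # already gunzipped
file.create(file.path(demo_dir, "b.mzML.gz"))   # still compressed
status <- check_spectrum_files(c("a.mzML.gz", "b.mzML.gz"), dir = demo_dir)
print(status)
```

In the workflow itself this would be called as check_spectrum_files(files$fileName) from the working directory; any row with mzML_present equal to FALSE points at a file that still needs gunzipping or downloading.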

written 17 months ago by Laurent Gatto

Hi, I am still having the same problem:

> files$fileName
 [1] "c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz" "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz"
 [3] "c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz"
 [5] "c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz" "c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz"
 [7] "c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz" "c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz"
 [9] "c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz"

 

 

> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 

Any clues to solve this?

Thanks

 

written 17 months ago by yockpingchow

Have you gunzipped the files?
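
A file extension alone does not prove that a file was decompressed. The base-R sketch below checks the first two bytes of a file for the gzip magic number (1f 8b). is_gzipped is a hypothetical helper written for this illustration, and the two temporary files are made up.

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes 1f 8b.
is_gzipped <- function(path) {
  con <- file(path, open = "rb")          # binary mode: bytes are read as stored
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Demonstration with two temporary files.
plain_file <- tempfile(fileext = ".mzML")
writeLines("<mzML></mzML>", plain_file)

gz_file <- tempfile(fileext = ".mzML")    # misleading extension on purpose
con <- gzfile(gz_file, open = "w")
writeLines("<mzML></mzML>", con)
close(con)

print(is_gzipped(plain_file))
print(is_gzipped(gz_file))
```

Applied to the workflow, something like sapply(sub("\\.gz$", "", files$fileName), is_gzipped) would flag any .mzML file that is still compressed, assuming those files exist in the working directory.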

written 17 months ago by Laurent Gatto

Hi,

I have run the following code, and the result is still the same.

> if (!allfiles) {
+     library("R.utils")
+     sapply(list.files(pattern = "mzML.gz"), gunzip)
+ }


> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

 

written 17 months ago by yockpingchow

Using

> sessionInfo()
R version 3.3.0 Patched (2016-05-11 r70599)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rTANDEM_1.12.0    data.table_1.9.6  Rcpp_0.12.5       XML_3.98-1.4     
[5] R.utils_2.3.0     R.oo_1.20.0       R.methodsS3_1.7.1 jsonlite_0.9.22  
[9] rpx_1.9.2        

loaded via a namespace (and not attached):
[1] compiler_3.3.0 tools_3.3.0    RCurl_1.95-4.8 curl_0.9.7     chron_2.3-47  
[6] bitops_1.0-6 

and extracting the R code from the vignette with knitr::purl("RforProteomics.Rnw")

and executing the code from line 672

library("rpx")
id <- "PXD002161"
px <- PXDataset(id)

to line 757

output.files <- lapply(sub("\\.gz","",files$fileName),
                       function(x){
                           param <- setParamValue(param, 'spectrum', 'path', value=x)
                           output.file <- tandem(param)})

works fine for me. In other words I can't reproduce your issue.

Before following up, please make sure you repeat all the steps above and you have an up-to-date installation of the packages.
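
A quick way to compare an installation against the versions listed in the sessionInfo() output above is sketched here in base R. compare_versions is a hypothetical helper written for this illustration, and only three of the listed packages are included in the reference vector.

```r
# Versions reported in the sessionInfo() output above (subset).
reference <- c(rTANDEM = "1.12.0", R.utils = "2.3.0", rpx = "1.9.2")

# Hypothetical helper: compare installed versions against a reference.
compare_versions <- function(reference) {
  pkgs <- names(reference)
  installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_                       # package is not installed
    }
  }, character(1))
  # TRUE when the installed version is at least the reference version
  at_least <- mapply(function(i, r) {
    !is.na(i) && utils::compareVersion(i, r) >= 0
  }, installed, reference)
  data.frame(
    package   = pkgs,
    reference = unname(reference),
    installed = unname(installed),
    at_least  = unname(at_least),
    stringsAsFactors = FALSE
  )
}

print(compare_versions(reference))
```

A row with installed equal to NA means the package is missing; a row with at_least equal to FALSE means the installed version is older than the one used in the working session.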

written 17 months ago by Laurent Gatto

Hi,

Thanks very much for the guidance. I managed to run through the demo data.

Now I have a problem with real public data (PXD002081). I have no idea why R is forced to shut down when performing this step:

 

The error message:

Fatal error:non-standard CODEC used for mzML peak data (CODEC type=zlib compression).
File cannot be intepreted.
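
The message refers to zlib-compressed peak data inside the mzML file. As a rough check, the base-R sketch below scans a file for the cvParam that marks a binary data array as zlib-compressed. It assumes the PSI-MS controlled vocabulary accession MS:1000574 ("zlib compression"); uses_zlib is a hypothetical helper, and the demonstration file is a made-up fragment, not a valid mzML document.

```r
# Hypothetical helper: TRUE if any line of an mzML file carries the cvParam
# for zlib compression (accession MS:1000574 in the PSI-MS vocabulary).
# n_lines limits how much of a large file is read; -1 reads everything.
uses_zlib <- function(path, n_lines = -1L) {
  lines <- readLines(path, n = n_lines, warn = FALSE)
  any(grepl('accession="MS:1000574"', lines, fixed = TRUE))
}

# Demonstration with a minimal, made-up fragment (not a valid mzML file).
demo <- tempfile(fileext = ".mzML")
writeLines(c(
  '<binaryDataArray encodedLength="0">',
  '  <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float"/>',
  '  <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression"/>',
  '</binaryDataArray>'
), demo)

print(uses_zlib(demo))
```

If this returns TRUE for the PXD002081 files, the peak data are stored in the compressed form the error message complains about.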

 

Please let me know if you need additional info.

Thanks,

YP

modified 17 months ago • written 17 months ago by yockpingchow

How would I know? You don't provide any information.

written 17 months ago by Laurent Gatto

The script is shown below:

library("rpx")
id <- "PXD002081"
px <- PXDataset(id)
try(setInternet2(FALSE),silent=TRUE)
library("jsonlite")
addr <- "http://www.ebi.ac.uk:80/pride/ws/archive/%s/list/project/%s"
files <- fromJSON(sprintf(addr, "file", id))$list
assays <- fromJSON(sprintf(addr, "assay", id))$list

files <- subset(files, fileType == 'PEAK',
                select = c("assayAccession","fileName"))
assays <- assays[,c("assayAccession",
                    "experimentalFactor",
                    "proteinCount",
                    "peptideCount",
                    "uniquePeptideCount",
                    "identifiedSpectrumCount",
                    "totalSpectrumCount")]

group <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1", assays$experimentalFactor)
splnm <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1_\\2", assays$experimentalFactor)

assays <- with(assays, {data.frame(assayAccession,
                                   phenotype=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                 "\\1", experimentalFactor),
                                   sampleName=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                  "\\1_\\2", experimentalFactor),
                                   stringsAsFactors=FALSE)})
                                   
files <- subset(files, assayAccession %in% assays$assayAccession)

files$datasetName <- sub('.mzML.gz','', files$fileName, fixed=TRUE)
meta <- merge(files[,c("assayAccession","datasetName")], assays)
rownames(meta) <- meta$datasetName
meta <- meta[order(meta$sampleName),]
rownames(meta) <- NULL

if (!allfiles) {
    library("R.utils")
    sapply(list.files(pattern = "mzML.gz"), gunzip)
}

library("Biostrings")
fasta_location <- "F:/ANALYSIS/PXD002081_mzML/Homo_sapiens.GRCh38.pep.all"  # downloaded from Ensembl

fwd.seqs <- readAAStringSet(fasta_location, format="fasta",
                            nrec=-1L, skip=0L, use.names=TRUE)
rev.seqs <- reverse(fwd.seqs)
names(rev.seqs) <- paste("XXX", names(rev.seqs), sep='_')
fwd.rev.seqs <- append( fwd.seqs, rev.seqs)
writeXStringSet(x=fwd.rev.seqs, filepath="h_sapiens_fwd_rev.fasta", format="fasta")
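
To make the target-decoy construction in the last few lines easier to follow, here is the same idea on plain character vectors in base R. This is only an illustration with made-up toy sequences; the script above does the equivalent with Biostrings objects, and reverse_seq is a hypothetical helper.

```r
# Toy forward sequences as a named character vector (made-up data).
fwd <- c(protA = "MKTAYIAK", protB = "GSHMLE")

# Reverse each sequence character by character.
reverse_seq <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")
rev_seqs <- vapply(fwd, reverse_seq, character(1))

# Prefix decoy names with "XXX_", as in the script above.
names(rev_seqs) <- paste("XXX", names(fwd), sep = "_")

# Target-decoy database: forward entries followed by reversed entries.
fwd_rev <- c(fwd, rev_seqs)
print(fwd_rev)
```

The combined vector holds every forward entry once and every reversed entry once, so it is twice as long as the input.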

 

 

written 17 months ago by yockpingchow

Script, part 2:

library("rTANDEM")
param <- setParamOrbitrap()
taxonomy <- rTTaxo(taxon="hsapiens",
                   format="peptide",
                   URL= "h_sapiens_fwd_rev.fasta")
param <- setParamValue(param, 'list path', 'taxonomy information', taxonomy)
param <- setParamValue(param, 'protein', 'taxon', value='hsapiens')

def.input.path <- system.file("extdata/default_input.xml", package="rTANDEM")
param <- setParamValue(param, 'list path', 'default parameters',
                       value=def.input.path)

param <- setParamValue(param, "output", "message", "r-for-proteomics ")
param <- setParamValue(param, "refine", value="no")

library("parallel")
param <- setParamValue(param, "spectrum", "threads", detectCores())

output.files <- lapply(sub("\\.gz","",files$fileName),
                       function(x){
                           param <- setParamValue(param, 'spectrum', 'path', value=x)
                           output.file <- tandem(param)})

 

# The error message popped up and R was forced to shut down after the step: output.file <- tandem(param)})
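
When looping over many files, it can help to record which input fails instead of stopping at the first problem. The base-R sketch below wraps each call in tryCatch. search_one is a stand-in for the real search call and fails on purpose for one made-up path. Note the limitation: tryCatch only catches R-level errors, so it would not prevent a fatal error in compiled code from terminating the session as described above.

```r
# Stand-in for the real search call; it fails for one input on purpose.
search_one <- function(path) {
  if (grepl("bad", path)) stop("cannot read ", path)
  paste0(path, ".output.xml")
}

paths <- c("good_1.mzML", "bad_2.mzML", "good_3.mzML")

# Run every file and keep either the output name or the error message.
results <- lapply(paths, function(p) {
  tryCatch(
    list(path = p, output = search_one(p), error = NA_character_),
    error = function(e) list(path = p, output = NA_character_,
                             error = conditionMessage(e))
  )
})

# Paths whose call raised an R-level error.
failed <- vapply(results, function(r) !is.na(r$error), logical(1))
print(paths[failed])
```

Running the files one at a time in this way at least narrows down which input triggers the failure.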

 

Thanks,

YP

 

written 17 months ago by yockpingchow