RforProteomics workflow troubleshooting
@yockpingchow-10955
Last seen 7.8 years ago

Hi,

I have a problem with the RforProteomics workflow and would like help solving it.

I am working through section 6, "A comprehensive example", of the manual "Using R for proteomics data analysis", and got the following errors at the "peptide identification" step:

> output.files <- lapply(sub("\\.gz", " ", files$fileName),
+ function(x)
+ {
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)
+ }
+ )

Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

 loaded.
No input spectra met the acceptance criteria.

Thanks,

YP


Do you have the files in files$fileName in your working directory?

    assayAccession                                       fileName
8            52425 c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
44           52437 c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
50           52439 c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
56           52441 c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
62           52443 c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
68           52445 c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
77           52448 c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
83           52450 c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
95           52454 c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
101          52456 c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz

Have you run the following?

if (!allfiles) {
    library("R.utils")
    sapply(list.files(pattern = "mzML.gz"), gunzip)
}

Not gunzipping the files would actually make sense in the light of your error message.

And finally, as already asked by email, please state the output of sessionInfo().
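Both questions above (are the files in the working directory, and have they been gunzipped) can be checked in one step before calling tandem(). The following is only a minimal sketch in base R; the helper name missing_spectra is hypothetical and is not part of the vignette or of any package.

```r
# Hypothetical helper (not from the vignette): return the names of the
# uncompressed spectrum files that are NOT present in `dir`.
# `gz_names` are file names ending in ".gz", e.g. files$fileName.
missing_spectra <- function(gz_names, dir = ".") {
    # Strip only a trailing ".gz" to get the expected uncompressed name
    unzipped <- sub("\\.gz$", "", gz_names)
    # Keep the names for which no file exists in `dir`
    unzipped[!file.exists(file.path(dir, unzipped))]
}
```

Calling missing_spectra(files$fileName) from the working directory should return an empty character vector if every file has been gunzipped in place; any name it returns is a file that tandem() would not find.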


Hi,  I am still having the same problem:

> files$fileName
 [1] "c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz" "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz"
 [3] "c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz"
 [5] "c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz" "c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz"
 [7] "c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz" "c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz"
 [9] "c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz"

 

 

> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 

Any clues on how to solve this?

Thanks

Have you gunzipped the files?


Hi,

I have run the following script, but the result is still the same.

> if (!allfiles) {
+     library("R.utils")
+     sapply(list.files(pattern = "mzML.gz"), gunzip)
+ }


> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param,  'spectrum' ,  'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

 loaded.
No input spectra met the acceptance criteria.
Loading spectra

 


Using

> sessionInfo()
R version 3.3.0 Patched (2016-05-11 r70599)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rTANDEM_1.12.0    data.table_1.9.6  Rcpp_0.12.5       XML_3.98-1.4     
[5] R.utils_2.3.0     R.oo_1.20.0       R.methodsS3_1.7.1 jsonlite_0.9.22  
[9] rpx_1.9.2        

loaded via a namespace (and not attached):
[1] compiler_3.3.0 tools_3.3.0    RCurl_1.95-4.8 curl_0.9.7     chron_2.3-47  
[6] bitops_1.0-6 

and extracting the R code from the vignette with knitr::purl("RforProteomics.Rnw")

and executing the code from line 672

library("rpx")
id <- "PXD002161"
px <- PXDataset(id)

to line 757

output.files <- lapply(sub("\\.gz","",files$fileName),
                       function(x){
                           param <- setParamValue(param, 'spectrum', 'path', value=x)
                           output.file <- tandem(param)})

works fine for me. In other words I can't reproduce your issue.

Before following up, please make sure you repeat all the steps above and you have an up-to-date installation of the packages.
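To compare an installation against the sessionInfo() output above, the installed versions of the relevant packages can be listed directly. This is a minimal sketch using base R only; the helper name installed_versions is hypothetical, and it only reports versions, it does not check whether they are the latest release.

```r
# Hypothetical helper: return the installed version of each package as a
# named character vector, with NA for packages that are not installed.
installed_versions <- function(pkgs) {
    vapply(pkgs, function(p) {
        # system.file() returns "" when the package cannot be found
        if (nzchar(system.file(package = p))) {
            as.character(utils::packageVersion(p))
        } else {
            NA_character_
        }
    }, character(1))
}
```

For example, installed_versions(c("rTANDEM", "rpx", "R.utils")) gives values that can be read side by side with the "other attached packages" entries listed above.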


Hi,

Thanks very much for the guidance. I managed to run through the demo data.

Now I have a problem with real public data (PXD002081). I have no idea why R is forced to shut down when performing the tandem(param) step.

The error message:

Fatal error: non-standard CODEC used for mzML peak data (CODEC type=zlib compression).
File cannot be interpreted.

Please let me know if you need additional info.

Thanks,

YP


How would I know, you don't provide any information.


The script is shown below:

library("rpx")
id <- "PXD002081"
px <- PXDataset(id)
try(setInternet2(FALSE),silent=TRUE)
library("jsonlite")
addr <- "http://www.ebi.ac.uk:80/pride/ws/archive/%s/list/project/%s"
files <- fromJSON(sprintf(addr, "file", id))$list
assays <- fromJSON(sprintf(addr, "assay", id))$list

files <- subset(files, fileType == 'PEAK',
                select = c("assayAccession","fileName"))
assays <- assays[,c("assayAccession",
                    "experimentalFactor",
                    "proteinCount",
                    "peptideCount",
                    "uniquePeptideCount",
                    "identifiedSpectrumCount",
                    "totalSpectrumCount")]

group <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1", assays$experimentalFactor)
splnm <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1_\\2", assays$experimentalFactor)

assays <- with(assays, {data.frame(assayAccession,
                                   phenotype=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                 "\\1", experimentalFactor),
                                   sampleName=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                  "\\1_\\2", experimentalFactor),
                                   stringsAsFactors=FALSE)})
                                   
files <- subset(files, assayAccession %in% assays$assayAccession)

files$datasetName <- sub('.mzML.gz','', files$fileName, fixed=TRUE)
meta <- merge(files[,c("assayAccession","datasetName")], assays)
rownames(meta) <- meta$datasetName
meta <- meta[order(meta$sampleName),]
rownames(meta) <- NULL

if (!allfiles) {
    library("R.utils")
    sapply(list.files(pattern = "mzML.gz"), gunzip)
}

library("Biostrings")
fasta_location <- "F:/ANALYSIS/PXD002081_mzML/Homo_sapiens.GRCh38.pep.all" # downloaded from Ensembl

fwd.seqs <- readAAStringSet(fasta_location, format="fasta",
                            nrec=-1L, skip=0L, use.names=TRUE)
rev.seqs <- reverse(fwd.seqs)
names(rev.seqs) <- paste("XXX", names(rev.seqs), sep='_')
fwd.rev.seqs <- append( fwd.seqs, rev.seqs)
writeXStringSet(x=fwd.rev.seqs, filepath="h_sapiens_fwd_rev.fasta", format="fasta")

 

 


script: part 2

library("rTANDEM")
param <- setParamOrbitrap()
taxonomy <- rTTaxo(taxon="hsapiens",
                   format="peptide",
                   URL= "h_sapiens_fwd_rev.fasta")
param <- setParamValue(param, 'list path', 'taxonomy information', taxonomy)
param <- setParamValue(param, 'protein', 'taxon', value='hsapiens')

def.input.path <- system.file("extdata/default_input.xml", package="rTANDEM")
param <- setParamValue(param, 'list path', 'default parameters',
                       value=def.input.path)

param <- setParamValue(param, "output", "message", "r-for-proteomics ")
param <- setParamValue(param, "refine", value="no")

library("parallel")
param <- setParamValue(param, "spectrum", "threads", detectCores())

output.files <- lapply(sub("\\.gz","",files$fileName),
                       function(x){
                           param <- setParamValue(param, 'spectrum', 'path', value=x)
                           output.file <- tandem(param)})

 

# The error message popped up and R was forced to shut down after the step: output.file <- tandem(param)})

 

Thanks,

YP
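The fatal error quoted earlier names zlib compression as the CODEC that the mzML reader rejects. In mzML, a binary data array that is zlib-compressed carries the PSI-MS controlled-vocabulary accession MS:1000574, so a gunzipped file can be screened for it before it is passed to tandem(). The following is a minimal sketch in base R; the helper name uses_zlib and the number of lines scanned are assumptions, and the thread does not confirm what fix was eventually used.

```r
# Hypothetical helper: TRUE if an uncompressed mzML file declares zlib
# compression (PSI-MS accession MS:1000574) within its first `n` lines.
# Scanning only `n` lines keeps the check cheap on large files; whether
# that is enough depends on how the file was written (assumption).
uses_zlib <- function(mzml_path, n = 5000L) {
    lines <- readLines(mzml_path, n = n, warn = FALSE)
    # fixed = TRUE matches the accession literally, not as a regex
    any(grepl("MS:1000574", lines, fixed = TRUE))
}
```

If uses_zlib() returns TRUE for the PXD002081 files, they would presumably have to be re-encoded without zlib compression before this reader can interpret them.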

 
