Question

RforProteomics workflow troubleshooting

yockpingchow • 0

Hi,

I have a problem with the RforProteomics workflow and would like help solving it.

I am working through section 6, "A comprehensive example", of the manual "Using R for proteomics data analysis", and got this error when running the "peptide identification" step:

The errors:

> output.files <- lapply(sub("\\.gz", "", files$fileName),
+ function(x)
+ {
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)
+ }
+ )

Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz
Most likely cause: using a binary spectrum file.
Use dta, pkl, mgf, mzdata (v.1.05) or mzxml (v.2.0) files ONLY! (2)

loaded.
No input spectra met the acceptance criteria.

Thanks,

YP

ADD COMMENT • link 8.8 years ago yockpingchow • 0

Do you have the files in files$fileName in your working directory?

    assayAccession                                       fileName
8            52425 c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz
44           52437 c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz
50           52439 c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz
56           52441 c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz
62           52443 c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz
68           52445 c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz
77           52448 c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz
83           52450 c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz
95           52454 c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz
101          52456 c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz

Have you run

if (!allfiles) {
    library("R.utils")
    sapply(list.files(pattern = "mzML.gz"), gunzip)
}

Not gunzipping the files would actually make sense in the light of your error message.

And finally, as already asked by email, please state the output of sessionInfo().

ADD REPLY • link 8.8 years ago Laurent Gatto 1.6k
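
The two checks asked about in the reply above (are the files in the working directory, and have they been gunzipped) can be sketched with base R only. The helper name `check_spectrum_files` is hypothetical and not part of RforProteomics or rTANDEM; the sketch assumes the names come from `files$fileName` and that the search step needs the uncompressed `.mzML` files.

```r
# Hypothetical helper: for each expected file name, report whether the
# compressed (.gz) and the uncompressed version exist in `dir`.
check_spectrum_files <- function(file_names, dir = ".") {
    unzipped <- sub("\\.gz$", "", file_names)
    data.frame(
        fileName     = file_names,
        gz_present   = file.exists(file.path(dir, file_names)),
        mzML_present = file.exists(file.path(dir, unzipped)),
        stringsAsFactors = FALSE
    )
}

# Self-contained demonstration in a temporary directory
demo_dir <- tempfile("mzml_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "a.mzML.gz"))  # still compressed
file.create(file.path(demo_dir, "b.mzML"))     # already gunzipped

status <- check_spectrum_files(c("a.mzML.gz", "b.mzML.gz"), dir = demo_dir)
print(status)
```

With the data from this thread the call would presumably be `check_spectrum_files(files$fileName)`; rows where `mzML_present` is `FALSE` would still need gunzipping.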

Hi, I am still having the same problem:

> files$fileName
[1] "c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz" "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz"
[3] "c_elegans_E_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_D_1_3_21Apr10_Draco_10-03-07.mzML.gz"
[5] "c_elegans_D_1_1_21Apr10_Draco_10-03-07.mzML.gz" "c_elegans_C_1_1_21Apr10_Draco_10-03-06.mzML.gz"
[7] "c_elegans_B_2_3_21Apr10_Draco_10-03-05.mzML.gz" "c_elegans_A_3_1_21Apr10_Draco_10-03-04.mzML.gz"
[9] "c_elegans_A_3_3_21Apr10_Draco_10-03-04.mzML.gz" "c_elegans_B_2_1_21Apr10_Draco_10-03-05.mzML.gz"

> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

Failed to read spectrum file: c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

Any clues to solve this?

Thanks

ADD REPLY • link 8.8 years ago yockpingchow • 0

Have you gunzipped the files?

ADD REPLY • link 8.8 years ago Laurent Gatto 1.6k
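
A file can carry the `.mzML` extension and still be gzip-compressed, for example after a rename. A minimal sketch of a check that does not rely on the file name: gzip streams start with the two bytes `1f 8b`, so reading them is enough. The helper name `is_gzipped` is hypothetical.

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes 1f 8b.
# The connection is opened in binary mode ("rb") so the raw bytes are read
# without any transparent decompression.
is_gzipped <- function(path) {
    con <- file(path, "rb")
    on.exit(close(con))
    magic <- readBin(con, what = "raw", n = 2L)
    length(magic) == 2L && identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Demonstration with two temporary files
plain_file <- tempfile(fileext = ".mzML")
writeLines("<?xml version=\"1.0\"?>", plain_file)

gz_file <- tempfile(fileext = ".mzML")  # misleading extension on purpose
gz_con <- gzfile(gz_file, "w")
writeLines("<?xml version=\"1.0\"?>", gz_con)
close(gz_con)

print(is_gzipped(plain_file))
print(is_gzipped(gz_file))
```

Applied to the thread's data, something like `sapply(sub("\\.gz$", "", files$fileName), is_gzipped)` would show whether any of the supposedly gunzipped files is still compressed.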

Hi,

I have run the following script, but the result is still the same.

> if (!allfiles) {
+ library("R.utils")
+ sapply(list.files(pattern = "mzML.gz"), gunzip)
+ }

> output.files <- lapply(sub("\\.gz","",files$fileName),
+ function(x){
+ param <- setParamValue(param, 'spectrum' , 'path' , value=x)
+ output.file <- tandem(param)})
Loading spectra

Failed to read spectrum file: c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML
Most likely: an unsupported data file type:
Use cmn, dta, pkl, mgf, mzdata (v.1.05) or mzXML (v.2.0) files ONLY! (4)

loaded.
No input spectra met the acceptance criteria.
Loading spectra

ADD REPLY • link 8.8 years ago yockpingchow • 0

Using

> sessionInfo()
R version 3.3.0 Patched (2016-05-11 r70599)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] rTANDEM_1.12.0    data.table_1.9.6  Rcpp_0.12.5       XML_3.98-1.4     
[5] R.utils_2.3.0     R.oo_1.20.0       R.methodsS3_1.7.1 jsonlite_0.9.22  
[9] rpx_1.9.2        

loaded via a namespace (and not attached):
[1] compiler_3.3.0 tools_3.3.0    RCurl_1.95-4.8 curl_0.9.7     chron_2.3-47  
[6] bitops_1.0-6

and extracting the R code from the vignette with knitr::purl("RforProteomics.Rnw")

and executing the code from line 672

library("rpx")
id <- "PXD002161"
px <- PXDataset(id)

to line 757

output.files <- lapply(sub("\\.gz","",files$fileName),
                       function(x){
                           param <- setParamValue(param, 'spectrum', 'path', value=x)
                           output.file <- tandem(param)})

works fine for me. In other words I can't reproduce your issue.

Before following up, please make sure you repeat all the steps above and you have an up-to-date installation of the packages.

ADD REPLY • link 8.8 years ago Laurent Gatto 1.6k
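
For reference, a small sketch of what the file-name substitution in the `lapply()` call does. In an R string the regular expression `\.` has to be written as `"\\."`; a single backslash (`"\.gz"`) is rejected by the R parser as an unrecognized escape. The `$` anchor in the second call is an addition for illustration and is not in the vignette code.

```r
file_names <- c("c_elegans_C_1_3_21Apr10_Draco_10-03-06.mzML.gz",
                "c_elegans_E_3_1_21Apr10_Draco_10-03-04.mzML.gz")

# "\\." in the R string is the regex \. and matches a literal dot;
# the replacement is the empty string, so ".gz" is removed.
stripped <- sub("\\.gz", "", file_names)
print(stripped)

# Anchoring with $ (not in the vignette code) restricts the match to the
# end of the name.
stripped_anchored <- sub("\\.gz$", "", file_names)

# A replacement of " " instead of "" leaves a trailing space in the path,
# which would then not match the file on disk.
with_space <- sub("\\.gz", " ", file_names)
print(nchar(with_space) - nchar(stripped))
```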

Hi,

Thanks very much for the guidance. I managed to run through the demo data.

Now I have a problem with real public data (PXD002081). I have no idea why R is forced to shut down at this step.

The error message:

Fatal error:non-standard CODEC used for mzML peak data (CODEC type=zlib compression).
File cannot be intepreted.

Please let me know if you need additional info.

Thanks,

YP

ADD REPLY • link 8.8 years ago yockpingchow • 0

How would I know? You haven't provided any information.

ADD REPLY • link 8.8 years ago Laurent Gatto 1.6k

The script is shown below:

library("rpx")
id <- "PXD002081"
px <- PXDataset(id)
try(setInternet2(FALSE),silent=TRUE)
library("jsonlite")
addr <- "http://www.ebi.ac.uk:80/pride/ws/archive/%s/list/project/%s"
files <- fromJSON(sprintf(addr, "file", id))$list
assays <- fromJSON(sprintf(addr, "assay", id))$list

files <- subset(files, fileType == 'PEAK',
                select = c("assayAccession","fileName"))
assays <- assays[,c("assayAccession",
                    "experimentalFactor",
                    "proteinCount",
                    "peptideCount",
                    "uniquePeptideCount",
                    "identifiedSpectrumCount",
                    "totalSpectrumCount")]

group <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1", assays$experimentalFactor)
splnm <- sub(".*Name: Y-(.+?)-FF\\.(\\d)", "\\1_\\2", assays$experimentalFactor)

assays <- with(assays, {data.frame(assayAccession,
                                   phenotype=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                 "\\1", experimentalFactor),
                                   sampleName=sub(".*Name: Y-(.+?)-FF\\.(\\d)",
                                                  "\\1_\\2", experimentalFactor),
                                   stringsAsFactors=FALSE)})

files <- subset(files, assayAccession %in% assays$assayAccession)

files$datasetName <- sub('.mzML.gz','', files$fileName, fixed=TRUE)
meta <- merge(files[,c("assayAccession","datasetName")], assays)
rownames(meta) <- meta$datasetName
meta <- meta[order(meta$sampleName),]
rownames(meta) <- NULL

if (!allfiles) {
library("R.utils")
sapply(list.files(pattern = "mzML.gz"), gunzip)
}

library("Biostrings")
fasta_location <- "F:/ANALYSIS/PXD002081_mzML/Homo_sapiens.GRCh38.pep.all"  # downloaded from Ensembl

fwd.seqs <- readAAStringSet(fasta_location, format="fasta",
nrec=-1L, skip=0L, use.names=TRUE)
rev.seqs <- reverse(fwd.seqs)
names(rev.seqs) <- paste("XXX", names(rev.seqs), sep='_')
fwd.rev.seqs <- append( fwd.seqs, rev.seqs)
writeXStringSet(x=fwd.rev.seqs, filepath="h_sapiens_fwd_rev.fasta", format="fasta")

ADD REPLY • link 8.8 years ago yockpingchow • 0
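
The `sub()` calls in the script above rely on the `experimentalFactor` strings matching the pattern `Name: Y-<phenotype>-FF.<digit>`. A minimal sketch of how the two capture groups behave, and of how to spot entries that do not match. The input strings here are invented for illustration; the real values for PXD002081 are not shown in this thread and may follow a different scheme.

```r
# Hypothetical experimentalFactor strings, invented to fit the pattern
experimental_factor <- c("Sample Name: Y-ctrl-FF.1",
                         "Sample Name: Y-treated-FF.2")

pattern <- ".*Name: Y-(.+?)-FF\\.(\\d)"

# "\\1" is the first capture group, "\\2" the second
phenotype   <- sub(pattern, "\\1", experimental_factor)
sample_name <- sub(pattern, "\\1_\\2", experimental_factor)
print(data.frame(phenotype, sample_name))

# sub() returns its input unchanged when the pattern does not match, so a
# mismatch goes unnoticed unless it is checked explicitly.
unmatched_input <- "Some other annotation"
matched <- grepl(pattern, c(experimental_factor, unmatched_input))
print(matched)
```

If `grepl()` returns `FALSE` for the real `assays$experimentalFactor` values, the `phenotype` and `sampleName` columns would simply contain the original strings.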

Script, part 2:

library("rTANDEM")
param <- setParamOrbitrap()
taxonomy <- rTTaxo(taxon="hsapiens",
                   format="peptide",
                   URL="h_sapiens_fwd_rev.fasta")
param <- setParamValue(param, 'list path', 'taxonomy information', taxonomy)
param <- setParamValue(param, 'protein', 'taxon', value='hsapiens')

def.input.path <- system.file("extdata/default_input.xml", package="rTANDEM")
param <- setParamValue(param, 'list path', 'default parameters',
value=def.input.path)

param <- setParamValue(param, "output", "message", "r-for-proteomics ")
param <- setParamValue(param, "refine", value="no")

library("parallel")
param <- setParamValue(param, "spectrum", "threads", detectCores())

output.files <- lapply(sub("\\.gz","",files$fileName),
function(x){
param <- setParamValue(param, 'spectrum', 'path', value=x)
output.file <- tandem(param)})

# The error message pops up and R is forced to shut down after the step: output.file <- tandem(param)})

Thanks,

YP

ADD REPLY • link 8.8 years ago yockpingchow • 0
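
The fatal error quoted earlier in the thread refers to zlib-compressed peak data inside the mzML file, which is separate from the outer `.gz` compression of the file itself. In mzML, compressed binary arrays are flagged with the PSI-MS term `MS:1000574` ("zlib compression"), while `MS:1000576` means "no compression". A minimal sketch for checking which files carry the zlib term, using base R only. The helper name `uses_zlib_compression` and the `n_lines` cut-off are assumptions made for illustration; this only inspects the file and does not change how it is read.

```r
# Hypothetical helper: TRUE if the zlib-compression CV term appears within
# the first `n_lines` lines of the file.
uses_zlib_compression <- function(path, n_lines = 5000L) {
    header <- readLines(path, n = n_lines, warn = FALSE)
    any(grepl("MS:1000574", header, fixed = TRUE))
}

# Demonstration on two tiny mzML-like fragments (not complete mzML files)
zlib_file <- tempfile(fileext = ".mzML")
writeLines(c("<binaryDataArray encodedLength=\"0\">",
             "  <cvParam cvRef=\"MS\" accession=\"MS:1000574\" name=\"zlib compression\"/>",
             "</binaryDataArray>"),
           zlib_file)

uncompressed_file <- tempfile(fileext = ".mzML")
writeLines(c("<binaryDataArray encodedLength=\"0\">",
             "  <cvParam cvRef=\"MS\" accession=\"MS:1000576\" name=\"no compression\"/>",
             "</binaryDataArray>"),
           uncompressed_file)

print(uses_zlib_compression(zlib_file))
print(uses_zlib_compression(uncompressed_file))
```

Running this over `sub("\\.gz", "", files$fileName)` would show whether the PXD002081 files differ in this respect from the demo data that ran without problems. Whether re-exporting the files without zlib compression avoids the crash is not established in this thread.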