Question

How to convert an xlsx data file to MSnSet format?

0

fgol ▴ 10

@fgol-19486

Last seen 4.0 years ago

Hello,

I have an “xlsx” quantitative dataset from the Proteome Discoverer software (the rows of the xlsx matrix are identified proteins, the columns are the different conditions/treatments, and the matrix cells contain spectral counts).

I need to convert my xlsx dataset to MSnSet format so that I can use the msmsTests package to identify differentially expressed proteins (msmsTests probably accepts only MSnSet objects and not spreadsheets). Could you please advise me on how to convert my “xlsx” file to an MSnSet?

My second question: how can I write the output of the code below to an Excel file?

> library(msmsTests)
> data(msms.spk)
> msms.spk
> View(exprs(msms.spk))

Thank you very much for any help.

msnbase msmstests proteomics lc-ms/ms msnID • 2.0k views

ADD COMMENT • link updated 5.2 years ago by Laurent Gatto 1.6k • written 5.2 years ago by fgol ▴ 10

1

Laurent Gatto 1.6k

@laurent-gatto-5645

Last seen 4 days ago

Belgium

You have two options.

1. Open your xlsx file with Excel and export it to csv, then use readMSnSet2 to load it into R as an MSnSet.
2. Import your xlsx file into R as a data.frame and use that data.frame as input to readMSnSet2 to create an MSnSet.

Have a look at ?readMSnSet2 or at the vignette for details about how to use it, or feel free to ask on the support site once you have your csv or data.frame ready.

As for your second question, the output of that code chunk would display the quantitative data. That quantitative data is already present in your xlsx files. To export your data post-analysis back to a spreadsheet, you can use write.exprs to write the MSnSet data to a text-based spreadsheet like csv, and then open it with Excel.
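As a rough illustration of the second option and of the export step, here is a minimal sketch. The small data.frame, its column names and the output file are made up for the example. In practice the data.frame would come from an xlsx reader such as readxl::read_excel, which is an assumption here and is not mentioned in this thread.

```r
library(MSnbase)

# Hypothetical spectral-count table: one ID column followed by sample columns.
# With a real file this could be, for example (assumption, not from the thread):
#   df <- as.data.frame(readxl::read_excel("proteins.xlsx"))
df <- data.frame(
  Protein.ID = c("P1", "P2", "P3"),
  ctrl_1 = c(10, 0, 5),
  ctrl_2 = c(12, 1, 4),
  trt_1  = c(30, 0, 6),
  trt_2  = c(28, 2, 7),
  stringsAsFactors = FALSE
)

# Column 1 holds the feature names; all remaining columns are expression data.
ms <- readMSnSet2(df, ecol = 2:ncol(df), fnames = 1)

# Write the expression matrix back to a comma-separated file for Excel.
out <- tempfile(fileext = ".csv")
write.exprs(ms, file = out, sep = ",")
```

Computing ecol from ncol(df) rather than typing the range by hand avoids leaving a sample column out of the expression matrix by mistake.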

ADD COMMENT • link 5.2 years ago Laurent Gatto 1.6k

1

That was great. I could create both files (csv and MSnSet files). Thanks a lot Laurent.

ADD REPLY • link 5.2 years ago fgol ▴ 10

0

fgol ▴ 10

@fgol-19486

Last seen 4.0 years ago

Hello again,

I need to learn how to create an MSnSet from a csv file. As an example, I wrote the “msms.spk” expression dataset (from the “msmsTests” package) to a file as below:

library(msmsTests)
data(msms.spk)
write.exprs(msms.spk, file="msms.txt", sep="\t")

Then I manually converted msms.txt to msms.csv. The data has 685 rows (protein IDs) and 19 columns (expression data for each treatment), not counting the first column (containing the protein IDs) and the first row (containing the treatment names).

Then, to import the msms.csv dataset as an MSnSet instance, I ran the following code:

> ms1 <- read.csv("msms.csv")
> dim(ms1)
[1] 685  20

> MAMA <- readMSnSet2(ms1, ecol = 2:19, fnames = 1, header=TRUE)

> dim(MAMA)
[1] 685  18

> head(fData(MAMA))
        Protein.ID Y500U600.006
YKL060C    YKL060C          238
YDR155C    YDR155C          183
YOL086C    YOL086C          221
YJR104C    YJR104C          152
YGR192C    YGR192C          145
YLR150W    YLR150W          115

> head(pData(MAMA))
data frame with 0 columns and 6 rows

> head(MAMA)
MSnSet (storageMode: lockedEnvironment)
assayData: 1 features, 18 samples 
  element names: exprs 
protocolData: none
phenoData: none
featureData
  featureNames: YKL060C
  fvarLabels: Protein.ID Y500U600.006
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  
- - - Processing information - - -
Subset [685,18][1,18] Thu Feb 14 16:01:07 2019 
 MSnbase version: 2.8.3

I do not know what is wrong with my code, because the output for the original dataset taken directly from the msmsTests package is different (especially pData and fData), as shown below:

> data(msms.spk)
> head(msms.spk)
MSnSet (storageMode: lockedEnvironment)
assayData: 1 features, 19 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: Y500U100_001 Y500U100_002 ... Y500U600_006 (19 total)
  varLabels: treat
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: http://www.ncbi.nlm.nih.gov/pubmed/23770383 
Annotation:  
- - - Processing information - - -
Subset [685,19][1,19] Wed Feb 13 20:10:53 2019 
 MSnbase version: 1.8.0

> fData(msms.spk)
data frame with 0 columns and 685 rows

> pData(msms.spk)
             treat
Y500U100_001  U100
Y500U100_002  U100
Y500U100_003  U100
Y500U100_004  U100
Y500U200_001  U200
Y500U200_002  U200
Y500U200_003  U200
Y500U200_004  U200
Y500U200_010  U200
Y500U200_011  U200
Y500U400_002  U400
Y500U400_003  U400
Y500U400_004  U400
Y500U600_001  U600
Y500U600_002  U600
Y500U600_003  U600
Y500U600_004  U600
Y500U600_005  U600
Y500U600_006  U600

I would highly appreciate it if you could help me solve my problem.

ADD COMMENT • link updated 5.2 years ago by Laurent Gatto 1.6k • written 5.2 years ago by fgol ▴ 10

score 2 · Accepted Answer · 2019-02-15

First of all, you can set sep = "," when exporting an MSnSet to a spreadsheet. Alternatively, you could also read the data in with readMSnSet2 and set sep = "\t".
In write.exprs, you can also choose to export the feature variables. You can provide feature variables one by one, or all in one go with write.exprs(msms.spk, fcol = fvarLabels(msms.spk), file = "msms.csv", sep = ","). But note that the example data you use doesn't have any anyway.
The last point is that exporting an MSnSet to a spreadsheet will lose some data, in particular the pheno data, as you have shown above.

Below is some example code that should help.

> library(msmsTests)
> library(MSnbase)
> data(msms.spk)
> dim(msms.spk)
[1] 685  19
> exprs(msms.spk)[1:5, 1:3]
        Y500U100_001 Y500U100_002 Y500U100_003
YKL060C          151          195          188
YDR155C          154          244          237
YOL086C           64           89          128
YJR104C          161          155          158
YGR192C          157          161          173
> head(pData(msms.spk))
             treat
Y500U100_001  U100
Y500U100_002  U100
Y500U100_003  U100
Y500U100_004  U100
Y500U200_001  U200
Y500U200_002  U200
> fData(msms.spk)
data frame with 0 columns and 685 rows
> write.exprs(msms.spk, fcol = fvarLabels(msms.spk), file = "msms.csv", sep = ",")
> getEcols("msms.csv", split = ",")
 [1] ""             "Y500U100_001" "Y500U100_002" "Y500U100_003" "Y500U100_004"
 [6] "Y500U200_001" "Y500U200_002" "Y500U200_003" "Y500U200_004" "Y500U200_010"
[11] "Y500U200_011" "Y500U400_002" "Y500U400_003" "Y500U400_004" "Y500U600_001"
[16] "Y500U600_002" "Y500U600_003" "Y500U600_004" "Y500U600_005" "Y500U600_006"
> x <- readMSnSet2("msms.csv", ecol = 2:20, fnames = 1)
> x
MSnSet (storageMode: lockedEnvironment)
assayData: 685 features, 19 samples 
  element names: exprs 
protocolData: none
phenoData: none
featureData
  featureNames: YKL060C YDR155C ... YBR081C (685 total)
  fvarLabels: X
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  
- - - Processing information - - -
 MSnbase version: 2.9.3 
> dim(x)
[1] 685  19
> exprs(x)[1:5, 1:3]
        Y500U100_001 Y500U100_002 Y500U100_003
YKL060C          151          195          188
YDR155C          154          244          237
YOL086C           64           89          128
YJR104C          161          155          158
YGR192C          157          161          173
> identical(exprs(msms.spk), exprs(x))
[1] TRUE
> ## BUT!
> pData(x)
data frame with 0 columns and 19 rows
> ## If you have pData as a data.frame (for example read from a spreadsheet)
> ## with rownames identical to sampleNames of your MSnSet
> pData(x) <- pData(msms.spk)
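
The last step can also be done with a sample annotation table of your own. The sketch below is only an illustration. The data.frame pd is hypothetical and is rebuilt from the sample names, where a real one would typically be read from a separate spreadsheet. The expression columns are derived from the header returned by getEcols instead of being typed by hand. In the question, the data.frame has 20 columns, so ecol = 2:19 selects only 18 of the 19 samples and the last one ends up among the feature variables.

```r
library(MSnbase)
library(msmsTests)

data(msms.spk)

# Round-trip the expression matrix through a comma-separated file.
f <- tempfile(fileext = ".csv")
write.exprs(msms.spk, file = f, sep = ",")

# Column 1 holds the protein IDs; every remaining column is a sample.
cols <- getEcols(f, split = ",")
x <- readMSnSet2(f, ecol = 2:length(cols), fnames = 1)

# Hypothetical sample annotation, reconstructed from the sample names
# (e.g. "Y500U100_001" gives treatment "U100").
pd <- data.frame(
  treat = sub("^Y500(U[0-9]+)[._].*$", "\\1", sampleNames(x)),
  row.names = sampleNames(x)
)

# The row names of the annotation must match the sample names exactly.
stopifnot(identical(rownames(pd), sampleNames(x)))
pData(x) <- pd
validObject(x)
```

Checking the row names before the assignment gives a clearer error than letting the validity check of the MSnSet fail later.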