Ion Mass to Charge Search R Function with MSnbase, Data frame as mass to charge input help
@djshipman2011-21483
Last seen 4.2 years ago

Hello,

I am developing an R script that uses the MSnbase Bioconductor package, and I am having trouble getting the MSnbase commands to work as I want inside a custom R function. I have used this approach before to retrieve intensity values from a raw MS data file for individually specified mass-to-charge values, but I have never managed to make it work when the mass-to-charge values are pulled from a data frame and the summed intensity is returned for each input value.

I will be using the MSnbase and plyr packages in the following code. If these are not the right packages for this task, any recommendations would be appreciated.

I think it is best to look at the code first to get an idea of what I am going for. Basically, I would like to take mass-to-charge values from a data frame and add a new column to that data frame holding the sum of the intensity values returned by the MSnbase chromatogram function. Right now I am working with a simple data set of 5 entries, but in the future I need to scale this up to several thousand, so I cannot simply do this by hand. The problem is that I cannot figure out how to do this in MSnbase with a data frame as the source of the mass-to-charge values. I would like to pull one mass-to-charge (MZHplus) value at a time from the data frame and look up the summed intensity for that value. I believe this is a simple syntax error, or that I am using the wrong commands for the job.

Here is the current code with comments that I am working with…

#Load libraries
library("MSnbase", lib.loc="~/R/win-library/3.6")
library(plyr)

#File to load - F1
#MSData.mzML

#Read raw MS mzML data file
#Note, according to forums the following error can be ignored...
# "Error in x$.self$finalize() : attempt to apply non-function"
msd <- readMSData("MSData.mzML", verbose = FALSE)

#Load the .csv file with peptide mz values for use in the following function
#Change input to desired list of mz to be used for ion search
peptideTable <- read.csv("test-data.csv")

#Creates the peptide_intensity_sum function
peptide_intensity_sum <- function(mz){
  #set up the retention time range
  rtr <- c(1, 60000)
  #set up the mass to charge range
  minmz <- (mz - 0.015)
  maxmz <- (mz + 0.015)
  mzr <- c(minmz, maxmz)
  #Chromatogram query to get all intensities values from mass spec data
  chrs <- chromatogram(msd, rt = rtr, mz = mzr, aggregationFun = "sum", msLevel = 2)
  #Store intensities values in int var
  int <- intensity(chrs[1, 1])
  #Compute sum of intensities values, remove NA values
  summedInts <- sum(int, na.rm = TRUE)
  #Return intSum value
  return(summedInts)
  #Clean up function environment
  rm(c(minmz, maxmz, mzr, chrs, int, summedInts))
}

#Run the function above on the MZHplus value and place summed intensities into new column
#the following works but outputs the calculated value for the first entry only?
intSum <- mutate(peptideTable, peptideIntSum = peptide_intensity_sum(MZHplus))
write.csv(intSum, "intensities.csv")

Here is the input peptide data...

ID  Sequence                Master.Protein.Accessions  MZHplus
1   QNAQCLHGDIAQSQR         Q99MJ9                     1870.937
2   VGNLGLATSFFNER          Q62095                     1669.909
3   QLCDNAGFDATNILNK        P80313                     2084.105
4   IIDGGSGYLCEMEPVAHFGLGR  Q8R555                     2523.255
5   LSECLQEVYEPEWPGRDEANK   O08539                     2840.402

Here is the output…

ID  Sequence                Master.Protein.Accessions  MZHplus   peptideIntSum
1   QNAQCLHGDIAQSQR         Q99MJ9                     1870.937  546252843
2   VGNLGLATSFFNER          Q62095                     1669.909  546252843
3   QLCDNAGFDATNILNK        P80313                     2084.105  546252843
4   IIDGGSGYLCEMEPVAHFGLGR  Q8R555                     2523.255  546252843
5   LSECLQEVYEPEWPGRDEANK   O08539                     2840.402  546252843

As the output data frame above shows, the same value, 546252843, is placed in every row of the peptideIntSum column instead of a different summed intensity for each mass-to-charge value (MZHplus). I suspect a syntax error or something similar. I would like the chromatogram intensity function to run one row at a time and write its result into the peptideIntSum column. Or maybe the MSnbase package cannot do this. Any help would be appreciated.

Thank you

Let me know if more information is needed. :)

Edit: added input and output, fixed file name.


There's no input/output data to look at.


Ah, OK, I'll fix that. I tried adding images, but I'll do it as text. Thanks. Edit: it was a little tricky to get the input/output data frames entered as text; I'm still learning the formatting for this forum. Thanks :)

@laurent-gatto-5645
Last seen 3 days ago
Belgium

I think you need to call your function on each row:

library(dplyr)  # provides %>%, rowwise() and a rowwise-aware mutate()

peptideTable %>%
   rowwise() %>%
   dplyr::mutate(peptideIntSum = peptide_intensity_sum(MZHplus))
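To illustrate why a row-by-row call matters, here is a minimal base R sketch. It does not use MSnbase: `peaks`, `targets` and `window_sum` are hypothetical stand-ins, and `window_sum` is only assumed to behave like a function that turns whatever it receives into a single window and returns one number. Passing the whole column gives one value that is recycled into every row, while calling the function once per value gives a separate result for each row.

```r
# Hypothetical stand-in for the MS data: a few m/z values with intensities.
peaks <- data.frame(
  mz        = c(100.00, 100.01, 200.00, 200.02, 300.00),
  intensity = c(10, 20, 30, 40, 50)
)

# Hypothetical stand-in for peptide_intensity_sum(): it builds ONE window
# from everything it is given and returns ONE number.
window_sum <- function(mz, tol = 0.015) {
  mzr <- range(c(mz - tol, mz + tol))
  sum(peaks$intensity[peaks$mz >= mzr[1] & peaks$mz <= mzr[2]])
}

# Hypothetical target values, playing the role of the MZHplus column.
targets <- data.frame(MZHplus = c(100.005, 200.01, 300.00))

# Whole column passed at once: a single value, recycled into every row.
targets$recycled <- window_sum(targets$MZHplus)

# One call per value: a separate result for each row.
targets$per_row <- sapply(targets$MZHplus, window_sum)

print(targets)
```

Calling the function by hand with a single value, for example `window_sum(200.01)`, is also a quick way to check that it behaves as expected before applying it to a whole table.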

Additional comments:

  • In such situations, you can always test your function by calling peptide_intensity_sum manually.
  • No need to clean up at the end; the function environment will be garbage collected.
  • You could create all your chromatograms in one go by vectorising your mz - 0.015 and mz + 0.015. You would get a Chromatograms object for all your mz slices in one go.
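The vectorised approach from the last point could be sketched roughly as follows. The runnable part only builds a two-column matrix of lower and upper m/z bounds, one row per target value; `mz_values`, `tol` and `mz_windows` are names chosen for this illustration. The MSnbase calls are left as comments because they need the `msd` object from the question, and they rest on the assumption that `chromatogram()` accepts such a matrix for `mz` and returns one row per window, so please check the MSnbase documentation for your version.

```r
# Hypothetical target m/z values, standing in for peptideTable$MZHplus.
mz_values <- c(1870.937, 1669.909, 2084.105)
tol <- 0.015

# One row per target: lower and upper bound of the m/z window.
mz_windows <- cbind(mzmin = mz_values - tol, mzmax = mz_values + tol)
print(mz_windows)

# Assumption: with msd read as in the question, all windows are extracted
# in a single call and each row of the result corresponds to one window.
# chrs <- chromatogram(msd, mz = mz_windows, aggregationFun = "sum", msLevel = 2)
# int_sums <- vapply(seq_len(nrow(chrs)),
#                    function(i) sum(intensity(chrs[i, 1]), na.rm = TRUE),
#                    numeric(1))
# peptideTable$peptideIntSum <- int_sums
```

The retention time range is left out of the commented call, which is assumed to mean that the full range is used.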

Thank you so much. I was originally working with the dplyr package but thought it was causing issues. I knew I was just missing something. :)
