calling combineFeatures twice causes duplicate row.names
@jorisvanhoutven-14880

I'm trying to summarize an MSnSet object from PSM-level to peptide-level, and then to protein-level. I do this in two steps because I want to normalize on the peptide level.

However, running the following

pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

results in the following error:

Error in value[[3L]](cond) : duplicate 'row.names' are not allowed
  AnnotatedDataFrame 'initialize' could not update varMetadata:
  perhaps pData and varMetadata are inconsistent?
In addition: Warning message:
non-unique values when setting 'row.names': ‘CV.TMT6.126’, ‘CV.TMT6.127’, ‘CV.TMT6.128’, ‘CV.TMT6.129’, ‘CV.TMT6.130’, ‘CV.TMT6.131’

Why isn't this possible? How can I resolve this problem?

FYI: it is possible to go from the PSM-level straight to the protein-level without grouping by sequence first, but that's not what I want. Also, I can't verify whether pData is consistent or not because I can't find it anywhere.

Full code sample below:

library("RforProteomics")
library("MSnbase")
library("mzR")
library("Rcpp")
library("rpx")

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
@laurent-gatto-5645

Thank you for the report. The error happens because every time combineFeatures is called, it estimates the coefficient of variation for each column and adds these to the feature data. Here, these feature variables already exist when summarising the data from peptides to proteins, so the second call tries to add feature variables with the same names again.
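
To make the mechanism concrete, here is a minimal base-R sketch of the name clash. It does not call MSnbase at all: `fvars` and `var_meta` are hypothetical stand-ins for the feature variable names and for the per-variable metadata table (which has one row per feature variable, named after it). It only assumes that the CV columns end up in the feature data twice under identical names.

```r
# Hypothetical feature variable names after the second summarisation:
# the CV.* variables from the first call are already present, and the
# second call adds variables with exactly the same names.
fvars <- c("sequence", "accession",
           "CV.TMT6.126", "CV.TMT6.127",   # from the first call
           "CV.TMT6.126", "CV.TMT6.127")   # added again by the second call

# Stand-in for the variable metadata: one row per feature variable.
var_meta <- data.frame(labelDescription = rep(NA_character_, length(fvars)))

# Naming the rows after the feature variables fails, because row names
# of a data.frame must be unique.
msg <- tryCatch({
  suppressWarnings(row.names(var_meta) <- fvars)
  "no error"
}, error = function(e) conditionMessage(e))

# The names that clash
dup <- unique(fvars[duplicated(fvars)])

# Giving the first set of CV variables a distinct suffix removes the clash.
renamed <- paste(c("CV.TMT6.126", "CV.TMT6.127"), "spectrum", sep = ".")
fixed <- c("sequence", "accession", renamed, "CV.TMT6.126", "CV.TMT6.127")
n_dup_fixed <- anyDuplicated(fixed)  # 0 means all names are unique
```

Here `msg` holds the error raised by the row-name assignment, and `dup` lists the clashing variable names, which correspond to the ones reported in the warning above.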

I have opened an issue to sort this out, but in the meantime, there are two solutions:

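1. Set combineFeatures(..., cv = FALSE) in at least one call (the argument is lower case; it was originally posted here as CV = FALSE, which has no effect, as discussed in the comments below)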

2. Update the feature variable names after the first call:

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
## update feature variable names to avoid error
i <- grep("CV.TMT", fvarLabels(pepqnt))
fvarLabels(pepqnt)[i] <- paste(fvarLabels(pepqnt)[i], "spectrum", sep = ".")
## proceed
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

Let me know if you have any other issues.


Thanks Laurent, your solution #2 works!

Remarkably enough, solution #1 does not. It produces the same error. I'll also comment on the GitHub issue.


Thank you for your answer - I'll investigate the CV = FALSE issue.


It should be `cv = FALSE` in lower case - sorry for the confusion.
