Calling combineFeatures twice causes duplicate row.names
@jorisvanhoutven-14880

I'm trying to summarize an MSnSet object from PSM-level to peptide-level, and then to protein-level. I do this in two steps because I want to normalize on the peptide level.

However, doing so

pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

results in the following error:

Error in value[[3L]](cond) : duplicate 'row.names' are not allowed
AnnotatedDataFrame 'initialize' could not update varMetadata:
perhaps pData and varMetadata are inconsistent?
non-unique values when setting 'row.names': ‘CV.TMT6.126’, ‘CV.TMT6.127’, ‘CV.TMT6.128’, ‘CV.TMT6.129’, ‘CV.TMT6.130’, ‘CV.TMT6.131’

Why isn't this possible? How can I resolve this problem?

FYI: it is possible to go from the PSM-level straight to the protein-level without grouping by sequence first, but that's not what I want. Also, I can't verify whether pData is consistent or not because I can't find it anywhere.

Full code sample below:

library("RforProteomics")
library(MSnbase)
library(mzR)
library(Rcpp)
library(rpx)

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
@laurent-gatto-5645

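Thank you for the report. The error happens because every time combineFeatures is called, it estimates the coefficients of variation for each column (sample) and adds these to the feature data as CV.<sample name> feature variables. Here, these feature variables already exist when summarising the data from peptides to proteins, so the second call tries to add CV.TMT6.126 to CV.TMT6.131 a second time, and the duplicated names are rejected.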
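A minimal base-R sketch of that clash, without MSnbase. It assumes, as the error message suggests, that varMetadata keeps one row per feature variable with the variable names as row names; a plain data.frame stands in for varMetadata, and the feature variable names (sequence, accession, CV.TMT6.126 and so on) are illustrative only.

```r
## Stand-in for varMetadata: one row per feature variable, named by it.
## Assumption for illustration only; this is not MSnbase code.
samples  <- paste0("TMT6.", 126:131)
cvLabels <- paste("CV", samples, sep = ".")   # "CV.TMT6.126", ...

## Hypothetical feature variables after the PSM -> peptide call
afterFirst <- c("sequence", "accession", cvLabels)

## The peptide -> protein call appends the same CV labels a second time
afterSecond <- c(afterFirst, cvLabels)

varMeta <- data.frame(labelDescription = rep(NA_character_, length(afterSecond)))

## Setting duplicated row names fails, as in the reported error
err <- tryCatch({
  suppressWarnings(rownames(varMeta) <- afterSecond)
  NULL
}, error = function(e) conditionMessage(e))

print(err)
print(unique(afterSecond[duplicated(afterSecond)]))  # the colliding labels
```

The colliding labels are exactly the six CV.TMT6.* names listed in the original error.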

I have opened an issue to sort this out, but in the meantime there are two workarounds:

1. Set combineFeatures(..., CV = FALSE) in at least one of the calls

2. Update the feature variable names after the first call:

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
## update feature variable names to avoid error
i <- grep("CV.TMT", fvarLabels(pepqnt))
fvarLabels(pepqnt)[i] <- paste(fvarLabels(pepqnt)[i], "spectrum", sep = ".")
## proceed
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
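The renaming step of workaround 2 can be checked on plain label vectors. This base-R sketch applies the same grep()/paste() step to hypothetical feature variable names (no MSnbase object involved) and confirms that, once the first set of CV labels carries a .spectrum suffix, the labels appended by the second call no longer collide.

```r
## Hypothetical feature variable names; no MSnbase object involved.
samples  <- paste0("TMT6.", 126:131)
cvLabels <- paste("CV", samples, sep = ".")

## Labels as they might look after the first (PSM -> peptide) call
labels <- c("sequence", "accession", cvLabels)

## Same renaming step as in the answer: suffix the existing CV labels
i <- grep("CV.TMT", labels)
labels[i] <- paste(labels[i], "spectrum", sep = ".")

## The second (peptide -> protein) call appends a fresh set of CV labels
labels <- c(labels, cvLabels)

stopifnot(anyDuplicated(labels) == 0)  # every label is now unique
print(labels)
```

Leaving out the final c(labels, cvLabels) step models the other route, where the second set of CV columns is never created.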

Let me know if you have any other issues.


Thanks Laurent, your solution #2 works!

Remarkably enough, solution #1 does not. It produces the same error. I'll also comment on the GitHub issue.


Thank you for your answer - I'll investigate the CV = FALSE issue.


It should be cv = FALSE in lower case - sorry for the confusion.
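A likely reason the upper-case version failed silently: R matches argument names case-sensitively, so CV = FALSE does not match a cv argument. If the function also accepts ..., the unmatched argument is collected there instead of raising an "unused argument" error, and cv keeps its default. The function below is hypothetical and only mimics that kind of signature (a cv = TRUE argument followed by ...); it is a sketch under that assumption, not combineFeatures itself.

```r
## Hypothetical function mimicking a signature with `cv = TRUE` and `...`
summarise_sketch <- function(x, cv = TRUE, ...) {
  list(
    cv = cv,                      # the value the function will actually use
    in_dots = names(list(...))    # names of arguments that fell into ...
  )
}

upper <- summarise_sketch(1:3, CV = FALSE)  # wrong case: goes to ...
lower <- summarise_sketch(1:3, cv = FALSE)  # matches the cv argument

str(upper)
str(lower)
```

In the first call cv is still TRUE and "CV" shows up among the ... arguments; only the lower-case spelling actually switches the computation off.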