Question: calling combineFeatures twice causes duplicate row.names
19 months ago by
joris.vanhoutven0 wrote:

I'm trying to summarize an MSnSet object from PSM-level to peptide-level, and then to protein-level. I do this in two steps because I want to normalize on the peptide level.

However, doing so

pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

results in the following error:

Error in value[[3L]](cond) : duplicate 'row.names' are not allowed
AnnotatedDataFrame 'initialize' could not update varMetadata:
perhaps pData and varMetadata are inconsistent?
non-unique values when setting 'row.names': ‘CV.TMT6.126’, ‘CV.TMT6.127’, ‘CV.TMT6.128’, ‘CV.TMT6.129’, ‘CV.TMT6.130’, ‘CV.TMT6.131’

Why isn't this possible? How can I resolve this problem?

FYI: it is possible to go from the PSM-level straight to the protein-level without grouping by sequence first, but that's not what I want. Also, I can't verify whether pData is consistent or not because I can't find it anywhere.

Full code sample below:

library("RforProteomics")
library(MSnbase)
library(mzR)
library(Rcpp)
library(rpx)

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
modified 19 months ago by Laurent Gatto1.2k • written 19 months ago by joris.vanhoutven0
Answer: calling combineFeatures twice causes duplicate row.names
19 months ago by
Laurent Gatto1.2k
Belgium
Laurent Gatto1.2k wrote:

Thank you for the report. The error happens because every time combineFeatures is called, it estimates the coefficient of variation for each column and adds these values to the feature data. Here, these feature variables already exist when summarising the data from peptides to proteins, so the second call tries to add columns with the same names.
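To make the mechanism concrete, here is a minimal base-R sketch. It is only an analogy and not the MSnbase internals: the toy data frames `fd`, `cv` and `meta` are made up, with `meta` standing in for a varMetadata-like table that has one row per feature variable. Adding a second column with an existing CV name and then using the column names as row names triggers the same kind of error.

```r
# Toy feature data that already carries a CV column from a first aggregation
# (hypothetical values, for illustration only).
fd <- data.frame(accession = c("P1", "P2"), CV.TMT6.126 = c(0.10, 0.20))

# CV values from a hypothetical second aggregation, stored under the same name.
cv <- data.frame(CV.TMT6.126 = c(0.30, 0.40))

# cbind() on data frames keeps duplicated column names as they are.
combined <- cbind(fd, cv)
dup <- names(combined)[duplicated(names(combined))]

# A one-row-per-column metadata table, loosely analogous to varMetadata.
meta <- data.frame(labelDescription = rep(NA_character_, ncol(combined)))

# Using the (duplicated) column names as row names fails.
msg <- suppressWarnings(tryCatch({
  row.names(meta) <- names(combined)
  "no error"
}, error = function(e) conditionMessage(e)))

print(dup)
print(msg)
```

The duplicated name reported in `dup` corresponds to the CV columns listed in the error message from the question.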

I have opened an issue to sort this out, but in the meantime, there are two workarounds:

1. Set combineFeatures(..., CV = FALSE) at least in one call

2. Update the feature variable names after the first call:

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
## update feature variable names to avoid error
i <- grep("CV.TMT", fvarLabels(pepqnt))
fvarLabels(pepqnt)[i] <- paste(fvarLabels(pepqnt)[i], "spectrum", sep = ".")
## proceed
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
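To see what the renaming step does in isolation, here is a small sketch on a plain character vector. The vector `labels` is a made-up stand-in for fvarLabels(pepqnt); the real object will contain more feature variables.

```r
# Hypothetical feature variable names after the first combineFeatures call.
labels <- c("sequence", "accession", "CV.TMT6.126", "CV.TMT6.127")

# Find the CV columns and give them a suffix, so that CV columns added
# by a later aggregation can no longer collide with them.
i <- grep("CV.TMT", labels)
labels[i] <- paste(labels[i], "spectrum", sep = ".")

print(labels)
```

Only the CV columns are renamed; the other feature variables, including the accession column used for the second grouping, keep their names.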

Let me know if you have any other issues.

Thanks Laurent, your solution #2 works!

Remarkably enough, solution #1 does not. It produces the same error. I'll also comment on the GitHub issue.

Thank you for your answer - I'll investigate the CV = FALSE issue.


It should be cv = FALSE in lower case - sorry for the confusion.