Question: calling combineFeatures twice causes duplicate row.names
19 months ago, joris.vanhoutven wrote:

I'm trying to summarize an MSnSet object from PSM-level to peptide-level, and then to protein-level. I do this in two steps because I want to normalize on the peptide level.

However, running the following code

pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

results in the following error:

Error in value[[3L]](cond) : duplicate 'row.names' are not allowed
  AnnotatedDataFrame 'initialize' could not update varMetadata:
  perhaps pData and varMetadata are inconsistent?
In addition: Warning message:
non-unique values when setting 'row.names': ‘CV.TMT6.126’, ‘CV.TMT6.127’, ‘CV.TMT6.128’, ‘CV.TMT6.129’, ‘CV.TMT6.130’, ‘CV.TMT6.131’

Why isn't this possible? How can I resolve this problem?

FYI: it is possible to go from the PSM-level straight to the protein-level without grouping by sequence first, but that's not what I want. Also, I can't verify whether pData is consistent or not because I can't find it anywhere.

Full code sample below:

library("RforProteomics")
library(MSnbase)
library(mzR)
library(Rcpp)
library(rpx)

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
Answer: calling combineFeatures twice causes duplicate row.names
19 months ago, Laurent Gatto (Belgium) wrote:

Thank you for the report. The error happens because every time combineFeatures is called, it estimates the coefficients of variation for each column and adds them to the feature data. Here, these feature variables already exist when summarising the data from peptides to proteins, so the second call tries to add variables with the same names.
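
As a rough illustration of the name clash described above, here is a base R sketch. It is not MSnbase's internal code: the label vectors and the `var_meta` table are hypothetical stand-ins, assuming only that per-variable metadata is stored in a data.frame whose row names are the feature variable names.

```r
## Illustration only: base R, not MSnbase internals.
## Hypothetical feature variable names after the first combineFeatures call.
first_call  <- c("sequence", "accession", paste0("CV.TMT6.", 126:131))
## The second call would try to add CV columns under the same names.
second_call <- paste0("CV.TMT6.", 126:131)
all_labels  <- c(first_call, second_call)

## The clashing names are the ones listed in the warning message.
dup_labels <- unique(all_labels[duplicated(all_labels)])
print(dup_labels)

## A per-variable metadata table uses the variable names as row names,
## and base R refuses duplicated row names.
var_meta <- data.frame(labelDescription = rep(NA_character_, length(all_labels)))
outcome <- tryCatch({
  suppressWarnings(row.names(var_meta) <- all_labels)
  "row names set"
}, error = function(e) conditionMessage(e))
print(outcome)
```

The assignment fails inside `tryCatch`, and `outcome` holds the error message instead of "row names set".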

I have opened an issue to sort this out, but in the meantime, there are two solutions:

1. Set combineFeatures(..., CV = FALSE) at least in one call

2. Update the feature variable names after the first call:

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
## update feature variable names to avoid error
i <- grep("CV.TMT", fvarLabels(pepqnt))
fvarLabels(pepqnt)[i] <- paste(fvarLabels(pepqnt)[i], "spectrum", sep = ".")
## proceed
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

Let me know if you have any other issues.


Thanks Laurent, your solution #2 works!

Remarkably enough, solution #1 does not. It produces the same error. I'll also comment on the GitHub issue.

Reply by joris.vanhoutven, 19 months ago

Thank you for your answer - I'll investigate the CV = FALSE issue.

Reply by Laurent Gatto, 19 months ago

It should be `cv = FALSE` in lower case - sorry for the confusion.

Reply by Laurent Gatto, 19 months ago
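
For reference, solution #1 with the corrected lower-case argument might look like the sketch below. It reuses the data and calls from the question, needs network access to download the mzTab file, and has not been re-run here. Placing `cv = FALSE` in the first call rather than the second is an arbitrary choice (the answer only says "at least in one call"), and the `fun` argument is kept as written in the thread, which may differ in other MSnbase versions.

```r
library("MSnbase")
library("rpx")

## Same data as in the question (downloaded from ProteomeXchange).
px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)

## cv = FALSE (lower case): do not add the CV feature variables in this call.
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum,
                          cv = FALSE)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")

## The second call adds the CV feature variables; their names should no
## longer clash, because the first call did not create any.
protqnt <- combineFeatures(pepqnt_norm,
                           groupBy = fData(pepqnt_norm)$accession, fun = sum)
```

With this variant only the peptide-to-protein CVs end up in the feature data. To keep both sets of CVs, solution #2 (renaming the first set) is the one to use.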