Question

calling combineFeatures twice causes duplicate row.names

joris.vanhoutven ▴ 10

@jorisvanhoutven-14880

I'm trying to summarize an MSnSet object from PSM-level to peptide-level, and then to protein-level. I do this in two steps because I want to normalize on the peptide level.

However, running the following code

pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

results in the following error:

Error in value[[3L]](cond) : duplicate 'row.names' are not allowed
  AnnotatedDataFrame 'initialize' could not update varMetadata:
  perhaps pData and varMetadata are inconsistent?
In addition: Warning message:
non-unique values when setting 'row.names': ‘CV.TMT6.126’, ‘CV.TMT6.127’, ‘CV.TMT6.128’, ‘CV.TMT6.129’, ‘CV.TMT6.130’, ‘CV.TMT6.131’

Why isn't this possible? How can I resolve this problem?

FYI: it is possible to go from the PSM-level straight to the protein-level without grouping by sequence first, but that's not what I want. Also, I can't verify whether pData is consistent or not because I can't find it anywhere.

Full code sample below:

library("RforProteomics")
library(MSnbase)
library(mzR)
library(Rcpp)
library(rpx)

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)

msnbase duplicate rforproteomics • 2.1k views

ADD COMMENT • link updated 7.9 years ago by Laurent Gatto 1.6k • written 7.9 years ago by joris.vanhoutven ▴ 10

score 1 · Accepted Answer · 2018-01-26

Laurent Gatto 1.6k

@laurent-gatto-5645

Belgium

Thank you for the report. The error happens because every time combineFeatures is called, it estimates the coefficient of variation for each column and adds these to the feature data. Here, these feature variables already exist when summarising the data from peptides to proteins, so the second call tries to add feature variables with the same names again.
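The name clash can be reproduced in base R, without MSnbase. This is only a minimal sketch: it assumes that feature variable names end up as the row names of a metadata data frame (as the varMetadata mention in the error message suggests), and the objects cv_vars, fvars, fvars_twice and var_meta are illustrative, not part of MSnbase.

```r
# Illustrative feature variable names after the first summarisation;
# the CV.* names are taken from the warning message in the question.
cv_vars <- paste0("CV.TMT6.", 126:131)
fvars <- c("sequence", "accession", cv_vars)

# A second summarisation would try to add the same CV.* variables again.
fvars_twice <- c(fvars, cv_vars)

# Assumed layout: one metadata row per feature variable.
var_meta <- data.frame(
  labelDescription = rep(NA_character_, length(fvars_twice))
)

# Setting duplicated row names fails; capture the error message.
msg <- tryCatch(
  suppressWarnings({
    row.names(var_meta) <- fvars_twice
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message should mention duplicate row names, which is the same failure reported in the question.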

I have opened an issue to sort this out, but in the meantime there are two workarounds:

1. Set combineFeatures(..., cv = FALSE) in at least one call (note the lower-case cv; this was first posted as CV = FALSE, which does not work)

2. Update the feature variable names after the first call:

px1 <- PXDataset("PXD000001")
mztab <- pxget(px1, "PXD000001_mztab.txt")
qnt_incl_NA <- readMzTabData(mztab, what = "PEP", version = "0.9")
sampleNames(qnt_incl_NA) <- reporterNames(TMT6)
qnt <- filterNA(qnt_incl_NA)
pepqnt <- combineFeatures(qnt, groupBy = fData(qnt)$sequence, fun = sum)
## update feature variable names to avoid error
i <- grep("CV.TMT", fvarLabels(pepqnt))
fvarLabels(pepqnt)[i] <- paste(fvarLabels(pepqnt)[i], "spectrum", sep = ".")
## proceed
pepqnt_S <- normalise(pepqnt, "sum")
pepqnt_norm <- normalise(pepqnt_S, "quantiles.robust")
protqnt <- combineFeatures(pepqnt_norm, groupBy = fData(pepqnt_norm)$accession, fun = sum)
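Why the renaming helps can be checked on plain character vectors. The sketch below again uses illustrative names (cv_vars, fvars, fvars_twice) rather than a real MSnSet, and it passes fixed = TRUE to grep so that the dot in "CV.TMT" is matched literally, which is a small deviation from the code above.

```r
# Illustrative feature variable names after the first summarisation.
cv_vars <- paste0("CV.TMT6.", 126:131)
fvars <- c("sequence", "accession", cv_vars)

# Same renaming as in the workaround: suffix the existing CV.* names.
i <- grep("CV.TMT", fvars, fixed = TRUE)
fvars[i] <- paste(fvars[i], "spectrum", sep = ".")

# A second summarisation can now add fresh CV.* names without a clash.
fvars_twice <- c(fvars, cv_vars)
stopifnot(anyDuplicated(fvars_twice) == 0)
print(fvars[i])
```

The suffix also records which level each CV was computed at, so both sets of values remain available in the protein-level feature data.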

Let me know if you have any other issues.

ADD COMMENT • link 7.9 years ago Laurent Gatto 1.6k

Thanks Laurent, your solution #2 works!

Surprisingly, solution #1 does not: it produces the same error. I'll also comment on the GitHub issue.

ADD REPLY • link 7.9 years ago joris.vanhoutven ▴ 10

Thank you for your answer - I'll investigate the CV = FALSE issue.

ADD REPLY • link 7.9 years ago Laurent Gatto 1.6k

It should be `cv = FALSE` in lower case - sorry for the confusion.

ADD REPLY • link 7.9 years ago Laurent Gatto 1.6k