Removing duplicate probes from an ExpressionSet

Angela McDonald

@angela-mcdonald-5231

Hello,

I am wondering how to remove duplicate probes from an ExpressionSet in Bioconductor. I have tried nsFilter without success. When I run

    featureFilter(xenexp, require.entrez=TRUE, remove.dupEntrez=TRUE)

I get this error:

    Error in rowQ(exprs(imat), which) :
      cannot calculate order statistic on object with 2 columns

The xenexp ExpressionSet contains two samples run on the mgu74av2 array.

Thank you so much,
Angela

Martin Morgan

@martin-morgan-1513

United States

On 04/14/2012 03:59 PM, Angela McDonald wrote:
> featureFilter(xenexp, require.entrez=TRUE, remove.dupEntrez=TRUE)
>
> The error I get is:
>
> Error in rowQ(exprs(imat), which) :
>   cannot calculate order statistic on object with 2 columns
>
> The xenexp expression set includes two samples on the mgu74av2 array

Hi Angela,

When several probesets map to the same Entrez ID, featureFilter decides which one to keep by choosing the probeset with the largest interquartile range (IQR). The IQR, as computed here from order statistics, is not defined for a sample of size 2, which leads to the error above.

Looking at the source for featureFilter

    > featureFilter
    function (eset, require.entrez = TRUE, require.GOBP = FALSE,
        require.GOCC = FALSE, require.GOMF = FALSE, require.CytoBand = FALSE,
        remove.dupEntrez = TRUE, feature.exclude = "^AFFX")
    {
    [...]

you'll see that duplicate probes are removed by these lines:

        if (remove.dupEntrez) {
            uniqGenes <- findLargest(featureNames(eset), rowIQRs(eset),
                annotation(eset))
            eset <- eset[uniqGenes, ]
        }

So, after consulting ?findLargest, you could use some statistic other than rowIQRs (row interquartile range) to select which probeset to retain. For example, using the 'sample.ExpressionSet' data, select for each gene the probeset with the largest range for subsequent analysis:

    library(Biobase)      # ExpressionSet, exprs(), sample.ExpressionSet
    library(genefilter)   # findLargest()
    ## assumes the annotation package for annotation(eset) ("hgu95av2") is installed
    data(sample.ExpressionSet)
    eset <- sample.ExpressionSet
    rng <- apply(exprs(eset), 1, function(x) diff(range(x)))  # per-probeset range
    ## for each Entrez ID, keep the probeset with the largest range
    uniqGenes <- findLargest(featureNames(eset), rng, annotation(eset))
    eset <- eset[uniqGenes, ]

You're asking to remove duplicate Entrez gene identifiers rather than duplicate probesets. It is not uncommon to perform analysis without removing duplicates, anticipating that, in the results, probesets from the same gene will convey qualitatively similar signal.
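To illustrate the idea behind the findLargest() approach above without needing Bioconductor installed, here is a minimal base-R sketch: compute a per-probeset statistic that is defined for two samples (the range) and keep, for each gene, the probeset with the largest value. This is not findLargest() itself. The helper `keep_largest_per_gene()` and the `probe_to_gene` mapping are hypothetical stand-ins for findLargest() and the annotation package's probeset-to-Entrez lookup, and the expression values are made up.

```r
# Hypothetical helper (not part of genefilter): for each gene, keep the
# probeset with the largest value of `stat`. `probe_to_gene` is a made-up
# named character vector standing in for the annotation package's
# probeset -> Entrez ID mapping.
keep_largest_per_gene <- function(stat, probe_to_gene) {
  genes <- probe_to_gene[names(stat)]
  # Drop probesets without an Entrez ID (similar in spirit to require.entrez = TRUE)
  ok <- !is.na(genes)
  stat <- stat[ok]
  genes <- genes[ok]
  # For each gene, the index of its probeset with the largest statistic
  # (ties go to the first probeset, as with which.max)
  idx <- tapply(seq_along(stat), genes, function(i) i[which.max(stat[i])])
  # Return the retained probeset IDs, named by gene ID
  setNames(names(stat)[as.integer(idx)], names(idx))
}

# Toy two-sample expression matrix (made-up values)
expr <- matrix(c(5, 7,
                 6, 6,
                 2, 9,
                 4, 4),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))
probe_to_gene <- c(p1 = "100", p2 = "100", p3 = "200", p4 = NA)

# The range (max - min) is well defined with only 2 samples
rng <- apply(expr, 1, function(x) diff(range(x)))   # p1 = 2, p2 = 0, p3 = 7, p4 = 0
keep <- keep_largest_per_gene(rng, probe_to_gene)   # c("100" = "p1", "200" = "p3")
expr[keep, , drop = FALSE]                          # one row per gene
```

Any statistic that returns one number per probeset (range, mean expression, absolute difference between the two samples) can be swapped in for `rng`; the choice only affects which probeset represents each gene.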
Also, the small sample size restricts the types of analysis possible anyway, so the usual motivation for removing duplicates (reducing the number of statistical tests) may not be relevant.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
