Removing duplicate probes from an ExpressionSet
@angela-mcdonald-5231
Hello,

I am wondering how to remove duplicate probes from an expression set in Bioconductor. I have tried to use nsFilter with no success.

When I use the following:

    featureFilter(xenexp, require.entrez=TRUE, remove.dupEntrez=TRUE)

The error I get is:

    Error in rowQ(exprs(imat), which) :
      cannot calculate order statistic on object with 2 columns

The xenexp expression set includes two samples on the mgu74av2 array.

Thank you so much,

Angela
@martin-morgan-1513
On 04/14/2012 03:59 PM, Angela McDonald wrote:
> [...]
> featureFilter(xenexp, require.entrez=TRUE, remove.dupEntrez=TRUE)
>
> The error I get is:
>
> Error in rowQ(exprs(imat), which) :
>   cannot calculate order statistic on object with 2 columns
>
> The xenexp expression set includes two samples on the mgu74av2 array

Hi Angela --

When several probesets map to the same Entrez ID, featureFilter decides which one to keep by retaining the probeset with the largest interquartile range. The interquartile range is not defined for a sample of size 2, leading to the error above. From looking at the source for featureFilter

    > featureFilter
    function (eset, require.entrez = TRUE, require.GOBP = FALSE,
        require.GOCC = FALSE, require.GOMF = FALSE, require.CytoBand = FALSE,
        remove.dupEntrez = TRUE, feature.exclude = "^AFFX")
    {
    [...]

you'll see that duplicate probes are removed by the lines

    if (remove.dupEntrez) {
        uniqGenes <- findLargest(featureNames(eset), rowIQRs(eset),
            annotation(eset))
        eset <- eset[uniqGenes, ]
    }

so after consulting ?findLargest you could use some statistic other than rowIQRs (row inter-quartile range) to select which probeset to retain. For example, using the 'sample.ExpressionSet' data, you could select the probesets with the largest range for subsequent analysis:

    library(Biobase)      # ExpressionSet accessors and sample.ExpressionSet
    library(genefilter)   # findLargest()
    # findLargest() looks up Entrez IDs in the chip's annotation package
    data(sample.ExpressionSet)
    eset <- sample.ExpressionSet
    # per-probeset range across samples; defined even for two samples
    rng <- apply(exprs(eset), 1, function(x) diff(range(x)))
    # for each Entrez ID, keep the probeset with the largest range
    uniqGenes <- findLargest(featureNames(eset), rng, annotation(eset))
    eset <- eset[uniqGenes,]

Note that you're asking to remove probesets that share an Entrez gene identifier, rather than duplicated probesets. It is not uncommon to perform the analysis without removing them, anticipating that probesets from the same gene will convey qualitatively similar signal in the results.
Also, the small sample size restricts the type of analysis possible anyway, so the usual motivation for removing duplicates (reducing the number of statistical tests) may not be relevant.

Martin

Computational Biology, Fred Hutchinson Cancer Research Center
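To make the selection step described in the answer concrete, here is a minimal base-R sketch of the idea: for each Entrez ID, keep the single probeset with the largest value of some per-probeset statistic. It does not call genefilter or any annotation package. The toy matrix, the probeset names p1 to p5, and the probeset-to-Entrez mapping are all made up for illustration, and dropping unannotated probesets is an assumption of this sketch rather than a statement about how findLargest is implemented.

```r
# Toy expression matrix: 5 probesets x 2 samples (values are made up).
exprs_mat <- matrix(
  c(7.1, 7.9,
    6.0, 6.2,
    5.5, 8.5,
    4.0, 4.1,
    9.0, 9.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4", "p5"), c("s1", "s2"))
)

# Hypothetical probeset-to-Entrez mapping; NA marks an unannotated probeset.
entrez <- c(p1 = "101", p2 = "101", p3 = "202", p4 = "202", p5 = NA)

# A per-probeset statistic that is defined for two samples: the range.
rng <- apply(exprs_mat, 1, function(x) diff(range(x)))

# Drop unannotated probesets (an assumption of this sketch), then keep the
# probeset with the largest statistic within each Entrez ID.
annotated <- names(entrez)[!is.na(entrez)]
keep <- vapply(
  split(annotated, entrez[annotated]),
  function(ids) ids[which.max(rng[ids])],
  character(1)
)

# One row per Entrez ID remains.
filtered <- exprs_mat[keep, , drop = FALSE]
```

Any other statistic can be swapped in for `rng`, as long as it yields one value per probeset and is defined for the number of samples at hand.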