Objective criterion for identification of outlying arrays by PCA
@richard-friedman-513
Dear Bioconductor List,

Does anyone know of an objective criterion for the identification of outlying arrays by PCA?

I usually do this subjectively. However, the experimental investigator whom I am helping has a different subjective sense than I do, so I wonder if there is a hard-and-fast criterion.

Thanks and best wishes,
Rich

------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I don't
know the answer I say "eeney-meaney-miney-moe".

Rose Friedman, Age 14
@james-w-macdonald-5106
Hi Rich,

On 11/2/2011 10:04 AM, Richard Friedman wrote:
> Does anyone know of an objective criterion for the identification
> of outlying arrays by pca?

I don't know an objective criterion for this. However, unless the 'outlier' is ridiculously bad, you might be better off using array weights to down-weight the offending array(s). In limma, the arrayWeights() and arrayWeightsSimple() functions allow you to generate weights that you can then feed into lmFit().

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
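The down-weighting approach Jim describes can be sketched roughly as follows. This is only an illustration on simulated data and was not part of the original reply: the matrix `expr`, the two-group `design`, and the extra noise added to the sixth array are all assumptions made up for the example. It needs the limma package to be installed.

```r
library(limma)

set.seed(1)
n_genes <- 1000

# Hypothetical log-expression matrix: 6 arrays, 3 per group (assumed layout).
expr <- matrix(rnorm(n_genes * 6), nrow = n_genes,
               dimnames = list(paste0("G", seq_len(n_genes)), paste0("S", 1:6)))

# Make the last array noisier than the others to mimic a poor-quality array.
expr[, 6] <- expr[, 6] + rnorm(n_genes, sd = 2)

group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Estimate one relative quality weight per array.
aw <- arrayWeights(expr, design)
print(round(aw, 2))

# Feed the array weights into the linear model fit instead of dropping the array.
fit <- eBayes(lmFit(expr, design, weights = aw))
```

The point of the design is that a doubtful array still contributes to the fit, just with less influence, so nobody has to make a keep-or-discard decision.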
The squared Mahalanobis distance (closely related to Hotelling's T^2 statistic) of a sample from the center of a D-dimensional principal component space should, under some sensible null hypothesis, approximately follow a chi-squared distribution with D degrees of freedom. You can thus conduct a test for outliers based on the p-value associated with the chi-squared statistic. (We used this idea for QC in a serum proteomics study a long time ago: Coombes et al, Clin Chem 2003; 49:1615-23.)

Kevin
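A minimal base-R sketch of the test Kevin outlines, under assumptions of my own: the data are simulated, `prcomp()` stands in for whatever PCA routine you normally use, and D = 2 is an arbitrary choice. None of these names or settings come from the cited paper.

```r
set.seed(42)
n_genes   <- 500
n_samples <- 30

# Simulated expression matrix (genes in rows, samples in columns).
x <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
            dimnames = list(NULL, paste0("S", seq_len(n_samples))))

# Plant one gross outlier: shift 200 genes in sample S1.
x[1:200, 1] <- x[1:200, 1] + 3

# PCA with samples as observations; prcomp() centers each gene by default.
pca <- prcomp(t(x))

D <- 2  # number of principal components to use (an assumption)
scores <- pca$x[, seq_len(D), drop = FALSE]

# Squared Mahalanobis distance of each sample from the center of PC space.
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Upper-tail chi-squared p-value with D degrees of freedom.
p_value <- pchisq(d2, df = D, lower.tail = FALSE)

head(sort(p_value), 3)
```

Note that the covariance here is estimated from all samples, including any outliers, so several bad arrays can inflate the variance and partly hide one another. Leave-one-out or robust estimates are the usual remedies.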
Dear Kevin and List,

I read your paper with great interest, but from the paper the method seems to be implemented mainly in Matlab. I am not a Matlab user. Is there a user-friendly R version that can be used with no more R scripting on the part of the user than is typical of most Bioconductor packages?

Thanks and best wishes,
Rich
\documentclass{article}

\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{cite}

\pagestyle{myheadings}
\markright{maha-test}

\setlength{\topmargin}{0in}
\setlength{\textheight}{8in}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\def\rcode#1{\texttt{#1}}
\def\fref#1{\textbf{Figure~\ref{#1}}}
\def\tref#1{\textbf{Table~\ref{#1}}}
\def\sref#1{\textbf{Section~\ref{#1}}}

\title{PCA, Mahalanobis Distance, and Outliers}
\author{Kevin R. Coombes}
\date{4 November 2011}

\begin{document}

<<echo=false>>=
options(width=88)
options(SweaveHooks = list(fig = function() par(bg='white')))
if (!file.exists("Figures")) dir.create("Figures")
@
\SweaveOpts{prefix.string=Figures/02-AML-27plex, eps=FALSE}

\maketitle
\tableofcontents

\section{Simulated Data}

We simulate a dataset.
<<simdata>>=
set.seed(564684)
nSamples <- 30
nGenes <- 3000
dataset <- matrix(rnorm(nSamples*nGenes), ncol=nSamples, nrow=nGenes)
dimnames(dataset) <- list(paste("G", 1:nGenes, sep=''),
                          paste("S", 1:nSamples, sep=''))
@

Now we make two of the samples (S1 and S2) into distinct outliers by
shifting a random subset of their genes.
<<liars>>=
nShift <- 300
affected <- sample(nGenes, nShift)
dataset[affected,1] <- dataset[affected,1] + rnorm(nShift, 1, 1)
dataset[affected,2] <- dataset[affected,2] + rnorm(nShift, 1, 1)
@

\section{PCA}

We start with a principal components analysis (PCA) of this dataset. A
plot of the samples against the first two principal components (PCs)
shows two very clear outliers (\fref{spca1}).
<<spca>>=
library(ClassDiscovery)
spca <- SamplePCA(dataset)
@

\begin{figure}
<<fig=true,echo=false>>=
plot(spca)
@
\caption{Principal components plot of the samples.}
\label{spca1}
\end{figure}

We want to explore the possibility of an outlier more formally. First,
we look at the cumulative amount of variance explained by the PCs:
<<pc>>=
round(cumsum(spca@variances)/sum(spca@variances), digits=2)
@
We see that we need $20$ components in order to explain $70\%$ of the
variation in the data.

Next, we compute the Mahalanobis distance of each sample from the
center of an $N$-dimensional principal component space. For each
sample, we use the following function to compute its distance from the
center, scaling each component by the variance estimated from the
remaining samples.
<<maha>>=
# Note: this definition masks stats::mahalanobis for the rest of the session.
mahalanobis <- function(spca, N) {
  ss <- spca@scores[, 1:N]            # scores on the first N components
  maha <- sapply(1:nrow(ss), function(i) {
    # inverse variance of each component, leaving out sample i
    v <- matrix(1/apply(ss[-i,], 2, var), ncol=1)
    as.vector(ss[i,]^2 %*% v)
  })
  names(maha) <- rownames(spca@scores)
  pmaha <- 1-pchisq(maha, N)          # upper-tail chi-squared p-value
  data.frame(statistic=maha, p.value=pmaha)
}
@

We apply this function using different numbers of components between
$2$ and $20$.
<<maha20>>=
maha2 <- mahalanobis(spca, 2)
maha5 <- mahalanobis(spca, 5)
maha10 <- mahalanobis(spca, 10)
maha20 <- mahalanobis(spca, 20)
myd <- data.frame(maha2, maha5, maha10, maha20)
colnames(myd) <- paste("N", rep(c(2, 5, 10, 20), each=2),
                       rep(c(".statistic", ".p.value"), 4), sep='')
@

The theory says that, under the null hypothesis that all samples arise
from the same multivariate normal distribution, the squared distance
from the center of a $d$-dimensional PC space should follow a
chi-squared distribution with $d$ degrees of freedom. This theory lets
us compute $p$-values associated with the Mahalanobis distances for
each sample (\tref{maha}).
<<results=tex, echo=false>>=
library(xtable)
xtable(myd, digits=c(0, rep(c(1, 4),4)),
       align=paste("|l|",paste(rep("r",8), collapse=''),"|",sep=''),
       label="maha",
       caption=paste("Mahalanobis distance (with unadjusted p-values)",
         "of each sample from the center of",
         "N-dimensional principal component space."))
@

We see that the samples S1 and S2 are outliers, at least when we look
at the first $2$, $5$, or $10$ components. However, sample S2 is not
quite significant (at the $5\%$ level) when we get out to $20$
components. This can occur when there are multiple outliers because
of the ``inflated'' variance estimates coming from the outliers
themselves.

\clearpage
\section{A Second Round}

Now we repeat the PCA after removing the one definite outlier. Sample
S2 still stands out as ``not like the others'' (\fref{spca2}).
<<spca2>>=
reduced <- dataset[,-1]
dim(reduced)
spca <- SamplePCA(reduced)
round(cumsum(spca@variances)/sum(spca@variances), digits=2)
@

\begin{figure}
<<fig=true,echo=false>>=
plot(spca)
@
\caption{Principal components plot of the samples, after omitting one
  extreme outlier (S1).}
\label{spca2}
\end{figure}

And we can recompute the Mahalanobis distances (\tref{maha2}). Here we
see that even out at the level of $20$ components, this sample remains
an outlier.
<<redmaha2>>=
maha20 <- mahalanobis(spca, 20)
@

<<echo=false,results=tex>>=
xtable(maha20, digits=c(0, 1, 4),
       align="|l|rr|",
       label="maha2",
       caption=paste("Mahalanobis distance (with unadjusted p-values)",
         "of each sample from the center of",
         "20-dimensional principal component space."))
@

\clearpage
\section{A Final Round}

We repeat the analysis after removing one more outlier.
<<spca3>>=
red2 <- reduced[,-1]
dim(red2)
spca <- SamplePCA(red2)
round(cumsum(spca@variances)/sum(spca@variances), digits=2)
@

\begin{figure}
<<fig=true,echo=false>>=
plot(spca)
@
\caption{Principal components plot of the samples, after omitting both
  outliers (S1 and S2).}
\label{spca3}
\end{figure}

And we can recompute the Mahalanobis distances (\tref{maha3}). At this
point, there are no outliers.
<<redmaha3>>=
maha20 <- mahalanobis(spca, 20)
@

<<echo=false,results=tex>>=
xtable(maha20, digits=c(0, 1, 4),
       align="|l|rr|",
       label="maha3",
       caption=paste("Mahalanobis distance (with unadjusted p-values)",
         "of each sample from the center of",
         "20-dimensional principal component space."))
@

\section{Appendix}

This analysis was performed in the following directory:
<<getwd>>=
getwd()
@

This analysis was performed in the following software environment:
<<si>>=
sessionInfo()
@

\end{document}
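For readers who do not have ClassDiscovery installed, the leave-one-out statistic in the Sweave document can be approximated with base R alone. This is a sketch, not Kevin's code: the function name `loo_maha` is made up here, and it assumes that `prcomp()` scores can stand in for the `SamplePCA` scores. That assumption should be harmless for this statistic, because each component is divided by its own variance, so the result does not depend on the sign or scale of a component. Random numbers will not match the original run.

```r
# Leave-one-out distance in the space of the first N principal components.
# 'scores' is a samples-by-components matrix, e.g. prcomp(...)$x.
loo_maha <- function(scores, N) {
  ss <- scores[, seq_len(N), drop = FALSE]
  stat <- vapply(seq_len(nrow(ss)), function(i) {
    # Per-component variance estimated without sample i.
    v <- apply(ss[-i, , drop = FALSE], 2, var)
    sum(ss[i, ]^2 / v)
  }, numeric(1))
  data.frame(statistic = stat,
             p.value   = pchisq(stat, df = N, lower.tail = FALSE),
             row.names = rownames(scores))
}

# Same simulation design as the document: 30 samples, 3000 genes,
# with 300 genes shifted in samples S1 and S2.
set.seed(564684)
n_samples <- 30
n_genes   <- 3000
dataset <- matrix(rnorm(n_samples * n_genes), nrow = n_genes,
                  dimnames = list(paste0("G", seq_len(n_genes)),
                                  paste0("S", seq_len(n_samples))))
affected <- sample(n_genes, 300)
dataset[affected, 1] <- dataset[affected, 1] + rnorm(300, 1, 1)
dataset[affected, 2] <- dataset[affected, 2] + rnorm(300, 1, 1)

pca <- prcomp(t(dataset))          # samples as rows, genes centered
res <- loo_maha(pca$x, N = 2)

# Samples with the smallest p-values are the outlier candidates.
head(res[order(res$p.value), ], 3)
```

As in the document, the p-values are unadjusted. With 30 samples, some multiple-testing correction (for example `p.adjust()`) is worth considering before calling anything an outlier.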
@vincent-j-carey-jr-4
You can read about formally calibrated outlier assessment for microarrays in

http://bioinformatics.oxfordjournals.org/content/25/1/48
On Nov 2, 2011, at 10:16 AM, Vincent Carey wrote:
> you can read about formally calibrated outlier assessment for
> microarrays in
>
> http://bioinformatics.oxfordjournals.org/content/25/1/48

Dear Vincent and List,

I read your paper with great interest. I will implement it for datasets from Affymetrix chips with mismatch probes, to which it is geared, along with the weighting method suggested by Jim MacDonald, for arrays which pass the test. However, a Gene ST 1.0 dataset has come up with this kind of rejection question. Has your method been adapted to Gene ST 1.0 arrays? If not, do you know of an analogous method that can be used for ST 1.0 arrays?

Thanks and best wishes,
Rich
On Fri, Nov 4, 2011 at 10:26 AM, Richard Friedman wrote:
> Has your method been adapted to Gene ST 1.0 arrays? If not, do you know of
> a method analogous to yours which can be used for ST 1.0s?

There is a fair amount of generality in arrayMvout, but it has not all been exercised to the same degree. The simplest uses work from AffyBatch or LumiBatch instances and compute QA statistics tailored to the respective platforms. If you have a data frame of array-specific QA statistics for some other platform, ArrayOutliers() will perform calibrated multivariate outlier detection with these measures. For the early affy arrays, NUSE and RLE statistics from affyPLM play a role; I don't know if these are readily computed for 1.0 ST arrays at this time. arrayMvout uses Mahalanobis-distance based procedures, but the distance is robustified by inward peeling to deal with masking problems that can arise with multiple outliers.

Let's not forget the Bioconductor mdqc package, which also deals with a differently robustified Mahalanobis distance for this purpose.
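To illustrate why a robustified distance helps with masking, here is a small sketch on a made-up data frame of per-array QA statistics. It is not the inward-peeling procedure of arrayMvout, nor the method used by mdqc: it swaps in the minimum covariance determinant estimate from `MASS::cov.rob()` as a generic robust alternative. The column names, the planted bad arrays, and the 0.001 cutoff are all assumptions for the example.

```r
library(MASS)   # ships with R; provides cov.rob()

set.seed(11)
n_arrays <- 30

# Hypothetical per-array QA statistics (column names are illustrative only).
qa <- data.frame(rle.iqr      = rnorm(n_arrays, 0.30, 0.03),
                 nuse.median  = rnorm(n_arrays, 1.00, 0.01),
                 scale.factor = rnorm(n_arrays, 1.00, 0.10),
                 row.names    = paste0("A", seq_len(n_arrays)))

# Plant two arrays that are bad in the same way, so they can mask each other.
qa[1:2, "rle.iqr"]     <- qa[1:2, "rle.iqr"] + 0.30
qa[1:2, "nuse.median"] <- qa[1:2, "nuse.median"] + 0.10

# Robust center and covariance (minimum covariance determinant).
rob <- cov.rob(qa, method = "mcd")

# Squared distances using robust versus classical estimates.
d2_robust  <- mahalanobis(qa, rob$center, rob$cov)
d2_classic <- mahalanobis(qa, colMeans(qa), cov(qa))

# Chi-squared calibration with one degree of freedom per QA statistic.
p_robust <- pchisq(d2_robust, df = ncol(qa), lower.tail = FALSE)
flagged  <- rownames(qa)[p_robust < 0.001]
print(flagged)
```

Comparing `d2_robust` with `d2_classic` for the two planted arrays shows the masking effect: the classical covariance is stretched by the outliers themselves, which shrinks their apparent distance.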
oligo, as Richard already tried, does offer some affyPLM features (NUSE + RLE) for ST arrays (among others). At the probeset level, however, NUSE boxplots are very asymmetric, and I believe this is associated with the control probesets (I need to make some time to check this and allow some filtering if this is the case).

b
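For anyone unsure what an RLE summary contains, the idea can be sketched in base R from any matrix of log-scale expression values. This is a simplified illustration that does not call oligo or affyPLM; the function name `rle_summary` and the simulated data are assumptions. NUSE is left out because it needs the standard errors from a probe-level model fit.

```r
# Relative log expression: each probeset's value minus its median across arrays,
# then summarised per array by the median and IQR of those deviations.
rle_summary <- function(log_expr) {
  rle <- sweep(log_expr, 1, apply(log_expr, 1, median))
  data.frame(rle.median = apply(rle, 2, median),
             rle.iqr    = apply(rle, 2, IQR),
             row.names  = colnames(log_expr))
}

set.seed(7)
n_probesets <- 2000
n_arrays    <- 8

# Simulated log2 expression: a probeset-specific level plus small array noise.
probeset_level <- rnorm(n_probesets, mean = 8, sd = 2)
log_expr <- matrix(rnorm(n_probesets * n_arrays, sd = 0.2), nrow = n_probesets) +
  probeset_level
colnames(log_expr) <- paste0("A", seq_len(n_arrays))

# Make the last array noisier to mimic a poor-quality hybridisation.
log_expr[, 8] <- log_expr[, 8] + rnorm(n_probesets, sd = 0.6)

qa <- rle_summary(log_expr)
print(round(qa, 3))
```

A data frame like `qa` is the kind of per-array table that a multivariate outlier procedure can take as input. A good array has an RLE median near zero and a small IQR.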
Benilton, Vincent, and List:

Will the extreme asymmetry of the NUSE plot compromise its utility in arrayMvout?

Thanks and best wishes,
Rich
Maybe the xps package could be useful. It offers many quality measures that you can also use with Gene ST 1.0 arrays, including NUSE, RLE, pseudo-images, RNA degradation plots, COI plots, MAD plots, PCA plots, etc.; see the vignette xps.pdf.

Best regards
Christian
_._._._._._._._._._._._._._._._._._
C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
V.i.e.n.n.a A.u.s.t.r.i.a
e.m.a.i.l: cstrato at aon.at
_._._._._._._._._._._._._._._._._._