Question on filtering in the Category package
Boel Brynedal ▴ 200
@boel-brynedal-2091
Dear list,

I have a theoretical question regarding filtering on variation in the Category package. I've performed an analysis that closely resembles the vignette, but I am still a bit uncertain about the filtering. In the vignette the following code is used:

    lowQ <- rowQ(eset, floor(0.25 * NumArrays))
    upQ <- rowQ(eset, ceiling(0.75 * NumArrays))
    iqrs <- upQ - lowQ
    select <- (upQ - lowQ) > 0.5

My question is: why is this filtering necessary? I have performed my analysis without filtering, and the results were strange.

My guess is that this filtering is intended to eliminate the probe-sets that aren't expressed at all (and would cause categories containing them to be associated). But the reason for eliminating the probe-sets with the highest variability is less clear to me. Would these include probe-sets where something has gone wrong, or probe-sets that are not expressed at all in some, but not all, arrays? What have I missed?

What kind of filtering are you using, and why?

Is there an article out there discussing the variability, and the causes of the variability, on arrays?

Any comments would be helpful. Thank you!

Best,
Boel Brynedal

--~*~**~***~*~***~**~*~--
Boel Brynedal, MSc, PhD student
Karolinska Institutet
Department of Clinical neuroscience
Karolinska University hospital Huddinge
Division of Neurology, R54
141 86 Stockholm
SWEDEN
Phone: +46 8 585 819 27
Fax: +46 8 585 870 80
E-mail: boel.brynedal at ki.se
Seth Falcon ★ 7.4k
@seth-falcon-992
Hi Boel,

Boel Brynedal <boel.brynedal at ki.se> writes:
> I have a theoretical question regarding filtering on variation in the
> Category package. I've performed an analysis that closely resembles the
> vignette, but I am still a bit uncertain about the filtering.
> In the vignette the following code is used:
> lowQ <- rowQ(eset, floor(0.25 * NumArrays))
> upQ <- rowQ(eset, ceiling(0.75 * NumArrays))
> iqrs <- upQ - lowQ
> select <- (upQ - lowQ) > 0.5
>
> My question is: why is this filtering necessary? I have performed my
> analysis without filtering, and the results were strange.

If you inflate your universe of possible genes with genes that essentially cannot end up in your selected gene list, then you will get strange results.

> My guess is that this filtering is intended to eliminate the probe-sets
> that aren't expressed at all (and would cause categories containing them
> to be associated). But the reason for eliminating the probe-sets with
> the highest variability is less clear to me. Would these include
> probe-sets where something has gone wrong, or probe-sets that are not
> expressed at all in some, but not all, arrays?
> What have I missed?

Probesets with _low_ variance across samples are eliminated. The high-variance ones are kept.

Take a look at the GOstats vignette from a recent version. There is a new function, nsFilter(), that makes the filtering easier to perform, and the vignette and man page discuss some details. nsFilter() is in the genefilter package.

You didn't tell us your sessionInfo(), but I think you are not using the current release of R and the BioC packages. It would be good to upgrade.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/
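The point about inflating the gene universe can be illustrated with a small hypergeometric calculation in base R. This is only a sketch under stated assumptions: all counts are hypothetical, the helper `hyper_p` is invented for this example, and it assumes that none of the genes added to the universe belong to the category being tested.

```r
# Hypothetical counts, chosen only for illustration.
n_selected   <- 200  # genes in the selected gene list
cat_size     <- 100  # universe genes annotated to one category
cat_selected <- 8    # selected genes that fall in that category

# Upper-tail hypergeometric p-value, P(X >= cat_selected),
# for a given universe size (hypothetical helper).
hyper_p <- function(universe_size) {
  phyper(cat_selected - 1,
         m = cat_size,                  # "white balls": genes in the category
         n = universe_size - cat_size,  # "black balls": genes outside it
         k = n_selected,                # number of draws: selected genes
         lower.tail = FALSE)
}

p_filtered   <- hyper_p(5000)   # universe after non-specific filtering
p_unfiltered <- hyper_p(20000)  # universe padded with genes that cannot be selected

print(c(filtered = p_filtered, unfiltered = p_unfiltered))
```

With the padded universe the expected overlap shrinks, so the same observed overlap looks far more significant than it should. That is one way the "strange" results of an unfiltered analysis can arise.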
@james-w-macdonald-5106
United States
Hi Boel,

Boel Brynedal wrote:
> Dear list,
>
> I have a theoretical question regarding filtering on variation in the
> Category package. I've performed an analysis that closely resembles the
> vignette, but I am still a bit uncertain about the filtering.
> In the vignette the following code is used:
> lowQ <- rowQ(eset, floor(0.25 * NumArrays))
> upQ <- rowQ(eset, ceiling(0.75 * NumArrays))
> iqrs <- upQ - lowQ
> select <- (upQ - lowQ) > 0.5
>
> My question is: why is this filtering necessary? I have performed my
> analysis without filtering, and the results were strange.
> My guess is that this filtering is intended to eliminate the probe-sets
> that aren't expressed at all (and would cause categories containing them
> to be associated). But the reason for eliminating the probe-sets with
> the highest variability is less clear to me. Would these include
> probe-sets where something has gone wrong, or probe-sets that are not
> expressed at all in some, but not all, arrays?
> What have I missed?

I think you misunderstand the filtering being done here. This doesn't remove probesets with variance greater than the 75th percentile. Instead, it selects probesets with an inter-quartile range greater than 0.5. This is a non-parametric estimate of the variance for each probeset, and won't be adversely affected by outliers (unless you have lots of them, in which case they really aren't outliers ;-D).

This is a pretty reasonable way to filter probesets, as it protects against a single outlier making it look like there is a lot of variability in the expression values.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
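The robustness of the inter-quartile range to a single outlier, as described in the reply above, can be checked on simulated data. This is a minimal base R sketch; the expression values are made up, and `flat` and `spiked` are hypothetical names for one probeset measured on 20 arrays.

```r
set.seed(1)

# A hypothetical probeset that barely varies across 20 arrays (log scale).
flat <- rnorm(20, mean = 7, sd = 0.1)

# The same probeset with one outlying array.
spiked <- flat
spiked[1] <- 14

# The variance is dominated by the single outlier; the IQR hardly moves.
print(rbind(flat   = c(var = var(flat),   IQR = IQR(flat)),
            spiked = c(var = var(spiked), IQR = IQR(spiked))))

# Applying the same 0.5 cutoff to each measure of spread:
keep_by_var <- var(spiked) > 0.5
keep_by_iqr <- IQR(spiked) > 0.5
```

Under these assumptions a variance-based cutoff would keep the probeset because of one bad array, while the IQR-based cutoff would still discard it.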
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Boel,

as you say, the filtering is intended to eliminate the probe-sets whose targets are not expressed and that are hence uninformative / only contribute noise.

> But the reason for eliminating the probe-sets with
> the highest variability is less clear to me.

Perhaps you misunderstood the example code below; it only eliminates low-variability probesets and keeps the high-variability ones. Please give it a second look.

Best wishes
Wolfgang

------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber

Boel Brynedal wrote:
> Dear list,
>
> I have a theoretical question regarding filtering on variation in the
> Category package. I've performed an analysis that closely resembles the
> vignette, but I am still a bit uncertain about the filtering.
> In the vignette the following code is used:
> lowQ <- rowQ(eset, floor(0.25 * NumArrays))
> upQ <- rowQ(eset, ceiling(0.75 * NumArrays))
> iqrs <- upQ - lowQ
> select <- (upQ - lowQ) > 0.5
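To see which probesets the quoted filter keeps, the same steps can be run on a toy matrix. This is a sketch only: `row_order_stat` is a hypothetical base R stand-in for genefilter's `rowQ()` (it returns the k-th smallest value in each row), and the two simulated probesets are invented for illustration.

```r
set.seed(42)
num_arrays <- 12

# Two hypothetical probesets: one flat, one that differs between two groups.
expr <- rbind(
  flat     = rnorm(num_arrays, mean = 6, sd = 0.05),
  variable = c(rnorm(6, mean = 6, sd = 0.05), rnorm(6, mean = 9, sd = 0.05))
)

# Stand-in for rowQ(): the k-th order statistic of every row.
row_order_stat <- function(x, k) apply(x, 1, function(r) sort(r)[k])

lowQ <- row_order_stat(expr, floor(0.25 * num_arrays))    # approx. lower quartile
upQ  <- row_order_stat(expr, ceiling(0.75 * num_arrays))  # approx. upper quartile
iqrs <- upQ - lowQ

select <- iqrs > 0.5  # TRUE means the probeset is kept
print(select)
```

In this toy setting the flat probeset is dropped and the variable one is kept, which matches the description above: `select` marks the rows to retain, not the rows to remove.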