IQR implementation in varFilter and nsFilter (genefilter package)
James F. Reid ▴ 610
@james-f-reid-3148
Last seen 9.6 years ago
Dear list,

I have noticed that nsFilter and varFilter from the genefilter package implement their default variance function (var.func = IQR) in different ways, and I don't know whether this is intended. nsFilter applies IQR to the rows of the matrix with apply, whereas varFilter uses its own rowIQRs function, which leads to different results. If this is intended, I think it should be made clearer in the help page, since both functions use the same default parameters for variance filtering.

Here is an example with the Biobase sample.ExpressionSet, followed by its sessionInfo().

Best,
James Reid.

> library("Biobase")

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

> library("genefilter")
>
> data(sample.ExpressionSet)
>
> ## nsFilter using only var.filter
> nsF <- nsFilter(sample.ExpressionSet,
+                 require.entrez = FALSE,
+                 remove.dupEntrez = FALSE,
+                 feature.exclude = FALSE)
> varF <- varFilter(sample.ExpressionSet)
>
> nrow(nsF$eset) == nrow(varF)
Features
    TRUE
> length(intersect(featureNames(nsF$eset), featureNames(varF)))
[1] 245
> sessionInfo()
R version 2.9.0 (2009-04-17)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] genefilter_1.24.0 Biobase_2.4.1

loaded via a namespace (and not attached):
[1] annotate_1.22.0     AnnotationDbi_1.6.0 DBI_0.2-4
[4] RSQLite_0.7-1       splines_2.9.0       survival_2.35-4
[7] tools_2.9.0         xtable_1.5-5
Biobase genefilter • 1.6k views
Patrick Aboyoun ★ 1.6k
@patrick-aboyoun-6734
Last seen 9.6 years ago
United States
James,

Thanks for pointing out this inconsistency between varFilter and nsFilter when var.func = IQR. I just checked in changes to genefilter in the BioC 2.4 (release) and BioC 2.5 (devel) branches that bring nsFilter in line with varFilter. Since quantiles do not have a single rigid definition, as the type argument to the quantile() function demonstrates, varFilter and nsFilter now both define IQR as

    rowQ(eset, ceiling(0.75 * ncol(eset))) - rowQ(eset, floor(0.25 * ncol(eset)))

since this IQR calculation is relatively fast to compute and tends to work well when IQR-based filtering is appropriate. As before, end users can supply their own var.func, which could implement a different calculation of IQR.

Cheers,
Patrick
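For readers who want to see why the two definitions can select different features, here is a minimal base R sketch of the order-statistic IQR from the reply next to the interpolated stats::IQR. The helper names row_q, row_iqr_order_stat and row_iqr_interp are hypothetical and only for illustration; row_q is a plain stand-in for genefilter's rowQ, not its actual implementation, and the random matrix is made-up data rather than sample.ExpressionSet.

```r
# k-th smallest value in each row of a numeric matrix.
# Hypothetical stand-in for genefilter::rowQ(), written with base R only.
row_q <- function(m, which) {
  apply(m, 1, function(x) sort(x)[which])
}

# IQR as the difference of two order statistics, following the formula
# quoted in the reply. Assumes at least 4 columns so that
# floor(0.25 * n) >= 1.
row_iqr_order_stat <- function(m) {
  n <- ncol(m)
  row_q(m, ceiling(0.75 * n)) - row_q(m, floor(0.25 * n))
}

# Interpolated IQR per row, using stats::IQR (quantile type 7 by default).
row_iqr_interp <- function(m) {
  apply(m, 1, IQR)
}

# Tiny worked case: one row holding 1..8.
tiny <- matrix(1:8, nrow = 1)
print(row_iqr_order_stat(tiny))  # 6th smallest minus 2nd smallest = 4
print(row_iqr_interp(tiny))      # 6.25 - 2.75 = 3.5

# Made-up data: 50 "features" by 26 "samples".
set.seed(1)
m <- matrix(rnorm(50 * 26), nrow = 50, ncol = 26)
a <- row_iqr_order_stat(m)
b <- row_iqr_interp(m)

# Rank rows by each IQR and keep the top half, as a variance filter with
# a 0.5 cutoff would; the two selections need not coincide.
top_a <- order(a, decreasing = TRUE)[1:25]
top_b <- order(b, decreasing = TRUE)[1:25]
print(length(intersect(top_a, top_b)))
```

Because the order-statistic version avoids interpolation, it ranks rows slightly differently from apply(m, 1, IQR), which is consistent with the partial overlap of feature names reported in the original post. A custom function along these lines could also be passed as var.func if a specific quantile definition is wanted.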