nsFilter cutoff
james perkins ▴ 300
@james-perkins-2675
Last seen 9.6 years ago
Hi,

I am finding the nsFilter IQR cutoff somewhat confusing.

It says it is using IQR with a default cutoff of 0.5.

This gives the impression that if you line up the data and take the value between the 0.25 and 0.75 quantiles, you would keep the probeset if this value was < 0.5.

However this is not the case, so I would like to know how exactly this works.

Regards,
James
@james-w-macdonald-5106
Last seen 20 hours ago
United States
Hi James,

james perkins wrote:
> This gives the impression that if you line up the data and take the
> value between the 0.25 and 0.75 quantiles, you would keep the probeset
> if this value was < 0.5
>
> However this is not the case, so I would like to know how exactly does
> this work?

Actually it _is_ the case - perhaps you misunderstand something.

First, get all probesets with an IQR > 0.5

    T1 <- apply(exprs(sample.ExpressionSet), 1, IQR) > 0.5

Now do the same using nsFilter()

    T2 <- nsFilter(sample.ExpressionSet, FALSE, filterByQuantile = FALSE,
                   feature.exclude="", remove.dupEntrez = FALSE)

Are they the same?

    all.equal(featureNames(sample.ExpressionSet)[T1], featureNames(T2$eset))
    [1] TRUE

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
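The absolute-threshold behaviour described above can also be tried without any Bioconductor packages. The following is only a minimal sketch in base R: the matrix `expr` is simulated and stands in for `exprs(eset)`, and the probe names are hypothetical. It mirrors the first step of the check (per-row IQR compared against a fixed cutoff of 0.5), not nsFilter() itself.

```r
# Simulated stand-in for exprs(eset): 200 hypothetical probesets x 10 samples.
# Rows 1-100 have low variability (sd 0.2), rows 101-200 high variability (sd 1).
set.seed(1)
expr <- matrix(rnorm(200 * 10, sd = rep(c(0.2, 1), each = 100)),
               nrow = 200,
               dimnames = list(paste0("probe", 1:200), NULL))

# Per-probeset interquartile range, as in apply(exprs(eset), 1, IQR).
iqr <- apply(expr, 1, IQR)

# Absolute threshold: keep a probeset only if its IQR exceeds 0.5.
keep_abs <- iqr > 0.5

# Number of probesets kept, and the names of the first few.
sum(keep_abs)
head(rownames(expr)[keep_abs])
```

With a real ExpressionSet, the same logical vector should line up with the features returned by the nsFilter() call shown in the reply, provided the annotation-based filters are switched off as they are there.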
Hi James,

I meant when we have filterByQuantile as TRUE. In this case it seems to behave differently, and I can't figure out why, and I don't want to guess!

Regards,
Jim
Hi James,

james perkins wrote:
> I meant when we have filterByQuantile as TRUE. In this case it seems to
> behave differently, and I can't figure out why, and I don't want to guess!

OK. That's a different question. The details section of the help page explains this:

    Note that by default the numerical-filter cutoff is interpreted as
    a quantile, so leaving the default values intact would filter out
    50% of the genes remaining at this stage. If you prefer to set the
    cutoff at some absolute threshold, change the value of
    'varByQuantile' to 'FALSE', and modify 'var.cutoff' accordingly.

And looking at the code should help further:

    if (var.filter) {
        esetIqr <- apply(exprs(eset), 1, var.func)
        if (filterByQuantile) {
            if (0 < var.cutoff && var.cutoff < 1) {
                var.cutoff = quantile(esetIqr, var.cutoff)
            }
            else stop("Cutoff Quantile has to be between 0 and 1.")
        }
        selected <- esetIqr > var.cutoff

So if you leave filterByQuantile = TRUE (the argument the help text above calls 'varByQuantile'), then after you do the annotation-based filtering (GO, Entrez Gene, AFFX probesets, duplicates), you will take what remains and filter out 50%.

Does that help?

Best,
Jim
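To make the quantile interpretation concrete, here is a small base-R sketch of the logic in the excerpt above. It is an illustration under stated assumptions, not the nsFilter() implementation: `expr` is simulated data, and `has_annotation` is a hypothetical stand-in for whatever the annotation-based filters would keep (here, arbitrarily, 600 of 1000 probesets).

```r
# Simulated stand-in for exprs(eset): 1000 hypothetical probesets x 8 samples.
set.seed(42)
expr <- matrix(rnorm(1000 * 8), nrow = 1000)

# Hypothetical result of the annotation-based filtering step:
# assume only the first 600 probesets survive it.
has_annotation <- rep(c(TRUE, FALSE), times = c(600, 400))
expr_annot <- expr[has_annotation, ]

# Variance filter on what remains, with the cutoff read as a quantile.
iqr <- apply(expr_annot, 1, IQR)
var_cutoff <- 0.5                        # the default, interpreted as a quantile
threshold <- quantile(iqr, var_cutoff)   # here: the median IQR of the 600 survivors
selected <- iqr > threshold

# Counts at each stage.
n_initial <- nrow(expr)        # 1000
n_annot   <- nrow(expr_annot)  # 600
n_final   <- sum(selected)     # 300: half of the 600, not half of the 1000
c(initial = n_initial, after_annotation = n_annot, after_variance = n_final)
```

The final count is half of the probesets that survive the annotation step, not half of the initial set, which is why the result of a default call can look smaller than "50% of the chip".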
Yes, that makes perfect sense now. I thought this might be the case, but the additional filtering (by having an Entrez ID, for example) meant that I didn't have half the number of initial probesets, which threw me a little.

Thanks and regards,
Jim