Question

Inconsistent Subsetting using norm2Filter in flowCore

Asked by peterfoster

I recently discovered that applying `norm2Filter()` (and possibly other filters) is not reproducible: subsetting the same data with the same filter several times returns different numbers of events. I've pasted an example below. In the example dataset the differences are small, just a few events. In my much larger experimental datasets, however, the count changes by hundreds of events, which can significantly alter some of the downstream analysis.

```r
library(flowCore)
library(flowViz)  # provides the xyplot() method for flowFrame objects
## Load example data
dat <- read.FCS(system.file("extdata", "0877408774.B08", package = "flowCore"))
n2f <- norm2Filter(filterId = "myNorm2Filter", x = list("FSC-H", "SSC-H"), scale.factor = 1)
xyplot(`FSC-H` ~ `SSC-H`, data = dat, filter = n2f, smooth = FALSE, xbin = 256, stats = TRUE)
## Same filter, inconsistent subsetting: the event count varies between replicates
sapply(1:15, function(i) nrow(Subset(dat, n2f)))
```

I soon realized that if I call `set.seed()` before the subset, the issue goes away: the same number of events (and presumably the same events) is returned each time.

```r
## Re-seeding before each call makes the event count identical across replicates
sapply(1:15, function(i) { set.seed(1); nrow(Subset(dat, n2f)) })
```

Is this because `Subset()`, in combination with `norm2Filter()`, uses some kind of randomly selected "training set"? How can I modify the `norm2Filter()` and/or `Subset()` functions to use the WHOLE dataset so that my analysis is not sensitive to the RNG?

Tags: flowcore, gate, subsetting, filter

Question last updated by Mike Jiang.

Answer (2015-09-18)

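The `%in%` method for `norm2Filter` (the actual computing engine that the `Subset` method dispatches to) uses the `CovMcd` function from the rrcov package to estimate the covariance matrix. `CovMcd` is a Minimum Covariance Determinant estimator that, by default, draws random subsets of the data as starting points. Its estimate, and therefore the gate boundary and the number of events kept, depends on the state of the random number generator. I don't think we should change that behavior.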
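
The mechanism can be reproduced outside flowCore. The minimal sketch below swaps in `MASS::cov.rob(method = "mcd")`, a different MCD implementation from rrcov's `CovMcd` that also relies on random subsampling for large data. The simulated data, the hypothetical `gate_count()` helper, and the Mahalanobis cut-off only roughly mimic a norm2Filter-style gate. They are assumptions for illustration, not flowCore's actual code.

```r
library(MASS)  # recommended package shipped with R; provides cov.rob()

# Hypothetical stand-in data (NOT the flowCore example file):
# 1000 bivariate-normal "events" plus 25 uniformly scattered outliers.
set.seed(123)
x <- rbind(
  matrix(rnorm(2000, mean = 500, sd = 50), ncol = 2),
  matrix(runif(50, min = 0, max = 1000), ncol = 2)
)
colnames(x) <- c("FSC-H", "SSC-H")

# Count events inside an ellipse built from a robust MCD estimate.
# Only a rough analogue of a norm2Filter-style gate, not flowCore's code.
gate_count <- function(x, radius = 1) {
  fit <- cov.rob(x, method = "mcd")  # random subsampling of the data happens here
  d2 <- mahalanobis(x, center = fit$center, cov = fit$cov)
  sum(d2 <= radius^2)
}

unseeded <- sapply(1:5, function(i) gate_count(x))
seeded   <- sapply(1:5, function(i) { set.seed(1); gate_count(x) })
unseeded  # can differ between replicates, depending on which subsets were drawn
seeded    # identical in every replicate
```

With cleanly simulated data like this, the unseeded counts may well agree. The point is that only the seeded version is guaranteed to.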

What you did is the right approach: set the seed explicitly before each `Subset` call.
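
Resetting the global seed in the middle of an analysis script also changes every later random draw in the session. One way to avoid that is a small wrapper that fixes the seed for a single call and then restores the caller's RNG state. The `with_fixed_seed()` helper below is a hypothetical base-R sketch, not part of flowCore. The seed value itself is arbitrary, and a different seed may yield a slightly different gate.

```r
# Evaluate `expr` with a fixed RNG seed, then restore the caller's RNG state
# so that later random draws in the session are unaffected.
with_fixed_seed <- function(seed, expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)  # put the old state back
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(list = ".Random.seed", envir = genv)         # no state existed before
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # `expr` is a lazily evaluated promise, so it runs here, after set.seed()
}

# Two calls with the same seed give identical draws:
a <- with_fixed_seed(1, runif(3))
b <- with_fixed_seed(1, runif(3))
identical(a, b)  # TRUE
```

Applied to the example above, this would look like `sapply(1:15, function(i) with_fixed_seed(1, nrow(Subset(dat, n2f))))`.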