Question: Inconsistent Subsetting using norm2Filter in flowCore
3.7 years ago by peterfoster20, United States
peterfoster20 wrote:

I recently discovered that applying norm2Filter() (and possibly other filters) does not give consistent results when repeated. I've pasted an example below. In the example dataset the differences are small, just a few events. In my much larger experimental datasets, however, the number of events changes by the hundreds and can significantly alter some of the downstream analysis.

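library(flowCore)
library(flowViz)  # provides xyplot() for flow data
# 'dat' is assumed to be a flowFrame with FSC-H and SSC-H channels
n2f <- norm2Filter(filterId="myNorm2Filter", x=list("FSC-H", "SSC-H"), scale.factor=1)
# Channel names containing "-" must be backquoted in the formula
xyplot(`FSC-H` ~ `SSC-H`, data=dat, filter=n2f, smooth=FALSE, xbin=256, stats=TRUE)
## Same filter, inconsistent subsetting.
sapply(1:15, function(x) {  fres <- Subset(dat, n2f); return(nrow(fres))  })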

I soon realized that if I call set.seed() with the same value before each subset, the issue goes away, and the same number of events (and presumably the same ones) is returned each time.

sapply(1:15, function(x) {  set.seed(1); fres <- Subset(dat, n2f); return(nrow(fres))  })

Is this because the Subset() command in combination with the norm2Filter() is using some kind of "training set" which is randomly selected?  How can I modify the norm2Filter() and/or Subset() functions to use the WHOLE dataset so that my analysis is not sensitive to the RNG?

Answer: Inconsistent Subsetting using norm2Filter in flowCore
3.7 years ago by Jiang, Mike
Jiang, Mike wrote:

The %in% method for norm2Filter (the actual computing engine dispatched by the 'Subset' method) uses the 'CovMcd' function (from the 'rrcov' package) to estimate the covariance matrix. By default, 'CovMcd' samples the data randomly, so its result depends on the random seed. I don't think we should change that behavior.

What you did was right: set the seed explicitly before the 'Subset' call.
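
If calling set.seed() inline feels intrusive because it also resets the random stream for the rest of the session, one option is a small wrapper that fixes the seed only for a single call and then restores the previous RNG state. The following is a minimal sketch in base R; the helper name with_fixed_seed is hypothetical and is not part of flowCore or rrcov, and the commented flowCore usage assumes 'dat' and 'n2f' as defined in the question.

```r
# Hypothetical helper (not part of flowCore): evaluate 'expr' with a fixed
# seed, then put the caller's RNG state back the way it was.
with_fixed_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old_seed <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazily evaluated argument: forced here, after set.seed()
}

# Base R check: the same seed gives the same random draw on every call.
a <- with_fixed_seed(1, sample(100, 5))
b <- with_fixed_seed(1, sample(100, 5))
identical(a, b)  # TRUE

# Assumed usage with the objects from the question:
# sapply(1:15, function(i) with_fixed_seed(1, nrow(Subset(dat, n2f))))
```

This keeps the gating reproducible without changing the random numbers that later, unrelated code in the same session will see.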