Question: WARNING: remove set.seed usage in R code
3 months ago by
xyluo19910 wrote:

Dear All,

BiocCheck gave me the warning "WARNING: Remove set.seed usage in R code" because I call set.seed(seed) inside my R function to make results reproducible, where "seed" is an argument of the function that the user can specify. To pass BiocCheck I would have to remove the set.seed(seed) call, but I still want to give the user the option of choosing a seed through my function. How should I resolve this issue?

Thank you very much for your help!

Best,

Xiangyu

modified 3 months ago by shepherl ♦♦ 830 • written 3 months ago by xyluo19910
3 months ago by
shepherl ♦♦ 830 wrote:

Generally we recommend that set.seed be called in the documentation, outside the function. Not only does this make it obvious to the user that a seed is being used, it also gives you a place to explain why the seed is needed.


x <- function() {
    # some code that draws random numbers
}

set.seed(123)

x()


Alternatively, you could keep the seed argument in your functions and document it clearly. When you submit your package to the issue tracker, explain to the reviewer why a seed is set and give your justification for keeping it in the function. Whether this is allowed is at your reviewer's discretion, and they may insist on the former solution.


I often perform simulations with randomly generated data to test the performance of various algorithms. I usually generate some data, run the method, compute some measure of performance, and repeat this for several iterations to ensure that I get representative estimates of the metric of interest. At one point, I noticed that the standard deviation of my metrics was extremely low. Why? Because someone had put set.seed inside their function, which affects the entire R session after the function call. This meant that my "randomly" generated data was always the same after the second iteration!

In short, it's always easy for users to call set.seed if they want to. But putting the set.seed inside functions can quietly lead to surprising side-effects in downstream code involving randomness. Moreover, it's much harder to "uncall" set.seed. Hence the advice from BiocCheck to not put set.seed inside the function.
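The side effect described above can be reproduced with a small sketch. The function name leaky_sim is made up for illustration and stands in for any function that calls set.seed internally; only base R is assumed.

```r
# Hypothetical function that sets a seed internally
# (the pattern BiocCheck warns about).
leaky_sim <- function(seed = 1) {
    set.seed(seed)      # resets the RNG state for the whole session
    mean(rnorm(10))
}

# Simulation loop: fresh "random" data are drawn after each call.
draws <- numeric(5)
for (i in seq_len(5)) {
    leaky_sim()
    draws[i] <- runif(1)   # drawn from the state left behind by leaky_sim()
}

# All five draws are the same number, because every iteration
# starts from the identical RNG state.
all_same <- length(unique(draws)) == 1
print(all_same)
```

Removing the set.seed call from leaky_sim (or restoring the caller's RNG state on exit) makes the draws differ again.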

The proper way would be to test whether .Random.seed exists, save it, and restore it on exit. In my packages (which live on CRAN, not Bioconductor) I also tend to let the user request that the random seed not be set within the function at all, by supplying NULL to the argument randomSeed below.

foo = function(..., randomSeed=1)
{
    if (!is.null(randomSeed)) {
        if (exists(".Random.seed")) {
            savedSeed = .Random.seed
            on.exit(.Random.seed <<- savedSeed)
        }
        set.seed(randomSeed)
    }
    # actual code...
}

It seems by setting the random seed (I don't know the context, so could be off-base here) you're somehow overstating the reproducibility of foo() in the manner illustrated by Aaron's anecdote; it seems better to have NULL as the default?

Artificial, but

f = function() {
    .Random.seed <- 1
    function() {
        seed <- .Random.seed
        on.exit(.Random.seed <<- seed)
        rnorm(10)
    }
}


modifies the .Random.seed of the generator.

set.seed(123)
xx <- .Random.seed
res <- f()()
identical(xx, .Random.seed)  # FALSE


Maybe it's safer (since the user can manipulate the parent environment but not the location of .GlobalEnv in the search() path) with

f = function() {
    .Random.seed <- 1
    function() {
        seed <- get(".Random.seed", 1)
        on.exit(assign(".Random.seed", seed, 1))
        rnorm(10)
    }
}


You're right, I didn't think of the possibility of calling code defining its own .Random.seed (which is probably not very frequent but certainly possible).
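Pulling the points from this thread together, here is a minimal sketch of a save-and-restore wrapper in base R. The name with_local_seed and its arguments are hypothetical, not part of any package mentioned above, and this is not something BiocCheck prescribes. It assumes the RNG state should always be read from and written to the global environment explicitly, and it also handles a fresh session in which .Random.seed does not exist yet.

```r
# Hypothetical helper: evaluate `expr` with a temporary seed, then put the
# global RNG state back exactly as it was found.
with_local_seed <- function(seed, expr) {
    genv <- globalenv()
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed) {
        old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
    }
    on.exit({
        if (had_seed) {
            # restore the caller's RNG state
            assign(".Random.seed", old_seed, envir = genv)
        } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
            # no state existed before the call, so remove the one we created
            rm(".Random.seed", envir = genv)
        }
    })
    set.seed(seed)
    force(expr)   # the lazy argument is evaluated here, after seeding
}

# Usage: the result is reproducible, and the caller's RNG state is untouched.
set.seed(123)
before <- get(".Random.seed", envir = globalenv())
a <- with_local_seed(42, rnorm(3))
b <- with_local_seed(42, rnorm(3))
after <- get(".Random.seed", envir = globalenv())
print(identical(a, b))
print(identical(before, after))
```

Because the lookup names globalenv() directly with inherits = FALSE, a variable called .Random.seed defined in a calling or enclosing environment does not interfere with the save and restore.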