Question

WARNING: remove set.seed usage in R code

0

xyluo1991 • 0

@xyluo1991-16247

Last seen 5.7 years ago

Dear All,

BiocCheck gave me the warning "WARNING: Remove set.seed usage in R code" because I call set.seed(seed) inside my R function to make results reproducible, where "seed" is an argument of the function that the user can specify. To pass BiocCheck, I would have to remove set.seed(seed) from the function, but I still want users to be able to choose a seed. How should I resolve this issue?

Thank you very much for your help!

Best,

Xiangyu

R bioccheck • 4.2k views

ADD COMMENT • link updated 5.8 years ago by shepherl 3.8k • written 5.8 years ago by xyluo1991 • 0

score 1 · Answer 1 · 2018-06-27

1

shepherl 3.8k

@lshep

Last seen 9 hours ago

United States

Generally we recommend that set.seed be called in the documentation and outside the function. Not only does this make it clear to the user that a seed is used, but it also leaves room to explain why the seed is used.


x <- function() {
    rnorm(5)  # placeholder for code that uses random numbers
}

set.seed(123)  # seed set by the user, outside the function
x()
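
For the "in the documentation" part, one possible shape is sketched below, assuming the package uses roxygen2-style comments. The function name noisy_mean and its argument are hypothetical and only serve as an illustration; the point is that set.seed appears in the examples and in the description, not in the function body.

```r
#' Simulate a noisy mean (hypothetical example function)
#'
#' The result depends on random number generation. For reproducible
#' results, call set.seed() before calling this function.
#'
#' @param n Number of random draws.
#' @return A single numeric value.
#' @examples
#' set.seed(123)  # set by the user, outside the function
#' noisy_mean(10)
noisy_mean <- function(n) {
    mean(rnorm(n))
}

# Same seed before each call gives the same result.
set.seed(123)
a <- noisy_mean(10)
set.seed(123)
b <- noisy_mean(10)
identical(a, b)  # TRUE
```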

You could keep the seed argument in your function and clearly document it. When you submit your package to the issue tracker, explain to the reviewer why a seed is set and your justification for keeping it in the function. Whether this is allowed will be at your reviewer's discretion, and they may insist on the former solution.

ADD COMMENT • link 5.8 years ago shepherl 3.8k

3

In addition to Lori's answer, here's a little anecdote.

I often perform simulations with randomly generated data to test the performance of various algorithms. I usually generate some data, test the method, and compute some measure of performance, then repeat this for several iterations to ensure that I get representative estimates of the metric of interest. At one point, I noticed that the standard deviation of my metrics was extremely low. Why? Because someone had put set.seed inside their function, which affects the entire R session after the function call. This meant that my "randomly" generated data was always the same after the second iteration!

In short, it's always easy for users to call set.seed if they want to. But putting the set.seed inside functions can quietly lead to surprising side-effects in downstream code involving randomness. Moreover, it's much harder to "uncall" set.seed. Hence the advice from BiocCheck to not put set.seed inside the function.
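
A minimal sketch of how that side effect can show up in a simulation loop. The function bad_method and its seed argument are hypothetical stand-ins, not the actual function from the anecdote.

```r
# Hypothetical function that resets the seed internally.
bad_method <- function(x, seed = 42) {
    set.seed(seed)            # side effect: resets the session-wide RNG state
    mean(x) + rnorm(1, sd = 0.01)
}

# Simulation loop: each iteration is meant to draw fresh random data.
n_iter <- 5
sim_means <- numeric(n_iter)
for (i in seq_len(n_iter)) {
    x <- rnorm(20)            # "random" data for this iteration
    sim_means[i] <- mean(x)
    bad_method(x)             # leaves the RNG in the same state every time
}

# From the second iteration on, every data set is the same.
length(unique(sim_means[-1]))  # 1
```

Only the first iteration sees data that does not depend on the seed inside bad_method; every later rnorm(20) call starts from the same RNG state.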

ADD REPLY • link 5.8 years ago Aaron Lun ★ 28k

0

The proper way would be to test whether .Random.seed exists and, if so, save it and restore it upon exit. In my packages (which live on CRAN, not Bioconductor) I also tend to let the user request that the random seed not be set within the function, by supplying NULL to the argument randomSeed below.

foo = function(..., randomSeed=1)
{
    if (!is.null(randomSeed)) {
        if (exists(".Random.seed")) {
            savedSeed = .Random.seed
            on.exit(.Random.seed <<- savedSeed)
        }
        set.seed(randomSeed)
    }
    # actual code...
}

ADD REPLY • link 5.8 years ago Peter Langfelder ★ 3.0k

0

It seems by setting the random seed (I don't know the context, so could be off-base here) you're somehow overstating the reproducibility of foo() in the manner illustrated by Aaron's anecdote; it seems better to have NULL as the default?

Artificial, but

f = function() {
    .Random.seed <- 1
    function() {
        seed <- .Random.seed
        on.exit(.Random.seed <<- seed)
        rnorm(10)
    }
}

modifies the .Random.seed of the generator.

set.seed(123)
xx <- .Random.seed
res <- f()()
identical(xx, .Random.seed)  # FALSE

Maybe it's safer (since the user can manipulate the parent environment but not the location of .GlobalEnv in the search() path) with

f = function() {
    .Random.seed <- 1
    function() {
        seed <- get(".Random.seed", 1)
        on.exit(assign(".Random.seed", seed, 1))
        rnorm(10)
    }
}
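
Putting the pieces of this thread together, a more defensive version might look like the sketch below. The helper name with_temp_seed is hypothetical. It looks up .Random.seed only in the global environment (inherits = FALSE), and it also covers the case where no seed exists yet, which the get() call above would not handle.

```r
# Hypothetical helper: evaluate `expr` under a temporary seed, then put the
# global RNG state back the way it was found.
with_temp_seed <- function(seed, expr) {
    genv <- globalenv()
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed) {
        old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
    }
    on.exit({
        if (had_seed) {
            # Restore the state saved on entry.
            assign(".Random.seed", old_seed, envir = genv)
        } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
            # No state existed on entry, so remove the one created here.
            rm(list = ".Random.seed", envir = genv)
        }
    })
    set.seed(seed)
    force(expr)  # the lazy argument is evaluated here, after set.seed()
}

set.seed(123)
before <- .Random.seed
res1 <- with_temp_seed(1, rnorm(10))
res2 <- with_temp_seed(1, rnorm(10))
identical(before, .Random.seed)  # TRUE
identical(res1, res2)            # TRUE
```

The withr package provides with_seed() for a similar purpose, which may be preferable to maintaining a helper like this by hand.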

ADD REPLY • link 5.8 years ago Martin Morgan 25k

0

You're right, I didn't think of the possibility of calling code defining its own .Random.seed (which is probably not very frequent but certainly possible).

ADD REPLY • link 5.8 years ago Peter Langfelder ★ 3.0k

0

Thank you very much for the great replies! I have removed set.seed from my function and now set the seed explicitly outside it. I greatly appreciate these helpful comments!

ADD REPLY • link 5.8 years ago xyluo1991 • 0