Question

How to plot benchmark result of R/ Bioconductor package (getting curve or line graph)?


Jurat Shahidin ▴ 80

@jurat-shahidin-9488

Last seen 4.1 years ago

Chicago, IL, USA

Dear all,

I am trying to plot (as a curve or line graph) the benchmark results for a Bioconductor package that I developed, in order to show its overall performance relative to another software tool (implemented in C#). I profiled all of my code with the profvis package in RStudio, but I need a clean plot instead. How can I produce this kind of plot?

My workflow consists of a series of functions that together determine the overall performance of the package. Briefly, the pipeline is:

read peak file -> clean data -> find overlaps -> check overlap requirements -> first-level filtering -> Fisher's method -> second-level filtering -> export results -> visualize output

Edit:

I tried the rbenchmark package to produce benchmark results like this:

benchmark(
    s1=myFunc1,
    s2=myFunc2,
    s3=myFunc3,
    ...
    s10=myFunc10,
    order="elapsed", replications=2
)

This gives the run time for evaluating each function. Based on this result, how can I get a clean plot (either a line graph or a curve)?

Each pipeline step has a corresponding R function that accepts different parameters. I want a single plot where the X axis shows the number of input peak files and the Y axis shows the run time my package needs to analyze them. I am not sure how to generate a clear plot (line graph or curve) showing the performance of a package that accepts a list of peak files as input. What is a good starting point for evaluating the performance of an R package when that performance depends on several R functions? Thanks in advance :)

Best regards,

Jurat

r benchmark performance ggplot2 • 1.4k views

ADD COMMENT • link 7.3 years ago Jurat Shahidin ▴ 80


What do you mean by a nice plot? Do you want to see how the run time grows as you increase the input? Then you need to run the benchmark with several inputs (for example, sections of a file, or an increasing number of files) and plot time against input size with ggplot2. You could also estimate the big-O complexity of your package/algorithm instead of running the benchmark.
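A minimal sketch of that idea, under some assumptions: `run_pipeline()` is a hypothetical stand-in for the real package (here it just sorts random numbers), and the input sizes are made up. Each size is timed once with base R's `system.time()`, and the result is plotted only if ggplot2 is installed.

```r
# Hypothetical stand-in for the real pipeline: any function whose cost
# depends on the size of its input will do for this sketch.
run_pipeline <- function(n) {
  x <- runif(n)
  sort(x)
  invisible(NULL)
}

# Input sizes to test (assumed values; use the real number of peaks or files).
sizes <- c(1e4, 2e4, 4e4, 8e4, 1.6e5)

# Time each size once and keep the elapsed (wall-clock) seconds.
timings <- data.frame(
  n = sizes,
  elapsed = vapply(
    sizes,
    function(n) system.time(run_pipeline(n))[["elapsed"]],
    numeric(1)
  )
)

# Plot run time against input size; skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(timings, aes(x = n, y = elapsed)) +
    geom_line() +
    geom_point() +
    labs(x = "Input size (n)", y = "Elapsed time (s)")
  print(p)
}
```

Single timings are noisy, so in practice you would probably repeat each size several times and plot the mean or median.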

ADD REPLY • link 7.3 years ago Lluís Revilla Sancho ▴ 730


Dear Lluís,

Thanks for your helpful response. Yes, I want to plot run time against the number of features in each file. I benchmarked the functions with several input files, but I am not happy with the resulting plot. This is what I did.

This is the benchmark result from rbenchmark, as a data.frame:

benchResult <- data.frame(
    test=c("s5","s1","s6","s9","s2","s3","s4","s7","s8","s10"),
    replications=c(10,10,10,10,10,10,10,10,10,10),
    elapsed=c(0.10,0.11,0.30,0.32,0.75,0.98,3.43,8.07,13.22,30.48),
    relative=c(1.0,1.1,3.0,3.2,7.5,9.8,34.3,80.7,132.2,304.8),
    user.self=c(0.11,0.03,0.30,0.33,0.75, 0.73,3.36,8.07,13.21,27.70),
    sys.self=c(0.00,0.02,0.00,0.00,0.00,0.09,0.00,0.00,0.00,0.31)
)


library(ggplot2)
ggplot(benchResult, aes(x = forcats::fct_inorder(test), y = elapsed, group = 1)) + 
  geom_line() + 
  xlab("test")

but the resulting plot is still not what I want. Could you illustrate your suggestion with a concrete example that gives a clear plot? And how do I estimate the big-O of an R/Bioconductor package? Thank you.

ADD REPLY • link 7.3 years ago Jurat Shahidin ▴ 80


You can read about big-O notation on Wikipedia. You are plotting each function against the time it takes, not against the number of features in each file. You could add a column giving the number of features used for each function and plot against that. If the number of features increases evenly from one function to the next, your timings appear to grow much faster than linearly, which would suggest something like O(n^2) rather than O(n). If you want a better complexity, you would need to modify your package, algorithm, or function.
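As a rough sketch of how such a column could be used to estimate the growth rate: if elapsed time behaves like c * n^k, a straight-line fit on the log-log scale has slope k. The data below are synthetic, not measurements; `n_features` and the constant `1e-6` are assumptions chosen so that the expected exponent is exactly 2.

```r
# Synthetic timings, NOT measurements: elapsed is constructed as c * n^2
# so that the expected slope on the log-log scale is exactly 2.
bench <- data.frame(test = paste0("s", 1:6))
bench$n_features <- c(100, 200, 400, 800, 1600, 3200)  # assumed input sizes
bench$elapsed <- 1e-6 * bench$n_features^2

# If elapsed ~ c * n^k, then log(elapsed) = log(c) + k * log(n),
# so the slope of a straight-line fit estimates the exponent k.
fit <- lm(log(elapsed) ~ log(n_features), data = bench)
k <- unname(coef(fit)[2])
k
```

With real timings the slope will not be a round number; a value near 1 suggests roughly linear growth and a value near 2 roughly quadratic growth, over the range of sizes tested.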

ADD REPLY • link 7.3 years ago Lluís Revilla Sancho ▴ 730


Could you illustrate this with a few simple examples? I searched Stack Overflow for how to estimate the big-O of an R package, but did not find many helpful posts. Thanks.

ADD REPLY • link 7.3 years ago Jurat Shahidin ▴ 80


> library(rbenchmark)
> On <- function(n) { x <- 0; for (i in n) { x + i } }
> On2 <- function(n) { x <- 0; for (i in n) { for (j in n) { x + j } } }
> benchmark(On(1:100), On2(1:100), On(1:200), On2(1:200), replications = 50)
        test replications elapsed relative user.self sys.self user.child sys.child
1  On(1:100)           50   0.002      1.0     0.001     0.00          0         0
3  On(1:200)           50   0.004      2.0     0.004     0.00          0         0
2 On2(1:100)           50   0.184     92.0     0.172     0.01          0         0
4 On2(1:200)           50   0.709    354.5     0.708     0.00          0         0

As you can see, when I double the input, the time for On also doubles (0.002 to 0.004), while the time for On2 increases almost fourfold (0.184 to 0.709), which is what you would expect from a quadratic function.
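The same comparison can be turned into a rough exponent estimate. This is only a sketch using the elapsed times from the table above: when the input doubles, a cost of c * n^k grows by a factor of 2^k, so k is the base-2 logarithm of the ratio of the two times. The vector names are just labels for this example.

```r
# Elapsed times copied from the benchmark table above.
elapsed <- c(On_100 = 0.002, On_200 = 0.004, On2_100 = 0.184, On2_200 = 0.709)

# Doubling n multiplies the time by 2^k, so k = log2(ratio of times).
k_On  <- log2(elapsed[["On_200"]]  / elapsed[["On_100"]])
k_On2 <- log2(elapsed[["On2_200"]] / elapsed[["On2_100"]])

# Roughly 1 for On and a little under 2 for On2.
round(c(On = k_On, On2 = k_On2), 2)
```

Timings as small as 0.002 s are close to the timer resolution, so a two-point estimate like this is only indicative; more input sizes and more replications give a more reliable picture.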

ADD REPLY • link 7.3 years ago Lluís Revilla Sancho ▴ 730