Question

how to use apply instead of nested for loop

0

KB ▴ 50

@k-8495

Last seen 2.2 years ago

United States

Hello,

I have a piece of code, shown below. It is a nested for loop in which each inner loop depends on the loop outside it. The outermost loop (i) iterates over chromosomes, the middle loop (j) over the cytobands in each chromosome, and the innermost loop (k) over samples.

for (i = 1: NumberOfChromosomes) {
    for (j = 1: NumberOfCytobandsInEachChromosome) {
      for (k = 1: TotalNumberOfSamples) {
         z_1 = NULL  #reset this value before the calculation for every k
         # do something with z_1 to get z_2
         # do something with z_2 to get z_3
         x[k, j] = z_3 #store the output value into a matrix
      } # end of k loop
      y[[i]] = x
    } # end of j loop
} # end of i loop

I would like to make this code faster and more efficient. Could anyone suggest a way to use one of the apply functions on this? I have used the apply family (lapply and mapply) before, but never on nested for loops, so I am not sure how to do this.

Any help would be great. Thank you.

apply for loop • 4.7k views

ADD COMMENT • link 9.4 years ago KB ▴ 50

score 0 · Answer 1 · 2015-11-30

0

Aaron Lun ★ 28k

@alun

Last seen 2 hours ago

The city by the bay

Firstly, your question is about general R rather than Bioconductor packages, so this site isn't the right forum for it.

Secondly, the code within the parentheses of each for looks wrong; you should replace the = with in.
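For reference, a minimal runnable sketch of the loop with that fix applied. The sizes and the arithmetic standing in for the z_1, z_2, z_3 steps are made up for illustration, and the names NumberOfCytobands and the per-chromosome matrix x are assumptions about the intent. It also moves the y[[i]] assignment out of the j loop, which gives the same final result.

```r
# Hypothetical sizes; the real values come from the poster's data.
NumberOfChromosomes <- 2
NumberOfCytobands <- c(3, 2)   # cytobands per chromosome (assumed to vary)
TotalNumberOfSamples <- 4

y <- vector("list", NumberOfChromosomes)
for (i in seq_len(NumberOfChromosomes)) {
  # One samples-by-cytobands matrix per chromosome.
  x <- matrix(NA_real_, nrow = TotalNumberOfSamples,
              ncol = NumberOfCytobands[i])
  for (j in seq_len(NumberOfCytobands[i])) {
    for (k in seq_len(TotalNumberOfSamples)) {
      # Placeholder for the real z_1 -> z_2 -> z_3 calculation.
      z_3 <- i * 100 + j * 10 + k
      x[k, j] <- z_3
    }
  }
  y[[i]] <- x   # store once per chromosome, after all cytobands are filled
}
```

seq_len(n) is used rather than 1:n so that a count of zero yields an empty loop instead of iterating over c(1, 0).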

Thirdly, check out Circle (Chapter) 4 of "The R Inferno". Briefly put, apply is just syntactic sugar for the for loop. The function still runs a for loop under the hood. So really, you won't get any speed increase from switching to apply. Depending on what your skill level is, and what the tasks are inside the nested loop, you might consider rewriting the loop in C/C++ with Rcpp. Of course, if the tasks to be performed are compute-intensive, then any speed loss due to looping is probably negligible in the wider scheme of things.
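To illustrate the "syntactic sugar" point, here is a small sketch on a toy matrix (the matrix m is made up for the example). Both versions do the same row-wise work and return the same values; only the notation differs.

```r
m <- matrix(1:12, nrow = 3)

# Explicit loop with a pre-allocated result vector.
loop_sums <- numeric(nrow(m))
for (r in seq_len(nrow(m))) {
  loop_sums[r] <- sum(m[r, ])
}

# The same computation written with apply() over rows (MARGIN = 1).
apply_sums <- apply(m, 1, sum)

# Compare after converting the integer result of apply() to double.
same <- identical(loop_sums, as.numeric(apply_sums))
```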

ADD COMMENT • link 9.4 years ago Aaron Lun ★ 28k

1

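One thing that the apply functions do for you is to pre-allocate space for the result; if the code had set y = list(), then y[[i]] = x would run in quadratic time, because the list has to be copied each time it grows. This can be seen even in this simple example, where each batch of 10000 assignments takes longer than the last: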

> y = list(); i = 1; while (TRUE) { y[[i]] = i; i = i + 1L; if (i %% 10000 == 0) print(Sys.time()) }
[1] "2015-11-30 19:14:15 EST"
[1] "2015-11-30 19:14:17 EST"
[1] "2015-11-30 19:14:21 EST"
[1] "2015-11-30 19:14:26 EST"
[1] "2015-11-30 19:14:34 EST"

Also, sometimes it seems that writing something as an apply() makes it more obvious how it should be vectorized (a single function call, rather than iteration), and vectorization is where real speed benefits can occur.
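A small sketch of what that step from apply() to a vectorized call can look like. The scores matrix is hypothetical (samples in rows, cytobands in columns) and column means are just a stand-in for whatever the real per-cytoband calculation is.

```r
set.seed(1)
scores <- matrix(rnorm(20), nrow = 5)   # hypothetical: 5 samples x 4 cytobands

# Iteration: mean() is called once per column.
by_apply <- apply(scores, 2, mean)

# Vectorized: a single call, with the looping done in compiled code.
by_vector <- colMeans(scores)

agree <- isTRUE(all.equal(by_apply, by_vector))
```

Whether the real z_2 and z_3 steps can be expressed this way depends on what they compute, which the original post does not show.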

ADD REPLY • link 9.4 years ago Martin Morgan 25k

0

Good point. Plus, it also makes it easier to switch to parallelized versions like bplapply.

That said, trying to cram a complicated piece of code into a function to use in apply doesn't seem ideal for readability.
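As a rough sketch of the switch to bplapply mentioned above: once the per-chromosome work is wrapped in a function, the lapply and bplapply calls have the same shape. The function process_chromosome and its body are hypothetical placeholders, and the code falls back to lapply if BiocParallel is not installed.

```r
# Hypothetical per-chromosome work; stands in for the real inner loops.
process_chromosome <- function(i) {
  sum(seq_len(i))
}

chromosomes <- 1:4

if (requireNamespace("BiocParallel", quietly = TRUE)) {
  # SerialParam() keeps this sketch safe to run anywhere; a parallel back-end
  # such as MulticoreParam() or SnowParam() would be passed here instead.
  result <- BiocParallel::bplapply(chromosomes, process_chromosome,
                                   BPPARAM = BiocParallel::SerialParam())
} else {
  result <- lapply(chromosomes, process_chromosome)
}
```

Parallelizing over chromosomes only pays off if the work per chromosome is large compared with the overhead of starting and feeding the workers.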

ADD REPLY • link 9.4 years ago Aaron Lun ★ 28k

score 0 · Answer 2 · 2015-12-01

0

KB ▴ 50

Thank you for the feedback.

I asked the question because I have seen significant speed improvements when switching from a for loop to apply. I posted it here because the application is bioinformatics and deals with chromosomes and cytobands.

I will try the pre-allocation with y = list(), and also look into implementing this code with bplapply.

Thank you.

ADD COMMENT • link 9.4 years ago KB ▴ 50

0

My advice was meant the other way -- pre-allocate with y = vector("list", NumberOfChromosomes).
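A minimal sketch of the two patterns side by side, with a made-up n and trivial values in place of the real per-chromosome matrices. Both produce the same list; the difference is only in how the storage is obtained.

```r
n <- 1000   # hypothetical number of results

# Growing: the list starts empty and is extended on every assignment.
grown <- list()
for (i in seq_len(n)) grown[[i]] <- i

# Pre-allocated: the list already has n slots, so each assignment fills one.
prealloc <- vector("list", n)
for (i in seq_len(n)) prealloc[[i]] <- i

same_result <- identical(grown, prealloc)
```

Newer versions of R soften the cost of growing a vector, so the timing gap may be smaller than it was in 2015, but pre-allocating remains the safer habit.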

ADD REPLY • link 9.4 years ago Martin Morgan 25k

0

So I just checked the code, and this pre-allocation is already being done. Even with the pre-allocation, the entire loop takes several minutes to complete.

I am looking for any way to make this run faster (without having to recode it in C/C++). Thank you.

ADD REPLY • link 9.4 years ago KB ▴ 50

1

The time is likely being spent in the code you've commented out, # do something with ... so there's not much scope for help... Also it won't really help to post a lot of code; if the code is complicated you should try to simplify it as much as possible, and try and convey what it is you are trying to do in words. Update your original post, rather than adding more comments or another answer. Post a comment (on your original post) when you've made the update.

ADD REPLY • link 9.4 years ago Martin Morgan 25k