Loop vs apply in R

I have often heard that the apply family of functions is faster than a for loop in R. Loops are said to be inefficient, although in certain situations a loop is the only option.

Let’s compare a for loop with an apply function in R.

First, create a very large simulated dataset: a list of numeric vectors.

set.seed(2021)
xlist <- list(col1 = rnorm(10000000), 
              col2 = rnorm(10000000),
              col3 = rnorm(100000000),
              col4 = rnorm(1000000)) # this will take a few seconds
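
Before timing anything, it can help to confirm what was generated. The following is a minimal sketch on a scaled-down stand-in; the name xdemo and its sizes are assumptions that only keep the 10:10:100:1 length ratio of xlist so the check runs instantly.

```r
# Scaled-down stand-in for xlist (name and sizes are assumptions).
set.seed(2021)
xdemo <- list(col1 = rnorm(1000),
              col2 = rnorm(1000),
              col3 = rnorm(10000),
              col4 = rnorm(100))

# Number of elements in each vector, returned as a named integer vector.
vec_lengths <- lengths(xdemo)
vec_lengths

# Approximate memory used by the whole list.
format(object.size(xdemo), units = "Kb")
```

The same two calls work on the full xlist, where the reported size will be correspondingly larger.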

Then, calculate the mean of each vector using a for loop.

ptm <- proc.time() #-- start the clock

mean_loop <- vector("list", length(xlist)) # preallocate one slot per vector
for (i in seq_along(xlist)) {
  mean_loop[[i]] <- mean(xlist[[i]])
}

proc.time() - ptm #-- stop the clock (time in seconds)
##    user  system elapsed 
##    0.38    0.00    0.37

Now, do the same with the lapply() function.

ptm <- proc.time() #-- start the clock

mean_apply <- lapply(xlist, mean)

proc.time() - ptm #-- stop the clock
##    user  system elapsed 
##    0.34    0.00    0.35
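
A timing comparison is only meaningful if both approaches compute the same thing. The sketch below checks this on a small stand-in list; xdemo, res_loop, res_apply and same_values are hypothetical names, and the sizes are an assumption chosen so the check runs instantly.

```r
# Small stand-in list (name and sizes are assumptions).
set.seed(2021)
xdemo <- list(col1 = rnorm(1000), col2 = rnorm(1000))

# for loop version, preallocating one slot per element.
res_loop <- vector("list", length(xdemo))
for (i in seq_along(xdemo)) {
  res_loop[[i]] <- mean(xdemo[[i]])
}

# lapply() version; it keeps the names of the input list.
res_apply <- lapply(xdemo, mean)

# Compare the values only, ignoring names.
same_values <- all.equal(unlist(res_loop), unlist(res_apply, use.names = FALSE))
same_values
```

One difference worth noting is that lapply() carries over the names col1, col2 and so on, while a loop that assigns by index leaves the result unnamed unless names are set explicitly.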

So, lapply() is slightly faster in this run (0.35 vs. 0.37 seconds elapsed). With a very large dataset and a more complicated task, lapply() is likely the better choice, but for a “normal”-sized dataset, the choice between the two probably does not make much difference.
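
A single proc.time() measurement can vary from run to run, so a gap of a few hundredths of a second is hard to interpret on its own. One way to get a steadier picture is to repeat each timing and compare the medians. The sketch below uses only base R; the helper names mean_by_loop and mean_by_lapply, the stand-in data xsmall, and the repetition count are all assumptions for illustration, not part of the original comparison.

```r
# Smaller stand-in data so the sketch runs quickly (sizes are an assumption).
set.seed(2021)
xsmall <- list(col1 = rnorm(1e5),
               col2 = rnorm(1e5),
               col3 = rnorm(1e5),
               col4 = rnorm(1e5))

# Hypothetical helper: mean of each element via a for loop.
mean_by_loop <- function(x) {
  out <- vector("list", length(x))  # preallocate one slot per element
  for (i in seq_along(x)) {
    out[[i]] <- mean(x[[i]])
  }
  names(out) <- names(x)
  out
}

# Hypothetical helper: the same computation via lapply().
mean_by_lapply <- function(x) lapply(x, mean)

# Repeat each timing several times and keep the elapsed seconds.
n_rep <- 20
loop_times <- replicate(n_rep, system.time(mean_by_loop(xsmall))[["elapsed"]])
lapply_times <- replicate(n_rep, system.time(mean_by_lapply(xsmall))[["elapsed"]])

# Compare the medians rather than a single run.
c(loop = median(loop_times), lapply = median(lapply_times))
```

The actual numbers depend on the machine and the data size, so no expected output is shown here.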

Tengku Muhammad Hanis
Lead academic trainer

My research interests include medical statistics and machine learning application.
