Loop vs apply in R
I have heard quite a several times that apply function is faster than loop function in R. Loop function is said to be inefficient, though in certain situation loop is the only way.
Let’s compare between loop function and apply function in R.
First, make a very big fake data contain a list of vector.
set.seed(2021)
xlist <- list(col1 = rnorm(10000000),
col2 = rnorm(10000000),
col3 = rnorm(100000000),
col4 = rnorm(1000000)) # this will take a few seconds
Then, calculate the mean of each vector using for loop()
.
ptm <- proc.time() #-- start the clock
mean_loop <- vector("list", 0) # place holder for a value
for (i in seq_along(xlist)) {
mean_loop[[i]] <- mean(xlist[[i]])
}
proc.time() - ptm #-- stop the clock (time in seconds)
## user system elapsed
## 0.38 0.00 0.37
Now, using lapply()
function.
ptm <- proc.time() #-- start the clock
mean_apply <- lapply(xlist, mean)
proc.time() - ptm #-- stop the clock
## user system elapsed
## 0.34 0.00 0.35
So, lapply()
is a little bit faster. Obviously, with a very big dataset and a more complicated objective, lapply()
is the right choice, but for a “normal” size dataset, the use of any of the two functions probably do not make much different.