A summary of the forcats package

I just watched a YouTube video by Andrew Couch about the functions he commonly uses from the readr, stringr, and forcats packages. Although I have used forcats before, I realised that I have not fully utilised all of its functions.

So, in this post, I summarise the main forcats functions that I find useful in my day-to-day R coding. Basically, it is more of a note to myself.

Main functions

We will use the mtcars data to demonstrate each function. forcats is part of the tidyverse, so it is loaded once we load the tidyverse package.

library(tidyverse)
glimpse(mtcars)
## Rows: 32
## Columns: 11
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~

There are 9 functions that I find very useful when working with factors.

  1. factor()

factor() is a base R function (not part of forcats) that converts a variable into a factor, i.e. a categorical type.

mtcars$carb <- factor(mtcars$carb)
glimpse(mtcars) # carb is now <fct>
## Rows: 32
## Columns: 11
## $ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,~
## $ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,~
## $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16~
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180~
## $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,~
## $ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.~
## $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18~
## $ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,~
## $ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,~
## $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,~
## $ carb <fct> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,~
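
factor() also accepts levels and labels arguments, which is handy when the raw values are codes. Below is a small sketch using the am variable, which the mtcars documentation describes as 0 = automatic and 1 = manual. The object name am_f is only for illustration.

```r
# Convert the 0/1 transmission codes into a labelled factor.
# am_f is an illustrative name, not something used elsewhere in this post.
am_f <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

levels(am_f) # the labels become the factor levels, in the order given
table(am_f)  # counts per transmission type
```

Setting levels explicitly also fixes the level order, so none of the reordering functions below are needed when the order is known upfront.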

  2. fct_inorder()

This function orders the factor levels by their first appearance in the data.

levels(mtcars$carb) # original levels
## [1] "1" "2" "3" "4" "6" "8"
fct_inorder(mtcars$carb) # levels based on the order of appearance
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 1 2 3 6 8
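
The effect is probably easier to see with text values, where factor() sorts the levels alphabetically by default. Here is a minimal sketch with a made-up month vector (the data is an assumption for illustration only).

```r
library(forcats)

# Made-up data: months recorded in the order they were observed
month <- factor(c("Mar", "Jan", "Feb", "Jan", "Mar"))

levels(month)               # default: alphabetical order
levels(fct_inorder(month))  # order of first appearance in the data
```

This is useful when the data is already stored in a meaningful order that we want to keep in tables and plots.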

  3. fct_infreq()

This function orders the factor levels by frequency, from the most to the least common.

fct_count(mtcars$carb) # fct_count() is also a forcats function; it counts each factor level
## # A tibble: 6 x 2
##   f         n
##   <fct> <int>
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1
levels(mtcars$carb) # original levels
## [1] "1" "2" "3" "4" "6" "8"
fct_infreq(mtcars$carb) # levels ordered by frequency
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 2 4 1 3 6 8
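
If the opposite order is needed (least common first), fct_infreq() can be wrapped in fct_rev(), another forcats function that reverses the level order. A small sketch with a made-up vector, chosen so that there are no ties in the counts:

```r
library(forcats)

# Made-up data: "b" appears 3 times, "a" twice, "c" once
x <- factor(c("b", "a", "b", "c", "b", "a"))

most_first  <- fct_infreq(x)          # most common level first
least_first <- fct_rev(fct_infreq(x)) # reversed: least common level first

levels(most_first)
levels(least_first)
```

The reversed version is handy for horizontal bar charts, where the first level is drawn at the bottom.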

  4. fct_relevel()

This function can be used to change the order of the levels manually.

levels(mtcars$carb) # original levels
## [1] "1" "2" "3" "4" "6" "8"
fct_relevel(mtcars$carb, c("8", "6", "4", "3", "2", "1")) # manually changed new levels
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 8 6 4 3 2 1

fct_relevel() can also be used to move a single level.

levels(mtcars$carb) # original levels
## [1] "1" "2" "3" "4" "6" "8"
fct_relevel(mtcars$carb, "8", after = 2) # move level 8 to the third position
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 1 2 8 3 4 6
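
Two more patterns that I think are worth noting: the levels can be passed as separate arguments, and after = Inf moves a level to the end. The sketch below uses a made-up low/medium/high vector, not the mtcars data.

```r
library(forcats)

# Made-up ordinal data; factor() sorts the levels alphabetically
f <- factor(c("low", "high", "medium", "low"))
levels(f) # "high" "low" "medium"

# Spell out the full order, one level per argument
full_order <- fct_relevel(f, "low", "medium", "high")

# Or move only "high" to the last position
high_last <- fct_relevel(f, "high", after = Inf)

levels(full_order)
levels(high_last)
```

Both calls give the same level order here; the second is shorter when only one level is out of place.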

  5. fct_reorder()

This function reorders the levels based on another variable. Let’s reorder the levels of carb by the sum of disp within each level, in descending order.

levels(mtcars$carb) # original levels
## [1] "1" "2" "3" "4" "6" "8"
fct_reorder(mtcars$carb, mtcars$disp, .fun = sum, .desc = TRUE) # new levels ordered by the sum of disp
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2
## Levels: 4 2 1 3 8 6
mtcars %>% 
  group_by(carb) %>% 
  summarise(sum_disp = sum(disp)) %>% 
  arrange(desc(sum_disp)) # this is basically what fct_reorder() does above
## # A tibble: 6 x 2
##   carb  sum_disp
##   <fct>    <dbl>
## 1 4        3088.
## 2 2        2082.
## 3 1         940.
## 4 3         827.
## 5 8         301 
## 6 6         145
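
Note that the summary function matters. When .fun is not supplied, fct_reorder() uses the median, and the default order is ascending. A minimal sketch with made-up numbers, picked so that the median and the sum give different orders:

```r
library(forcats)

# Made-up data: group "a" has one large value that inflates its sum
group <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c"))
value <- c(1, 2, 100, 10, 11, 12, 5, 6, 7)

by_median <- fct_reorder(group, value)             # default .fun = median
by_sum    <- fct_reorder(group, value, .fun = sum) # order by group totals

levels(by_median) # medians: a = 2, c = 6, b = 11
levels(by_sum)    # sums: c = 18, b = 33, a = 103
```

So it is worth choosing .fun to match whatever the table or plot is meant to show.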

Additionally, fct_reorder() is useful for plotting.

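# Original plot
ggplot(mtcars, aes(x = carb, y = disp)) +
  geom_col() # assumed geom (the layer is missing here); stacked bars add up disp per carb level

# Plot with changed levels
mtcars %>% 
  mutate(carb = fct_reorder(carb, disp, .fun = sum, .desc = TRUE)) %>% 
  ggplot(aes(x = carb, y = disp)) +
  geom_col() # same assumed geom as above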

  6. fct_lump()

This function lumps infrequent factor levels into a single "Other" level. There are 5 functions in this family:

  • fct_lump()
  • fct_lump_min()
  • fct_lump_n()
  • fct_lump_lowfreq()

The remaining one is fct_lump_prop(). It is not in the examples below, as I do not find it useful, at least for my current R coding routine.

fct_lump() automatically lumps the low-frequency levels into one group.

fct_count(mtcars$carb) # fct_count() is also a forcats function; it counts each factor level
## # A tibble: 6 x 2
##   f         n
##   <fct> <int>
## 1 1         7
## 2 2        10
## 3 3         3
## 4 4        10
## 5 6         1
## 6 8         1
fct_lump(mtcars$carb) %>% fct_count() 
## # A tibble: 4 x 2
##   f         n
##   <fct> <int>
## 1 1         7
## 2 2        10
## 3 4        10
## 4 Other     5

fct_lump_min() lumps the levels that appear fewer than min times into one group.

table(fct_lump_min(mtcars$carb, min = 2)) # levels 6 and 8 are lumped into one group
##     1     2     3     4 Other 
##     7    10     3    10     2

fct_lump_n() lumps all levels except the n most frequent ones.

table(fct_lump_n(mtcars$carb, n = 2)) # keep the 2 most frequent levels, lump the rest into one group
##     2     4 Other 
##    10    10    12

fct_lump_lowfreq() lumps the least frequent levels into one group, while making sure that the lumped group is still the smallest.

table(fct_lump_lowfreq(mtcars$carb, other_level = "low")) # the lumped group "low" is still the smallest
##   1   2   4 low 
##   7  10  10   5
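
As far as I understand from the forcats documentation, fct_lump() itself simply picks one of the other functions depending on which arguments are supplied. The sketch below checks this on the carb variable; it reflects my reading of the documentation rather than a guarantee for every forcats version.

```r
library(forcats)

carb <- factor(mtcars$carb)

# With no extra arguments, fct_lump() should behave like fct_lump_lowfreq()
same_as_lowfreq <- identical(fct_lump(carb), fct_lump_lowfreq(carb))

# With n supplied, fct_lump() should behave like fct_lump_n()
same_as_n <- identical(fct_lump(carb, n = 2), fct_lump_n(carb, n = 2))

same_as_lowfreq
same_as_n
```

Calling the specific function directly makes the intent clearer when reading the code later.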

  7. fct_other()

fct_other() is much like fct_lump(), except that we manually choose which levels to keep and which to combine.

table(fct_other(mtcars$carb, keep = c("8", "6"))) 
##     6     8 Other 
##     1     1    30
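
fct_other() also has a drop argument, which works the other way around: the listed levels are combined and everything else is kept. A minimal sketch on the carb variable (the object names are only for illustration):

```r
library(forcats)

carb <- factor(mtcars$carb)

# Combine only levels 6 and 8; all other levels stay as they are
carb_dropped <- fct_other(carb, drop = c("6", "8"))

levels(carb_dropped)
table(carb_dropped)
```

Use keep when the list of levels to preserve is short, and drop when the list of levels to combine is short.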

  8. fct_recode()

This function is used to rename or relabel factor levels, using the form new_name = "old_name".

table(fct_recode(mtcars$carb, hanis = "8")) 
##     1     2     3     4     6 hanis 
##     7    10     3    10     1     1
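
Several levels can be recoded in one call, and giving two old levels the same new name merges them. A small sketch on the carb variable, where the new name many is made up for illustration:

```r
library(forcats)

carb <- factor(mtcars$carb)

# Map both "6" and "8" to the same new level, which merges them
carb_merged <- fct_recode(carb, many = "6", many = "8")

levels(carb_merged)
table(carb_merged)
```

This gives a result similar to fct_other(), but with full control over the name of each new level.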

  9. fct_relabel()

fct_relabel() applies a function to all the levels at once, which is extremely useful if we want to rename quite a number of factor levels.

table(mtcars$carb) # original groups
##  1  2  3  4  6  8 
##  7 10  3 10  1  1
table(fct_relabel(mtcars$carb, ~ c("abu", "ali", "chong", "siti", "krish", "lee"))) # newly named groups
##   abu   ali chong  siti krish   lee 
##     7    10     3    10     1     1
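
Since fct_relabel() takes a function, it can also transform the existing level names instead of replacing them with a fixed vector. A minimal sketch that adds a prefix to every level (the prefix carb_ is an arbitrary choice for illustration):

```r
library(forcats)

carb <- factor(mtcars$carb)

# .x is the character vector of current levels; paste0() adds a prefix to each
carb_prefixed <- fct_relabel(carb, ~ paste0("carb_", .x))

levels(carb_prefixed)
table(carb_prefixed)
```

This scales better than typing every new name by hand, and it does not depend on knowing the number of levels in advance.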


Tengku Muhammad Hanis
Lead academic trainer

My research interests include medical statistics and machine learning application.
