Case when statements are very handy to lower the granularity of a categorical variable for example. Let’s say you are dealing with a dataset made of a list of companies along with their company sizes. The column reporting the sizes is too granular for you and contains values such as 1 to 9 or 100 to 259 enployees. You want to bucket the companies into small, medium or big and the case_when() function from the dplyr library can help!

First let’s create a dataframe

``````companies = c('Company A', 'Company B', 'Company C', 'Company D', 'Company E', 'Company F', 'Company G')
sizes = c('1-9', '10-49', '50-99', '100-249', '250-499', '500-999', '1000+')
data = data.frame(companies, sizes)
``````

Introducing The casewhen() function from the dplyr library

``````library(dplyr)

data\$sizes_2 = case_when(
data\$sizes == "1-9" ~ "Small",
data\$sizes == "10-49" ~ "Small",
data\$sizes == "50-99" ~ "Small",
data\$sizes == "100-249" ~ "Medium",
data\$sizes == "250-499" ~ "Medium",
data\$sizes == "500-999" ~ "Medium",
data\$sizes == "1000+" ~ "Big",
)
``````  companies   sizes sizes_2