6.3 Categorical Variables
6.3.1 Manipulate String Columns
Split a string column on "." and keep the 2nd item:
reg_dict %>% mutate(def = sapply(strsplit(def, "\\."), "[[", 2))
# alternatively, use separate(def, into, sep, remove = TRUE)
# sep is interpreted as a regular expression, so the dot must be escaped
reg_dict %>% separate(def, c("cli_key", "yr_key"), sep = "\\.")
separate() does the opposite of unite(): it splits one column into multiple columns using either a regular expression or character positions.
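The two ways of splitting mentioned above (by regular expression or by character position) can be sketched as follows. The data frame `df` and its values are hypothetical stand-ins, not the `reg_dict` object used in the text; only the column names `def`, `cli_key` and `yr_key` are taken from it.

```r
library(tidyr)

# Hypothetical data: keys of the form "<client>.<year>"
df <- data.frame(def = c("ab.2019", "cd.2020"), stringsAsFactors = FALSE)

# Character sep is a regular expression, so the literal dot is escaped
by_regex <- separate(df, def, into = c("cli_key", "yr_key"), sep = "\\.")
by_regex$yr_key
#> [1] "2019" "2020"

# Numeric sep is a position: split after the 2nd character
# (the dot stays attached to the second piece)
by_pos <- separate(df, def, into = c("cli_key", "rest"), sep = 2)
by_pos$rest
#> [1] ".2019" ".2020"
```

The new columns are character by default; passing convert = TRUE asks separate() to convert them to a more specific type where possible.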
unite() pastes together existing string columns.
world_unite = world %>%
unite("con_reg", continent:region_un, sep = ":", remove = TRUE)
# remove indicates whether the original columns should be removed
6.3.2 Factors
forcats is one of the component packages of the tidyverse; it is useful for working with categorical variables (factors).
fct_expand: add additional levels to a factor.
fct_drop: drop unused levels.
fct_relevel: change the order of a factor's levels by hand.
fct_expand(f, ..., after = Inf) adds additional levels to a factor.
f: a factor.
...: additional levels to add to the factor.
after: position to place the new level(s).
f <- factor(sample(letters[1:3], 20, replace = TRUE))
f
#> [1] c a b a b b a c c b b b b c c c a b b c
#> Levels: a b c
fct_expand(f, "d", "e", "f")
#> [1] c a b a b b a c c b b b b c c c a b b c
#> Levels: a b c d e f
fct_expand(f, letters[1:6])
#> [1] c a b a b b a c c b b b b c c c a b b c
#> Levels: a b c d e f
fct_expand(f, "Z", after = 0)
#> [1] c a b a b b a c c b b b b c c c a b b c
#> Levels: Z a b c
fct_drop(f, only = NULL) drops unused levels.
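A minimal sketch of fct_drop() in the same style as the fct_expand() examples. The factor `g` is made up for illustration and deliberately declares two levels ("c" and "d") that never occur in the data.

```r
library(forcats)

# Hypothetical factor with two unused levels, "c" and "d"
g <- factor(c("a", "b", "a"), levels = c("a", "b", "c", "d"))

# Drop every unused level
dropped_all <- fct_drop(g)
levels(dropped_all)
#> [1] "a" "b"

# Drop only the listed level, and only if it is unused
dropped_c <- fct_drop(g, only = "c")
levels(dropped_c)
#> [1] "a" "b" "d"
```

The values of the factor are unchanged in both cases; only the set of levels is trimmed.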