Factors

Author: Jeffrey R. Stevens

Published: March 20, 2023

For these exercises, we’ll use the dog breed traits data set.

  1. Load tidyverse and import dog_breed_traits_clean.csv as traits.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv(here::here("data/dog_breed_traits_clean.csv"), show_col_types = FALSE)
set.seed(12)
breeds <- sample(traits$breed)
  1. Convert both coat_type and coat_length into factors using across() and save as traits2.
traits2 <- traits |> 
  mutate(across(contains("coat"), factor))
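To see what across() does here without the full data set, the following minimal sketch applies the same call to a small, made-up tibble. The object `toy` and its values are hypothetical stand-ins for traits, not the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for the traits data set
toy <- tibble(
  breed = c("A", "B", "C"),
  coat_type = c("Smooth", "Wiry", "Smooth"),
  coat_length = c("Short", "Long", "Medium")
)

# contains("coat") selects every column whose name includes "coat";
# factor() is then applied to each selected column
toy2 <- toy |>
  mutate(across(contains("coat"), factor))

# The coat columns are now factors; breed is still character
sapply(toy2, class)
```

Naming the columns explicitly, as in across(c(coat_type, coat_length), factor), should give the same result and avoids accidentally converting any other column that happens to contain "coat" in its name.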
  1. Check the levels of both columns, one with a pipe and one without.
levels(traits2$coat_type)
[1] "Corded"   "Curly"    "Double"   "Hairless" "Rough"    "Silky"    "Smooth"  
[8] "Wavy"     "Wiry"    
traits2 |> 
  pull(coat_length) |> 
  levels()
[1] "Long"   "Medium" "Short" 
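The alphabetical order in the output above is the default behavior of factor(), which sorts the unique values when no levels are given. A small sketch with a hypothetical `toy` tibble shows that the piped and unpiped approaches return the same thing:

```r
library(dplyr)

# Hypothetical toy data; factor() sorts the levels alphabetically by default
toy <- tibble(coat_length = factor(c("Short", "Long", "Medium", "Short")))

# Without a pipe: extract the column with $ and pass it to levels()
no_pipe <- levels(toy$coat_length)

# With a pipe: pull() extracts the column as a vector for levels()
with_pipe <- toy |>
  pull(coat_length) |>
  levels()

identical(no_pipe, with_pipe)
```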
  1. Reorder the levels of coat_length to Short, Medium, Long, reassign the result to traits2, and then check the levels.
traits2 <- traits2 |> 
  mutate(coat_length = fct_relevel(coat_length, "Short", "Medium", "Long"))
levels(traits2$coat_length)
[1] "Short"  "Medium" "Long"  
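As a minimal illustration on a made-up vector (`len` is hypothetical), fct_relevel() changes only the order of the levels, not the underlying values. Listing every level sets the full order, while the `after` argument moves a single level, which can be handy when only one level is out of place.

```r
library(forcats)

# Hypothetical toy factor; default levels are alphabetical
len <- factor(c("Short", "Long", "Medium", "Short"))

# List all levels to set the complete order
len_full <- fct_relevel(len, "Short", "Medium", "Long")
levels(len_full)

# Move a single level to the end instead
len_end <- fct_relevel(len, "Long", after = Inf)
levels(len_end)

# The data values themselves are unchanged
all(as.character(len_full) == as.character(len))
```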
  1. Reorder the levels for coat_type to be in the order of the most to least frequent coat type and then check the levels.
traits2 <- traits2 |> 
  mutate(coat_type = fct_infreq(coat_type))
levels(traits2$coat_type)
[1] "Smooth"   "Double"   "Wiry"     "Silky"    "Curly"    "Wavy"     "Corded"  
[8] "Rough"    "Hairless"
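The same idea can be checked on a small, made-up vector (`coat` is hypothetical) where the counts are easy to verify by eye. Wrapping the result in fct_rev() reverses the order, giving least to most frequent.

```r
library(forcats)

# Hypothetical toy factor: Smooth appears 3 times, Wiry 2, Double 1
coat <- factor(c("Smooth", "Wiry", "Smooth", "Double", "Smooth", "Wiry"))

# Most frequent level first
coat_freq <- fct_infreq(coat)
levels(coat_freq)

# Reverse for least frequent first
coat_rare <- fct_rev(fct_infreq(coat))
levels(coat_rare)
```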
  1. Relabel coat_length to be Stubby, Mid, and Lush rather than Short, Medium, and Long.
traits2 <- traits2 |> 
  mutate(coat_length = fct_recode(coat_length, "Stubby" = "Short",
                                  "Mid" = "Medium",
                                  "Lush" = "Long"))
levels(traits2$coat_length)
[1] "Stubby" "Mid"    "Lush"  
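One detail that is easy to get backwards is the argument order in fct_recode(): the new label goes on the left and the existing level on the right. A minimal sketch on a hypothetical vector `len`:

```r
library(forcats)

# Hypothetical toy factor with an explicit level order
len <- factor(c("Short", "Long", "Medium"),
              levels = c("Short", "Medium", "Long"))

# Syntax is new_label = "old_level"
len_new <- fct_recode(len,
                      "Stubby" = "Short",
                      "Mid" = "Medium",
                      "Lush" = "Long")

# The level order is kept; only the labels change
levels(len_new)
as.character(len_new)
```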
  1. The new AKC standard merges Rough coats into Wiry coats and Silky coats into Wavy coats. Update the coat_type variable accordingly.
traits2 <- traits2 |> 
  mutate(coat_type = fct_collapse(coat_type, Wiry = c("Rough", "Wiry"),
                                  Wavy = c("Silky", "Wavy")))
levels(traits2$coat_type)
[1] "Smooth"   "Double"   "Wiry"     "Wavy"     "Curly"    "Corded"   "Hairless"
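A quick way to confirm that a collapse did what was intended is to compare counts before and after. The sketch below does this on a made-up vector (`coat` is hypothetical): the merged level should hold the sum of the counts of the levels folded into it.

```r
library(forcats)

# Hypothetical toy factor with five coat types
coat <- factor(c("Smooth", "Rough", "Wiry", "Silky", "Wavy", "Rough"))

# Each argument is new_level = c(old levels to merge)
coat2 <- fct_collapse(coat,
                      Wiry = c("Rough", "Wiry"),
                      Wavy = c("Silky", "Wavy"))

# Counts before and after: Rough (2) + Wiry (1) become Wiry (3),
# Silky (1) + Wavy (1) become Wavy (2), Smooth is untouched
table(coat)
table(coat2)
```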