Mutating columns

Author

Jeffrey R. Stevens

Published

February 15, 2023

For these exercises, we’ll use the dog breed traits data set, so import that from https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv (if you don’t already have it) and assign it to traits.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv(here::here("data/dog_breed_traits.csv"))
Rows: 195 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Breed, Coat Type, Coat Length
dbl (14): Affectionate With Family, Good With Young Children, Good With Othe...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
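The message above suggests specifying column types to quiet the import. A minimal sketch of that idea follows, assuming a tiny inline CSV (hypothetical values, wrapped in I() so readr treats the string as literal data) in place of the real file:

```r
library(readr)

# Hypothetical three-column CSV standing in for dog_breed_traits.csv
csv_text <- I("Breed,Coat Type,Shedding Level\nBeagles,Smooth,3\nPoodles,Curly,1\n")

# Declaring col_types up front means readr has nothing to guess or report
toy <- read_csv(
  csv_text,
  col_types = cols(
    Breed = col_character(),
    `Coat Type` = col_character(),
    `Shedding Level` = col_double()
  )
)

names(toy)
```

Passing show_col_types = FALSE instead keeps the guessed types and hides only the message.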
  1. View traits to see what it looks like.
head(traits)
# A tibble: 6 × 17
  Breed     Affectionate With Fa…¹ Good With Young Chil…² `Good With Other Dogs`
  <chr>                      <dbl>                  <dbl>                  <dbl>
1 Retrieve…                      5                      5                      5
2 French B…                      5                      5                      4
3 German S…                      5                      5                      3
4 Retrieve…                      5                      5                      5
5 Bulldogs                       4                      3                      3
6 Poodles                        5                      5                      3
# ℹ abbreviated names: ¹​`Affectionate With Family`, ²​`Good With Young Children`
# ℹ 13 more variables: `Shedding Level` <dbl>, `Coat Grooming Frequency` <dbl>,
#   `Drooling Level` <dbl>, `Coat Type` <chr>, `Coat Length` <chr>,
#   `Openness To Strangers` <dbl>, `Playfulness Level` <dbl>,
#   `Watchdog/Protective Nature` <dbl>, `Adaptability Level` <dbl>,
#   `Trainability Level` <dbl>, `Energy Level` <dbl>, `Barking Level` <dbl>,
#   `Mental Stimulation Needs` <dbl>
  2. Reassign traits with only the columns Breed through Coat Length.
traits <- select(traits, Breed:`Coat Length`)
  3. Reassign traits removing the Drooling Level column. That’s gross.
traits <- select(traits, -`Drooling Level`)
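The two select() calls above can be tried on a small stand-in. This is a sketch on a hypothetical four-column tibble (not the real traits data), showing a column range and a negative selection with names that contain spaces:

```r
library(dplyr)

# Hypothetical toy data with non-syntactic column names
toy <- tibble(
  Breed = c("Beagles", "Poodles"),
  `Drooling Level` = c(1, 1),
  `Coat Type` = c("Smooth", "Curly"),
  `Energy Level` = c(4, 4)
)

# Keep a contiguous range of columns; backticks are required because of the spaces
kept <- select(toy, Breed:`Coat Type`)

# Drop a single column with the minus sign
dropped <- select(kept, -`Drooling Level`)

names(dropped)
```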
  4. What terrible column names! Reassign traits and change the column names to "breed", "affectionate", "children", "other_dogs", "shedding", "grooming", "coat_type", "coat_length". Note: use the colnames() function rather than select() or rename(), since you already have the full vector of names.
colnames(traits) <- c("breed", "affectionate", "children", "other_dogs", "shedding", "grooming", "coat_type", "coat_length")
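For comparison, here is a sketch of the two renaming routes on a hypothetical two-column tibble. Assigning to colnames() replaces every name at once, so the vector must supply one name per column in order, whereas rename() pairs each new name with an old one:

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  `Coat Type` = c("Smooth", "Curly"),
  `Coat Length` = c("Short", "Long")
)

# Option 1: replace all names at once with a full vector
by_vector <- toy
colnames(by_vector) <- c("coat_type", "coat_length")

# Option 2: rename() uses new_name = old_name pairs
by_rename <- rename(toy, coat_type = `Coat Type`, coat_length = `Coat Length`)

identical(names(by_vector), names(by_rename))
```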
  5. The ratings are supposed to run from 0 to 4 rather than 1 to 5. Change the affectionate column by subtracting 1 from the original numbers to rescale the values. Don’t reassign traits.
mutate(traits, affectionate = affectionate - 1)
# A tibble: 195 × 8
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            4        5          5        4        2 Double   
 2 French Bulldogs             4        5          4        3        1 Smooth   
 3 German Shepherd…            4        5          3        4        2 Double   
 4 Retrievers (Gol…            4        5          5        4        2 Double   
 5 Bulldogs                    3        3          3        3        3 Smooth   
 6 Poodles                     4        5          3        1        4 Curly    
 7 Beagles                     2        5          5        3        2 Smooth   
 8 Rottweilers                 4        3          3        3        1 Smooth   
 9 Pointers (Germa…            4        5          4        3        2 Smooth   
10 Dachshunds                  4        3          4        2        2 Smooth   
# ℹ 185 more rows
# ℹ 1 more variable: coat_length <chr>
  6. Actually, all of the ratings need to be rescaled. Subtract 1 from all of the ratings columns by using across().
mutate(traits, across(affectionate:grooming, ~ .x - 1))
# A tibble: 195 × 8
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            4        4          4        3        1 Double   
 2 French Bulldogs             4        4          3        2        0 Smooth   
 3 German Shepherd…            4        4          2        3        1 Double   
 4 Retrievers (Gol…            4        4          4        3        1 Double   
 5 Bulldogs                    3        2          2        2        2 Smooth   
 6 Poodles                     4        4          2        0        3 Curly    
 7 Beagles                     2        4          4        2        1 Smooth   
 8 Rottweilers                 4        2          2        2        0 Smooth   
 9 Pointers (Germa…            4        4          3        2        1 Smooth   
10 Dachshunds                  4        2          3        1        1 Smooth   
# ℹ 185 more rows
# ℹ 1 more variable: coat_length <chr>
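As a sketch of two across() variations, the example below uses a hypothetical toy tibble. It selects columns by type with where(is.numeric) instead of by position, and it uses the .names argument to keep the original columns next to the rescaled ones; the "_0to4" suffix is an illustrative choice.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  breed = c("Beagles", "Poodles"),
  affectionate = c(3, 5),
  children = c(5, 5),
  coat_type = c("Smooth", "Curly")
)

# Rescale every numeric column in place
rescaled <- mutate(toy, across(where(is.numeric), ~ .x - 1))

# Keep the originals and add rescaled copies under new names
with_copies <- mutate(
  toy,
  across(affectionate:children, ~ .x - 1, .names = "{.col}_0to4")
)

names(with_copies)
```

Selecting by type is convenient here only because every numeric column in the toy data is a rating; with other numeric columns present, an explicit range is safer.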
  7. Create a new column called coat that combines the coat_type and coat_length columns by pasting the values of those two columns separated by -.
mutate(traits, coat = paste(coat_type, coat_length, sep = "-"))
# A tibble: 195 × 9
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 185 more rows
# ℹ 2 more variables: coat_length <chr>, coat <chr>
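One thing to watch with paste() is that it turns a missing value into the text "NA". The sketch below uses a hypothetical tibble with one missing coat_type (whether traits has any is not shown here) and compares paste() with stringr::str_c(), which keeps the result missing:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with a missing coat type
toy <- tibble(
  coat_type = c("Double", NA),
  coat_length = c("Short", "Long")
)

combined <- mutate(
  toy,
  coat_paste = paste(coat_type, coat_length, sep = "-"),  # NA becomes the text "NA"
  coat_str_c = str_c(coat_type, coat_length, sep = "-")   # NA stays missing
)

combined$coat_paste
combined$coat_str_c
```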
  8. Create a new column called shed that dichotomizes shedding such that values of 3 and above are “A lot” and values below 3 are “Not much”. Do you need to account for missing data?
mutate(traits, shed = ifelse(shedding > 2, "A lot", "Not much"))
# A tibble: 195 × 9
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 185 more rows
# ℹ 2 more variables: coat_length <chr>, shed <chr>
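On the missing-data question, a comparison with NA yields NA, so ifelse() returns NA for those rows without any extra handling. The sketch below shows this on a hypothetical three-row tibble, along with dplyr's if_else(), whose missing argument assigns an explicit label; "Unknown" is an illustrative choice.

```r
library(dplyr)

# Hypothetical toy data with one missing shedding value
toy <- tibble(shedding = c(4, 2, NA))

result <- mutate(
  toy,
  # Base ifelse(): the NA row stays NA
  shed_base = ifelse(shedding > 2, "A lot", "Not much"),
  # dplyr if_else(): the NA row gets an explicit label
  shed_labelled = if_else(shedding > 2, "A lot", "Not much", missing = "Unknown")
)

result$shed_base
result$shed_labelled
```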
  9. Use rowwise() to calculate the mean rating for the children and other_dogs columns in a column called mean_rating.
rowwise(traits) %>%
  mutate(mean_rating = mean(c(children, other_dogs)))
# A tibble: 195 × 9
# Rowwise: 
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 185 more rows
# ℹ 2 more variables: coat_length <chr>, mean_rating <dbl>
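A detail that matters for row-wise means: mean() averages its first argument only, and a second positional argument is matched to trim. The row's values therefore need to be combined into one vector first. The sketch below, on a hypothetical two-row tibble, shows three ways to get the same result: c(), c_across(), and a vectorized rowMeans() that skips rowwise() entirely.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(children = c(5, 3), other_dogs = c(3, 4))

result <- toy %>%
  rowwise() %>%
  mutate(
    # c() combines the row's values into one vector before averaging
    mean_c = mean(c(children, other_dogs)),
    # c_across() does the same with tidyselect syntax
    mean_c_across = mean(c_across(children:other_dogs))
  ) %>%
  ungroup() %>%
  # Vectorized alternative: bind the columns into a matrix and average each row
  mutate(mean_rowmeans = rowMeans(cbind(children, other_dogs)))

result$mean_c
```

Calling ungroup() afterwards removes the row-wise grouping so later steps operate on whole columns again.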
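  10. Create a column called coat_type2 that categorizes the coat_type values in the following way and puts it after coat_type: Smooth, Silky, and Wavy are "very petable"; Double and Curly are "petable"; Wiry, Hairless, Rough, and Corded are "not petable".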
mutate(traits,
  coat_type2 = case_when(
    coat_type %in% c("Smooth", "Silky", "Wavy") ~ "very petable",
    coat_type %in% c("Wiry", "Hairless", "Rough", "Corded") ~ "not petable",
    coat_type %in% c("Double", "Curly") ~ "petable"
  ),
  .after = coat_type
)
# A tibble: 195 × 9
   breed affectionate children other_dogs shedding grooming coat_type coat_type2
   <chr>        <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>     <chr>     
 1 Retr…            5        5          5        4        2 Double    petable   
 2 Fren…            5        5          4        3        1 Smooth    very peta…
 3 Germ…            5        5          3        4        2 Double    petable   
 4 Retr…            5        5          5        4        2 Double    petable   
 5 Bull…            4        3          3        3        3 Smooth    very peta…
 6 Pood…            5        5          3        1        4 Curly     petable   
 7 Beag…            3        5          5        3        2 Smooth    very peta…
 8 Rott…            5        3          3        3        1 Smooth    very peta…
 9 Poin…            5        5          4        3        2 Smooth    very peta…
10 Dach…            5        3          4        2        2 Smooth    very peta…
# ℹ 185 more rows
# ℹ 1 more variable: coat_length <chr>
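Any coat_type value that matches none of the conditions gets NA from case_when(). The sketch below shows one way to make such cases visible with the .default argument, assuming dplyr 1.1.0 or later; the "Fluffy" value and the "unclassified" label are hypothetical.

```r
library(dplyr)

# Hypothetical toy data, including an unlisted coat type and a missing value
toy <- tibble(coat_type = c("Smooth", "Double", "Wiry", "Fluffy", NA))

result <- mutate(
  toy,
  coat_type2 = case_when(
    coat_type %in% c("Smooth", "Silky", "Wavy") ~ "very petable",
    coat_type %in% c("Wiry", "Hairless", "Rough", "Corded") ~ "not petable",
    coat_type %in% c("Double", "Curly") ~ "petable",
    # Anything not matched above, including NA, falls through to this label
    .default = "unclassified"
  ),
  .after = coat_type
)

result$coat_type2
```

Because NA %in% c(...) evaluates to FALSE, missing coat types also receive the default label here; omit .default to leave them as NA.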