Pivoting data

Author

Jeffrey R. Stevens

Published

February 24, 2023

For these exercises, we’ll use a new clean version of the dog breed traits data set.

  1. Import data from https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv and assign to traits.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv")
Rows: 197 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): breed, coat_type, coat_length
dbl (5): affectionate, children, other_dogs, shedding, grooming

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. Create traits2 where we delete the coat columns, so we only have breed and ratings data.
traits2 <- traits |> 
  select(-contains("coat"))
  1. Is traits2 tidy?

  2. Is traits2 in wide or long format?

  3. Reshape traits2 so that all of the ratings scores are in a single column called rating with a column labeling what kind of rating it is called scale. Assign this to traits3.

traits3 <- traits2 |> 
  pivot_longer(affectionate:grooming, names_to = "scale", values_to = "rating")
  1. How would we check if traits3 has the expected number of rows?

  2. Create traits4 by removing the rows with affectionate, children, and other_dogs as values of scale.

traits4 <- traits3 |> 
  filter(!scale %in% c("affectionate", "children", "other_dogs"))
  1. Spread out the data into wide format with separate columns for the shedding and grooming data, then create a new column diff that subtracts grooming from shedding ratings.
traits4 |>
  pivot_wider(id_cols = breed, names_from = scale, values_from = rating) |> 
  mutate(diff = shedding - grooming)
# A tibble: 197 × 4
   breed                         shedding grooming  diff
   <chr>                            <dbl>    <dbl> <dbl>
 1 Retrievers (Labrador)                4        2     2
 2 French Bulldogs                      3        1     2
 3 German Shepherd Dogs                 4        2     2
 4 Retrievers (Golden)                  4        2     2
 5 Bulldogs                             3        3     0
 6 Poodles                              1        4    -3
 7 Beagles                              3        2     1
 8 Rottweilers                          3        1     2
 9 Pointers (German Shorthaired)        3        2     1
10 Dachshunds                           2        2     0
# ℹ 187 more rows