# Separating and uniting data

Author

Jeffrey R. Stevens

Published

February 27, 2023

For these exercises, we’ll use a new clean version of the dog breed traits data set.

1. Load tidyverse, import data from https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv, and assign it to `traits`.
``library(tidyverse)``
``````── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors``````
``traits <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv", show_col_types = FALSE)``
1. Create `traits2` which adds a coat column that combines coat_type and coat_length into single column delimited by “-”.
``````traits2 <- traits |>
unite("coat", contains("coat_"), sep = "-")``````
1. Split the coat column into type and length and keep the original coat column.
``````traits2 |>
separate(coat, into = c("type", "length"), remove = FALSE)``````
``````# A tibble: 197 × 9
breed   affectionate children other_dogs shedding grooming coat  type  length
<chr>          <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr> <chr> <chr>
1 Retrie…            5        5          5        4        2 Doub… Doub… Short
2 French…            5        5          4        3        1 Smoo… Smoo… Short
3 German…            5        5          3        4        2 Doub… Doub… Medium
4 Retrie…            5        5          5        4        2 Doub… Doub… Medium
5 Bulldo…            4        3          3        3        3 Smoo… Smoo… Short
6 Poodles            5        5          3        1        4 Curl… Curly Long
7 Beagles            3        5          5        3        2 Smoo… Smoo… Short
8 Rottwe…            5        3          3        3        1 Smoo… Smoo… Short
9 Pointe…            5        5          4        3        2 Smoo… Smoo… Short
10 Dachsh…            5        3          4        2        2 Smoo… Smoo… Short
# ℹ 187 more rows``````
1. Create `traits3` from `traits` that (1) removes the coat columns, (2) turns the ratings columns into long format, and (3) removes the children row for Bulldogs.
``````traits3 <- traits |>
select(-contains("coat_")) |>
pivot_longer(affectionate:grooming, names_to = "scale", values_to = "rating") |>
filter(breed != "Bulldogs" | scale != "children")``````
1. Create `traits4` from `traits3` that ensures a complete data set with all five ratings for all breeds (and fills in missing combinations with `NA`) and check for the missing Bulldog children row.
``````traits4 <- traits3 |>
complete(breed, scale)``````
1. How could we copy the rating from the previous row into the Bulldog children row to replace the `NA`? (Note this is not a good idea in this case!)
``````traits4 |>
fill(rating) |>
filter(breed == "Bulldogs")``````
``````# A tibble: 5 × 3
breed    scale        rating
<chr>    <chr>         <dbl>
1 Bulldogs affectionate      4
2 Bulldogs children          4
3 Bulldogs grooming          3
4 Bulldogs other_dogs        3
5 Bulldogs shedding          3``````
1. From `traits`, generate all combinations of coat type and length observed in the data, excluding `NA`.
``````traits |>
expand(nesting(coat_type, coat_length)) |>
drop_na()``````
``````# A tibble: 18 × 2
coat_type coat_length
<chr>     <chr>
1 Corded    Long
2 Corded    Medium
3 Curly     Long
4 Curly     Medium
5 Double    Long
6 Double    Medium
7 Double    Short
8 Hairless  Short
9 Rough     Long
10 Rough     Medium
11 Silky     Long
12 Silky     Medium
13 Smooth    Medium
14 Smooth    Short
15 Wavy      Long
16 Wavy      Medium
17 Wiry      Medium
18 Wiry      Short      ``````
1. From `traits`, generate all possible combinations of coat type and length, excluding `NA`.
``````traits |>
expand(coat_type, coat_length) |>
drop_na()``````
``````# A tibble: 27 × 2
coat_type coat_length
<chr>     <chr>
1 Corded    Long
2 Corded    Medium
3 Corded    Short
4 Curly     Long
5 Curly     Medium
6 Curly     Short
7 Double    Long
8 Double    Medium
9 Double    Short
10 Hairless  Long
# ℹ 17 more rows``````