Merging columns

Author

Jeffrey R. Stevens

Published

March 1, 2023

For these exercises, we’ll use the dog breed traits data set along with the data on breed popularity rankings.

Load tidyverse, download and import dog_breed_traits_clean.csv to traits, and import dog_breed_ranks.csv to ranks. Make sure to download both files from the website, as they have changed or are new.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

traits <- read_csv(here::here("data/dog_breed_traits_clean.csv"), show_col_types = FALSE)
ranks <- read_csv(here::here("data/dog_breed_ranks.csv"), show_col_types = FALSE)

Which breeds differ between traits and ranks?

traits$breed[!traits$breed %in% ranks$breed]

[1] "American Rearsniffer" "English Buttdragger"

ranks$breed[!ranks$breed %in% traits$breed]

[1] "Bergamasco" "Pumi"       "Puli"

Merge traits and ranks (in that order) to produce a data frame that includes breeds shared by both data sets. How many rows are there?

traits |> 
  inner_join(ranks, by = "breed")

# A tibble: 195 × 16
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 185 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
#   `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
#   `2019 Rank` <dbl>, `2020 Rank` <dbl>

Merge traits and ranks (in that order) to produce a data frame that includes all breeds included in either data set. How many rows are there?

traits |> 
  full_join(ranks, by = "breed")

# A tibble: 200 × 16
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 190 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
#   `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
#   `2019 Rank` <dbl>, `2020 Rank` <dbl>

Merge traits and ranks (in that order) to produce a data frame that includes only breeds included in traits. How many rows are there?

traits |> 
  left_join(ranks, by = "breed")

# A tibble: 197 × 16
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 187 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
#   `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
#   `2019 Rank` <dbl>, `2020 Rank` <dbl>

Merge traits and ranks (in that order) to produce a data frame that includes only breeds included in ranks. How many rows are there?

traits |> 
  right_join(ranks, by = "breed")

# A tibble: 198 × 16
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Retrievers (Lab…            5        5          5        4        2 Double   
 2 French Bulldogs             5        5          4        3        1 Smooth   
 3 German Shepherd…            5        5          3        4        2 Double   
 4 Retrievers (Gol…            5        5          5        4        2 Double   
 5 Bulldogs                    4        3          3        3        3 Smooth   
 6 Poodles                     5        5          3        1        4 Curly    
 7 Beagles                     3        5          5        3        2 Smooth   
 8 Rottweilers                 5        3          3        3        1 Smooth   
 9 Pointers (Germa…            5        5          4        3        2 Smooth   
10 Dachshunds                  5        3          4        2        2 Smooth   
# ℹ 188 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
#   `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
#   `2019 Rank` <dbl>, `2020 Rank` <dbl>

Make table4a and table4b tidy then join them to replicate table1.

table4a_tidy <- pivot_longer(table4a, -country, names_to = "year", values_to = "cases")
table4b_tidy <- pivot_longer(table4b, -country, names_to = "year", values_to = "population")
left_join(table4a_tidy, table4b_tidy)

Joining with `by = join_by(country, year)`

# A tibble: 6 × 4
  country     year   cases population
  <chr>       <chr>  <dbl>      <dbl>
1 Afghanistan 1999     745   19987071
2 Afghanistan 2000    2666   20595360
3 Brazil      1999   37737  172006362
4 Brazil      2000   80488  174504898
5 China       1999  212258 1272915272
6 China       2000  213766 1280428583