For these exercises, we’ll use the dog breed traits data set along with the data on breed popularity rankings.
Load tidyverse, then download and import dog_breed_traits_clean.csv to traits and dog_breed_ranks.csv to ranks. Make sure to download both files from the website, as they have changed or are new.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv(here::here("data/dog_breed_traits_clean.csv"), show_col_types = FALSE)
ranks <- read_csv(here::here("data/dog_breed_ranks.csv"), show_col_types = FALSE)
Which breeds differ between traits and ranks?
traits$breed[!traits$breed %in% ranks$breed]
[1] "American Rearsniffer" "English Buttdragger"
ranks$breed[!ranks$breed %in% traits$breed]
[1] "Bergamasco" "Pumi" "Puli"
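The same comparison can also be written with dplyr's anti_join(), which returns the rows of its first argument that have no match in the second. Below is a minimal sketch on two small made-up tibbles; toy_traits, toy_ranks, and their contents are hypothetical stand-ins, not the course data.

```r
library(dplyr)

# Hypothetical stand-ins for traits and ranks (not the course data)
toy_traits <- tibble(
  breed = c("Beagles", "Poodles", "Made-up Breed A"),
  shedding = c(3, 1, 2)
)
toy_ranks <- tibble(
  breed = c("Beagles", "Poodles", "Made-up Breed B"),
  rank_2020 = c(7, 6, 150)
)

# Rows of toy_traits whose breed has no match in toy_ranks
only_in_traits <- anti_join(toy_traits, toy_ranks, by = "breed")

# Rows of toy_ranks whose breed has no match in toy_traits
only_in_ranks <- anti_join(toy_ranks, toy_traits, by = "breed")

only_in_traits$breed
only_in_ranks$breed
```

Unlike the %in% approach, anti_join() keeps the other columns of the unmatched rows, which can help when working out why a breed failed to match.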
Merge traits and ranks (in that order) to produce a data frame that includes only the breeds shared by both data sets. How many rows are there?
traits |>
  inner_join(ranks, by = "breed")
# A tibble: 195 × 16
breed affectionate children other_dogs shedding grooming coat_type
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Retrievers (Lab… 5 5 5 4 2 Double
2 French Bulldogs 5 5 4 3 1 Smooth
3 German Shepherd… 5 5 3 4 2 Double
4 Retrievers (Gol… 5 5 5 4 2 Double
5 Bulldogs 4 3 3 3 3 Smooth
6 Poodles 5 5 3 1 4 Curly
7 Beagles 3 5 5 3 2 Smooth
8 Rottweilers 5 3 3 3 1 Smooth
9 Pointers (Germa… 5 5 4 3 2 Smooth
10 Dachshunds 5 3 4 2 2 Smooth
# ℹ 185 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
# `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
# `2019 Rank` <dbl>, `2020 Rank` <dbl>
Merge traits and ranks (in that order) to produce a data frame that includes all breeds found in either data set. How many rows are there?
traits |>
  full_join(ranks, by = "breed")
# A tibble: 200 × 16
breed affectionate children other_dogs shedding grooming coat_type
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Retrievers (Lab… 5 5 5 4 2 Double
2 French Bulldogs 5 5 4 3 1 Smooth
3 German Shepherd… 5 5 3 4 2 Double
4 Retrievers (Gol… 5 5 5 4 2 Double
5 Bulldogs 4 3 3 3 3 Smooth
6 Poodles 5 5 3 1 4 Curly
7 Beagles 3 5 5 3 2 Smooth
8 Rottweilers 5 3 3 3 1 Smooth
9 Pointers (Germa… 5 5 4 3 2 Smooth
10 Dachshunds 5 3 4 2 2 Smooth
# ℹ 190 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
# `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
# `2019 Rank` <dbl>, `2020 Rank` <dbl>
Merge traits and ranks (in that order) to produce a data frame that includes only the breeds found in traits. How many rows are there?
traits |>
  left_join(ranks, by = "breed")
# A tibble: 197 × 16
breed affectionate children other_dogs shedding grooming coat_type
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Retrievers (Lab… 5 5 5 4 2 Double
2 French Bulldogs 5 5 4 3 1 Smooth
3 German Shepherd… 5 5 3 4 2 Double
4 Retrievers (Gol… 5 5 5 4 2 Double
5 Bulldogs 4 3 3 3 3 Smooth
6 Poodles 5 5 3 1 4 Curly
7 Beagles 3 5 5 3 2 Smooth
8 Rottweilers 5 3 3 3 1 Smooth
9 Pointers (Germa… 5 5 4 3 2 Smooth
10 Dachshunds 5 3 4 2 2 Smooth
# ℹ 187 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
# `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
# `2019 Rank` <dbl>, `2020 Rank` <dbl>
Merge traits and ranks (in that order) to produce a data frame that includes only the breeds found in ranks. How many rows are there?
traits |>
  right_join(ranks, by = "breed")
# A tibble: 198 × 16
breed affectionate children other_dogs shedding grooming coat_type
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Retrievers (Lab… 5 5 5 4 2 Double
2 French Bulldogs 5 5 4 3 1 Smooth
3 German Shepherd… 5 5 3 4 2 Double
4 Retrievers (Gol… 5 5 5 4 2 Double
5 Bulldogs 4 3 3 3 3 Smooth
6 Poodles 5 5 3 1 4 Curly
7 Beagles 3 5 5 3 2 Smooth
8 Rottweilers 5 3 3 3 1 Smooth
9 Pointers (Germa… 5 5 4 3 2 Smooth
10 Dachshunds 5 3 4 2 2 Smooth
# ℹ 188 more rows
# ℹ 9 more variables: coat_length <chr>, `2013 Rank` <dbl>, `2014 Rank` <dbl>,
# `2015 Rank` <dbl>, `2016 Rank` <dbl>, `2017 Rank` <dbl>, `2018 Rank` <dbl>,
# `2019 Rank` <dbl>, `2020 Rank` <dbl>
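The four row counts fit together with the earlier comparison: 195 breeds are shared, 2 appear only in traits, and 3 appear only in ranks, so the inner join has 195 rows, the left join 195 + 2 = 197, the right join 195 + 3 = 198, and the full join 195 + 2 + 3 = 200. This is what you would expect when each breed appears only once in each table. The sketch below checks the same arithmetic on small made-up tibbles; x, y, and their contents are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: 2 shared keys, 1 key only in x, 2 keys only in y
x <- tibble(breed = c("A", "B", "C"), trait = 1:3)
y <- tibble(breed = c("A", "B", "D", "E"), rank = 1:4)

n_shared <- nrow(semi_join(x, y, by = "breed"))  # rows of x with a match in y
n_only_x <- nrow(anti_join(x, y, by = "breed"))  # rows of x with no match in y
n_only_y <- nrow(anti_join(y, x, by = "breed"))  # rows of y with no match in x

# Row counts actually produced by each join
join_rows <- c(
  inner = nrow(inner_join(x, y, by = "breed")),
  full  = nrow(full_join(x, y, by = "breed")),
  left  = nrow(left_join(x, y, by = "breed")),
  right = nrow(right_join(x, y, by = "breed"))
)

# Row counts predicted from the shared and unmatched keys
expected_rows <- c(
  inner = n_shared,
  full  = n_shared + n_only_x + n_only_y,
  left  = n_shared + n_only_x,
  right = n_shared + n_only_y
)

all(join_rows == expected_rows)
```

If a key were repeated in either table, the joins would return more rows than this arithmetic predicts, so a mismatch is a useful hint to check for duplicate keys.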
Make table4a and table4b tidy, then join them to replicate table1.
table4a_tidy <- pivot_longer(table4a, -country, names_to = "year", values_to = "cases")
table4b_tidy <- pivot_longer(table4b, -country, names_to = "year", values_to = "population")
left_join(table4a_tidy, table4b_tidy)
Joining with `by = join_by(country, year)`
# A tibble: 6 × 4
country year cases population
<chr> <chr> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
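In the result above, year is a character column because pivot_longer() builds it from column names. If a numeric year is wanted, one option is the names_transform argument of pivot_longer(). The sketch below assumes the table4a and table4b data sets that ship with tidyr, and names the join keys explicitly rather than relying on the automatic "Joining with" message; the object names with the _int suffix are illustrative.

```r
library(dplyr)
library(tidyr)

# Convert the former column names ("1999", "2000") to integers while pivoting
table4a_int <- pivot_longer(
  table4a, -country,
  names_to = "year", values_to = "cases",
  names_transform = list(year = as.integer)
)
table4b_int <- pivot_longer(
  table4b, -country,
  names_to = "year", values_to = "population",
  names_transform = list(year = as.integer)
)

# Join on both keys explicitly
joined_int <- left_join(table4a_int, table4b_int, by = join_by(country, year))
joined_int
```

Transforming year in both tables before joining keeps the key types identical, which the join requires.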