Summarizing rows

Author

Jeffrey R. Stevens

Published

February 22, 2023

For these exercises, we’ll use a new, cleaned version of the dog breed traits data set.

  1. Import the data from https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv and assign it to traits.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits_clean.csv")
Rows: 197 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): breed, coat_type, coat_length
dbl (5): affectionate, children, other_dogs, shedding, grooming

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  2. What is the overall mean rating for affectionate?
summarise(traits, mean(affectionate))
# A tibble: 1 × 1
  `mean(affectionate)`
                 <dbl>
1                 4.50
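The summary column above takes its name from the expression, which is awkward to refer to later. A minimal sketch of naming it explicitly, using a small made-up tibble (`toy` and its values are hypothetical stand-ins, not the real data) so it runs without downloading anything:

```r
library(dplyr)

# Hypothetical toy data standing in for `traits`
toy <- tibble(
  breed = c("A", "B", "C"),
  affectionate = c(5, 4, 3)
)

# Give the summary column an explicit name instead of `mean(affectionate)`
result <- summarise(toy, affectionate_mean = mean(affectionate))
result
```

The same `name = expression` form should work on `traits` itself.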
  3. What is the overall mean rating for all rating columns, ignoring NAs?
summarise(traits, across(affectionate:grooming, ~ mean(.x, na.rm = TRUE)))
# A tibble: 1 × 5
  affectionate children other_dogs shedding grooming
         <dbl>    <dbl>      <dbl>    <dbl>    <dbl>
1         4.50     3.88       3.55     2.61     2.28
  4. How many breeds are there in each coat type?
count(traits, coat_type)
# A tibble: 10 × 2
   coat_type     n
   <chr>     <int>
 1 Corded        4
 2 Curly         7
 3 Double       66
 4 Hairless      3
 5 Rough         4
 6 Silky         9
 7 Smooth       67
 8 Wavy          6
 9 Wiry         30
10 <NA>          1
  5. What is the median grooming rating for each coat type?
traits |> 
  group_by(coat_type) |> 
  summarise(median(grooming, na.rm = TRUE))
# A tibble: 10 × 2
   coat_type `median(grooming, na.rm = TRUE)`
   <chr>                                <dbl>
 1 Corded                                 4  
 2 Curly                                  3  
 3 Double                                 2.5
 4 Hairless                               1  
 5 Rough                                  2  
 6 Silky                                  3  
 7 Smooth                                 2  
 8 Wavy                                   2  
 9 Wiry                                   2  
10 <NA>                                   2  
  6. What is the lowest rating per coat length for each of the rating columns, ignoring NAs?
traits |> 
  group_by(coat_length) |> 
  summarise(across(affectionate:grooming, ~ min(.x, na.rm = TRUE)))
# A tibble: 4 × 6
  coat_length affectionate children other_dogs shedding grooming
  <chr>              <dbl>    <dbl>      <dbl>    <dbl>    <dbl>
1 Long                   3        3          2        1        1
2 Medium                 3        2          1        1        1
3 Short                  1        1          1        1        1
4 <NA>                   4        5          5        3        2
  7. What are the sample size, mean, and standard deviation of shedding ratings for dogs with medium coat length, per coat type? Include only coat types with 5 or more samples, sorted from largest to smallest sample size.
traits |> 
  filter(coat_length == "Medium") |> 
  group_by(coat_type) |> 
  summarise(n = n(), shedding_mean = mean(shedding), shedding_sd = sd(shedding)) |> 
  arrange(desc(n)) |> 
  filter(n > 4)
# A tibble: 5 × 4
  coat_type     n shedding_mean shedding_sd
  <chr>     <int>         <dbl>       <dbl>
1 Double       39          3.03       0.707
2 Wiry         19          2.53       0.612
3 Curly         5          1.4        0.894
4 Smooth        5          3          0    
5 Wavy          5          1.8        0.837
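The filter, group, summarise, and sort steps above can also be written with the per-operation `.by` argument, which is available in dplyr 1.1.0 and later and returns an ungrouped result. The sketch below assumes a small made-up tibble (`toy` is hypothetical) and a threshold of 2 rather than 5 so the toy data has something to show:

```r
library(dplyr)

# Hypothetical toy data standing in for `traits`
toy <- tibble(
  coat_type   = c("Double", "Double", "Double", "Wiry", "Wiry", "Curly", "Double"),
  coat_length = c("Medium", "Medium", "Medium", "Medium", "Medium", "Medium", "Short"),
  shedding    = c(3, 4, 2, 2, 3, 1, 5)
)

shedding_summary <- toy |>
  filter(coat_length == "Medium") |>
  summarise(
    n = n(),
    shedding_mean = mean(shedding, na.rm = TRUE),
    shedding_sd = sd(shedding, na.rm = TRUE),
    .by = coat_type            # groups only for this one call
  ) |>
  filter(n >= 2) |>            # toy threshold; the exercise asks for 5 or more
  arrange(desc(n))
shedding_summary
```

Note that `sd()` returns `NA` for a group with a single observation, which is one reason to filter on sample size.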
  8. Calculate each breed’s mean rating across all rating columns and return a data frame with the highest-rated breed (or breeds, if tied) for each coat type. Don’t forget to undo rowwise() with ungroup() before further calculations.
traits |>
  rowwise() |> 
  mutate(mean_rating = mean(c(affectionate, children, other_dogs, shedding, grooming), na.rm = TRUE)) |> 
  ungroup() |> 
  group_by(coat_type) |> 
  slice_max(mean_rating)
# A tibble: 16 × 9
# Groups:   coat_type [10]
   breed            affectionate children other_dogs shedding grooming coat_type
   <chr>                   <dbl>    <dbl>      <dbl>    <dbl>    <dbl> <chr>    
 1 Pulik                       5        3          3        1        5 Corded   
 2 Spanish Water D…            5        4          3        1        4 Corded   
 3 Portuguese Wate…            5        5          4        2        4 Curly    
 4 Bernese Mountai…            5        5          5        5        3 Double   
 5 American Hairle…            5        5          3        1        1 Hairless 
 6 American Rearsn…            5        2          4        5       NA Rough    
 7 Setters (Irish)             5        5          5        3        3 Silky    
 8 Bearded Collies             4        5          5        3        4 Silky    
 9 Pugs                        5        5          4        4        2 Smooth   
10 Retrievers (Fla…            5        5          5        3        2 Smooth   
11 Redbone Coonhou…            5        5          5        3        2 Smooth   
12 Chinooks                    4        5          5        3        3 Smooth   
13 Cavalier King C…            5        5          5        2        2 Wavy     
14 Miniature Schna…            5        5          3        3        4 Wiry     
15 Portuguese Pode…            5        5          5        3        2 Wiry     
16 English Buttdra…            4        5          5        3        2 <NA>     
# ℹ 2 more variables: coat_length <chr>, mean_rating <dbl>
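The result has 16 rows for 10 coat types because slice_max() keeps ties by default. The sketch below, on a small made-up tibble (`toy` is hypothetical), shows two optional variations: selecting the rating columns with c_across() instead of listing them, and passing with_ties = FALSE to keep exactly one row per group. Which tied breed is kept is then arbitrary from the analyst's point of view, so this is a trade-off rather than a correction.

```r
library(dplyr)

# Hypothetical toy data standing in for `traits`
toy <- tibble(
  breed        = c("A", "B", "C", "D"),
  coat_type    = c("Smooth", "Smooth", "Wiry", "Wiry"),
  affectionate = c(5, 4, 3, 5),
  children     = c(3, 4, 5, NA)
)

top_breeds <- toy |>
  rowwise() |>
  # c_across() gathers the selected columns for the current row
  mutate(mean_rating = mean(c_across(affectionate:children), na.rm = TRUE)) |>
  ungroup() |>
  group_by(coat_type) |>
  # with_ties = FALSE returns a single row per group even when ratings tie
  slice_max(mean_rating, with_ties = FALSE) |>
  ungroup()
top_breeds
```

Here breeds A and B tie within Smooth, so only one of them is returned, while D is the clear top breed within Wiry.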