Piping

Author

Jeffrey R. Stevens

Published

February 17, 2023

For these exercises, we’ll use the dog breed traits data set.

Create a pipeline to do all of the following:

assign the pipeline's result to traits
import data from https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv
subset only the columns Breed through Coat Length
remove the Drooling Level column

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

traits <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv") |> 
  select(Breed:`Coat Length`) |> 
  select(-`Drooling Level`)

Rows: 195 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Breed, Coat Type, Coat Length
dbl (14): Affectionate With Family, Good With Young Children, Good With Othe...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
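The two select() calls above can also be collapsed into one, because tidyselect accepts a range and a negative selection in the same call. Below is a minimal sketch on a small made-up tibble. The object toy and its values are hypothetical stand-ins, not the real data. It checks that the two-step and one-step versions keep the same columns.

```r
library(dplyr)

# Hypothetical stand-in for the raw data: real-looking column names, made-up values
toy <- tibble(
  Breed = c("Breed A", "Breed B"),
  `Shedding Level` = c(4, 2),
  `Drooling Level` = c(2, 1),
  `Coat Type` = c("Double", "Smooth"),
  `Coat Length` = c("Short", "Medium"),
  `Openness To Strangers` = c(5, 3)
)

# Two steps, as in the exercise: keep a range of columns, then drop one
two_steps <- toy |>
  select(Breed:`Coat Length`) |>
  select(-`Drooling Level`)

# One step: the negative selection removes a column from the range
one_step <- toy |>
  select(Breed:`Coat Length`, -`Drooling Level`)

names(one_step)
identical(two_steps, one_step)
```

Keeping two select() calls mirrors the exercise steps one to one, and the single call is a matter of taste.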

Rename the columns to "breed", "affectionate", "children", "other_dogs", "shedding", "grooming", "coat_type", "coat_length".

colnames(traits) <- c("breed", "affectionate", "children", "other_dogs", "shedding", "grooming", "coat_type", "coat_length")
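Assigning to colnames() works, but it relies purely on column position. If the columns ever arrived in a different order, the new names would be silently misapplied. A pipe-friendly alternative is dplyr's rename(), which takes new_name = old_name pairs and matches columns by name. The sketch below shows it on a hypothetical three-column stand-in (toy) rather than all eight columns.

```r
library(dplyr)

# Hypothetical stand-in with a few of the original column names
toy <- tibble(
  Breed = c("Breed A", "Breed B"),
  `Affectionate With Family` = c(5, 4),
  `Coat Type` = c("Double", "Smooth")
)

# new_name = old_name: each column is matched by its name, not its position
renamed <- toy |>
  rename(
    breed = Breed,
    affectionate = `Affectionate With Family`,
    coat_type = `Coat Type`
  )

names(renamed)
```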

Do the following using traits.

assign to traits2
rescale all of the rating columns by subtracting 1 from every value
create a new column called coat that combines the coat_type and coat_length columns by pasting their values together, separated by a hyphen (-)
create a new column called shed that dichotomizes shedding, coding values of 3 and above as “A lot” and values below 3 as “Not much”, and place it after shedding
calculate each breed's (row-wise) mean rating across the children and other_dogs columns in a column called mean_rating and place it after other_dogs

traits2 <- traits |> 
  # subtract 1 from every rating column
  mutate(across(affectionate:grooming, ~ .x - 1)) |> 
  mutate(coat = paste(coat_type, coat_length, sep = "-")) |> 
  mutate(shed = ifelse(shedding >= 3, "A lot", "Not much"), .after = "shedding") |> 
  # compute mean_rating within each row, then drop the row-wise grouping
  rowwise() |> 
  mutate(mean_rating = mean(c(children, other_dogs)), .after = "other_dogs") |> 
  ungroup()
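The steps above can be sketched on a tiny made-up tibble to check what each mutate() produces. The object toy and its ratings are hypothetical. This version swaps in two alternatives that are not part of the original solution. It uses dplyr's stricter if_else() in place of ifelse(). It also uses rowMeans(pick(...)) in place of rowwise(). For plain numeric columns, this computes the same row-wise mean without leaving a row-wise grouping on the result. Note that pick() needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for traits: same column names, made-up ratings
toy <- tibble(
  breed = c("Breed A", "Breed B", "Breed C"),
  affectionate = c(5, 4, 3),
  children = c(5, 3, 4),
  other_dogs = c(4, 3, 5),
  shedding = c(4, 2, 3),
  grooming = c(2, 1, 3),
  coat_type = c("Double", "Smooth", "Wiry"),
  coat_length = c("Short", "Short", "Medium")
)

toy2 <- toy |>
  # Subtract 1 from every rating column
  mutate(across(affectionate:grooming, ~ .x - 1)) |>
  # e.g. "Double" and "Short" become "Double-Short"
  mutate(coat = paste(coat_type, coat_length, sep = "-")) |>
  # if_else() (swapped in for ifelse()) requires both outcomes to have compatible types
  mutate(shed = if_else(shedding >= 3, "A lot", "Not much"), .after = shedding) |>
  # rowMeans() over pick() gives each row's mean without rowwise()
  mutate(mean_rating = rowMeans(pick(children, other_dogs)), .after = other_dogs)

toy2$shed         # rescaled shedding is 3, 1, 2
toy2$mean_rating  # means of (4, 3), (2, 2), (3, 4)
```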

Do the following using traits2.

assign to coat_grooming
subset only the grooming and coat_type columns
run a linear model (lm) using the formula grooming ~ coat_type (remember to use a placeholder for the data)
apply the summary() function
print the results to console
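The placeholder matters because the native pipe inserts its left-hand side as the first argument of the next call, and the first argument of lm() is the formula, not the data. A minimal sketch using R's built-in mtcars data (not the dog traits data) shows the _ placeholder. The placeholder requires R 4.2.0 or later and must be passed to a named argument such as data = _.

```r
# Illustration on the built-in mtcars data, not the dog traits data.
# Without a placeholder, the data frame would land in lm()'s first
# argument (formula); data = _ routes it to the data argument instead.
piped <- mtcars |>
  subset(select = c(mpg, cyl)) |>
  lm(mpg ~ factor(cyl), data = _) |>
  summary()

# The equivalent nested call that the pipe is rewritten into
nested <- summary(lm(mpg ~ factor(cyl), data = subset(mtcars, select = c(mpg, cyl))))

# Both give the same coefficient table
all.equal(coef(piped), coef(nested))
```

The pipe is rewritten into a nested call when the code is parsed. This is why the Call line printed by summary() shows the nested select(traits2, ...) form rather than the pipe.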

# wrapping the assignment in parentheses also prints the result
(coat_grooming <- traits2 |> 
  select(grooming, coat_type) |> 
  lm(grooming ~ coat_type, data = _) |> 
  summary())


Call:
lm(formula = grooming ~ coat_type, data = select(traits2, grooming, 
    coat_type))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5000 -0.5909  0.3134  0.4091  2.4091 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)         2.5000     0.4025   6.212 3.35e-09 ***
coat_typeCurly     -0.5000     0.5045  -0.991 0.322938    
coat_typeDouble    -0.9091     0.4145  -2.193 0.029520 *  
coat_typeHairless  -2.1667     0.6148  -3.524 0.000534 ***
coat_typeRough     -0.8333     0.6148  -1.356 0.176889    
coat_typeSilky     -0.1667     0.4837  -0.345 0.730805    
coat_typeSmooth    -1.8134     0.4143  -4.377 2.00e-05 ***
coat_typeWavy      -1.0000     0.5196  -1.925 0.055796 .  
coat_typeWiry      -1.2000     0.4284  -2.801 0.005636 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8049 on 186 degrees of freedom
Multiple R-squared:  0.3054,    Adjusted R-squared:  0.2755 
F-statistic: 10.22 on 8 and 186 DF,  p-value: 8.381e-12
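With R's default treatment contrasts, each coat_type coefficient in this summary is a difference from the reference level. The reference level is the alphabetically first coat type, which does not appear in the table. The intercept is the mean of grooming for that reference level. The sketch below checks this reading on the built-in mtcars data rather than the dog traits data. The intercept should equal the mean mpg of 4-cylinder cars, and each other coefficient should equal a group mean minus that baseline.

```r
# Illustration on the built-in mtcars data, not the dog traits data.
fit <- lm(mpg ~ factor(cyl), data = mtcars)
estimates <- coef(fit)

# Mean mpg for each cylinder group (4, 6, 8); "4" is the reference level
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Intercept = reference-group mean; other coefficients = group mean - reference mean
by_hand <- c(group_means[["4"]],
             group_means[["6"]] - group_means[["4"]],
             group_means[["8"]] - group_means[["4"]])

all.equal(unname(estimates), by_hand)
```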