For these exercises, we’ll use the dog breed traits data set.
Load tidyverse, import dog_breed_traits_clean.csv into traits, and create an object called breeds that holds the breed column in randomly shuffled order, using 12 as the seed for randomization.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv(here::here("data/dog_breed_traits_clean.csv"), show_col_types = FALSE)
set.seed(12)
breeds <- sample(traits$breed)
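As a quick check on why the seed matters, here is a minimal sketch using a hypothetical five-breed vector (`toy_breeds` is a stand-in, not the real data). Resetting the same seed before each call to `sample()` should give the same shuffle, which is what makes the output in these exercises reproducible.

```r
# Hypothetical stand-in for traits$breed
toy_breeds <- c("Beagles", "Pugs", "Boxers", "Collies", "Poodles")

# Shuffle once with the seed set to 12
set.seed(12)
first_shuffle <- sample(toy_breeds)

# Reset the seed and shuffle again
set.seed(12)
second_shuffle <- sample(toy_breeds)

# Same seed, same order
identical(first_shuffle, second_shuffle)  # TRUE

# Shuffling reorders the values without adding or dropping any
setequal(first_shuffle, toy_breeds)       # TRUE
```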
View the breeds ending with the letter “s”.
str_view(breeds, "s$", match = NA)
[1] │ English Foxhound<s>
[2] │ Retrievers (Nova Scotia Duck Tolling)
[3] │ Coton de Tulear
[4] │ Norwegian Elkhound<s>
[5] │ Spaniels (Irish Water)
[6] │ Italian Greyhound<s>
[7] │ Chihuahua<s>
[8] │ Lakeland Terrier<s>
[9] │ English Buttdragger
[10] │ American Staffordshire Terrier<s>
[11] │ Bearded Collie<s>
[12] │ Beauceron<s>
[13] │ Maltese
[14] │ Silky Terrier<s>
[15] │ Belgian Tervuren
[16] │ Otterhound<s>
[17] │ Yorkshire Terrier<s>
[18] │ Entlebucher Mountain Dog<s>
[19] │ Scottish Terrier<s>
[20] │ Russell Terrier<s>
... and 177 more
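The `$` anchor ties the match to the end of the string, so an "s" elsewhere in the name does not count. A small sketch on a hypothetical vector (`toy_breeds` is made up for illustration) shows the same pattern with `str_detect()` and `str_subset()`:

```r
library(stringr)

# Hypothetical sample of breed names
toy_breeds <- c("Maltese", "Chihuahuas", "Spaniels (Cocker)", "Pugs")

# TRUE only where the final character is "s"
ends_in_s <- str_detect(toy_breeds, "s$")
ends_in_s                      # FALSE TRUE FALSE TRUE

# Keep just the matching names
s_breeds <- str_subset(toy_breeds, "s$")
s_breeds                       # "Chihuahuas" "Pugs"
```

Note that "Spaniels (Cocker)" contains an "s" but ends in ")", so it does not match.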
Create a logical vector showing whether breeds have at least two words in their names.
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
[13] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[25] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[37] TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
[49] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
[61] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[73] TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE
[85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
[97] TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[109] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE
[133] TRUE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
[145] TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE
[157] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[169] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
[193] TRUE TRUE TRUE TRUE TRUE
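The call that produced the vector above is not shown. One approach that is consistent with the output (for example, element 7, "Chihuahuas", is FALSE) is to test for whitespace with `str_detect()`. This is a sketch under that assumption, run on a hypothetical `toy_breeds` vector rather than the full data:

```r
library(stringr)

# Hypothetical sample of breed names
toy_breeds <- c("English Foxhounds", "Chihuahuas", "Coton de Tulear", "Maltese")

# A name has at least two words if it contains any whitespace
has_two_words <- str_detect(toy_breeds, "\\s")
has_two_words                  # TRUE FALSE TRUE FALSE

# Alternative: count runs of non-whitespace and compare to 2
by_count <- str_count(toy_breeds, "\\S+") >= 2
identical(has_two_words, by_count)  # TRUE
```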
Extract the hounds (but don’t release them). That is, return a vector of all breeds that include the string “hound” or “Hound”.
str_subset(breeds, "hound|Hound")
[1] "English Foxhounds" "Norwegian Elkhounds"
[3] "Italian Greyhounds" "Otterhounds"
[5] "Black and Tan Coonhounds" "Afghan Hounds"
[7] "Ibizan Hounds" "Plott Hounds"
[9] "Redbone Coonhounds" "Irish Wolfhounds"
[11] "American English Coonhounds" "Treeing Walker Coonhounds"
[13] "Bluetick Coonhounds" "Scottish Deerhounds"
[15] "American Foxhounds" "Greyhounds"
[17] "Pharaoh Hounds" "Basset Hounds"
[19] "Bloodhounds"
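The alternation `hound|Hound` can also be written with a character class or a case-insensitive regex. A minimal sketch on a hypothetical `toy_breeds` vector compares the three:

```r
library(stringr)

# Hypothetical sample of breed names
toy_breeds <- c("Afghan Hounds", "Bloodhounds", "Beagles", "Norwegian Lundehunds")

# Alternation, as in the exercise
with_alternation <- str_subset(toy_breeds, "hound|Hound")

# Character class: either "H" or "h", followed by "ound"
with_class <- str_subset(toy_breeds, "[Hh]ound")

# Case-insensitive match (slightly broader: would also match "HOUND")
with_ignore_case <- str_subset(toy_breeds, regex("hound", ignore_case = TRUE))

with_alternation               # "Afghan Hounds" "Bloodhounds"
```

"Norwegian Lundehunds" is left out by all three because it contains "hund", not "hound".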
Extract the breeds that include the following pattern “<wildcard>ep”.
str_subset(breeds, ".ep")
[1] "Icelandic Sheepdogs" "Shetland Sheepdogs"
[3] "Anatolian Shepherd Dogs" "Australian Shepherds"
[5] "Pyrenean Shepherds" "German Shepherd Dogs"
[7] "Bergamasco Sheepdogs" "Old English Sheepdogs"
[9] "Polish Lowland Sheepdogs" "Miniature American Shepherds"
[11] "Belgian Sheepdogs"
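In a regular expression, `.` stands for exactly one character of any kind, so `.ep` needs something in front of "ep". The sketch below uses a few hypothetical strings (`toy_strings`) to show what the wildcard picks up and what it misses:

```r
library(stringr)

# Hypothetical strings for illustration
toy_strings <- c("Sheepdogs", "Shepherds", "epic", "Keeshonden")

# Does each string contain any character followed by "ep"?
has_match <- str_detect(toy_strings, ".ep")
has_match                      # TRUE TRUE FALSE FALSE

# Which three characters matched first?
matched <- str_extract(toy_strings, ".ep")
matched                        # "eep" "hep" NA NA
```

"epic" fails because nothing precedes its "ep", and "Keeshonden" fails because it has no "ep" at all.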
OK, maybe English Buttdragger isn’t the proper AKC name for this breed. Replace English Buttdragger with English Chaser.
str_replace(breeds, "English Buttdragger", "English Chaser")
[1] "English Foxhounds"
[2] "Retrievers (Nova Scotia Duck Tolling)"
[3] "Coton de Tulear"
[4] "Norwegian Elkhounds"
[5] "Spaniels (Irish Water)"
[6] "Italian Greyhounds"
[7] "Chihuahuas"
[8] "Lakeland Terriers"
[9] "English Chaser"
[10] "American Staffordshire Terriers"
[11] "Bearded Collies"
[12] "Beaucerons"
[13] "Maltese"
[14] "Silky Terriers"
[15] "Belgian Tervuren"
[16] "Otterhounds"
[17] "Yorkshire Terriers"
[18] "Entlebucher Mountain Dogs"
[19] "Scottish Terriers"
[20] "Russell Terriers"
[21] "Black and Tan Coonhounds"
[22] "Afghan Hounds"
[23] "Ibizan Hounds"
[24] "Azawakhs"
[25] "Borzois"
[26] "Spaniels (Cocker)"
[27] "Finnish Lapphunds"
[28] "Chinooks"
[29] "Cesky Terriers"
[30] "Plott Hounds"
[31] "Dogues de Bordeaux"
[32] "Icelandic Sheepdogs"
[33] "Border Collies"
[34] "Chow Chows"
[35] "Sealyham Terriers"
[36] "Miniature Schnauzers"
[37] "Petits Bassets Griffons Vendeens"
[38] "Retrievers (Golden)"
[39] "Bedlington Terriers"
[40] "Welsh Terriers"
[41] "Sloughis"
[42] "Akitas"
[43] "Norwegian Buhunds"
[44] "Shetland Sheepdogs"
[45] "Miniature Pinschers"
[46] "Lowchen"
[47] "Fox Terriers (Wire)"
[48] "Kerry Blue Terriers"
[49] "Redbone Coonhounds"
[50] "Anatolian Shepherd Dogs"
[51] "Soft Coated Wheaten Terriers"
[52] "Dandie Dinmont Terriers"
[53] "Lagotti Romagnoli"
[54] "Weimaraners"
[55] "Brittanys"
[56] "Collies"
[57] "Great Danes"
[58] "Berger Picards"
[59] "Spaniels (Clumber)"
[60] "Boxers"
[61] "Irish Wolfhounds"
[62] "Rhodesian Ridgebacks"
[63] "Norwegian Lundehunds"
[64] "Briards"
[65] "Setters (Irish)"
[66] "Bernese Mountain Dogs"
[67] "Giant Schnauzers"
[68] "Pointers"
[69] "Xoloitzcuintli"
[70] "Bulldogs"
[71] "Basenjis"
[72] "Harriers"
[73] "Siberian Huskies"
[74] "Whippets"
[75] "American English Coonhounds"
[76] "Doberman Pinschers"
[77] "Cardigan Welsh Corgis"
[78] "Tibetan Mastiffs"
[79] "Rat Terriers"
[80] "Dachshunds"
[81] "Retrievers (Chesapeake Bay)"
[82] "Chinese Crested"
[83] "Poodles"
[84] "Retrievers (Labrador)"
[85] "Fox Terriers (Smooth)"
[86] "Wirehaired Vizslas"
[87] "Bichons Frises"
[88] "West Highland White Terriers"
[89] "Miniature Bull Terriers"
[90] "Spaniels (Field)"
[91] "Australian Shepherds"
[92] "Bullmastiffs"
[93] "Pyrenean Shepherds"
[94] "Cirnechi dell Etna"
[95] "Chinese Shar-Pei"
[96] "Skye Terriers"
[97] "Norwich Terriers"
[98] "Treeing Walker Coonhounds"
[99] "Barbets"
[100] "Rottweilers"
[101] "Cairn Terriers"
[102] "Spanish Water Dogs"
[103] "Portuguese Podengo Pequenos"
[104] "Bluetick Coonhounds"
[105] "Shih Tzu"
[106] "Toy Fox Terriers"
[107] "Scottish Deerhounds"
[108] "Spaniels (Welsh Springer)"
[109] "Beagles"
[110] "German Shepherd Dogs"
[111] "Glen of Imaal Terriers"
[112] "American Foxhounds"
[113] "Bergamasco Sheepdogs"
[114] "Pugs"
[115] "Affenpinschers"
[116] "Pumik"
[117] "Setters (Gordon)"
[118] "French Bulldogs"
[119] "Leonbergers"
[120] "Pointers (German Wirehaired)"
[121] "Alaskan Malamutes"
[122] "Pembroke Welsh Corgis"
[123] "Nederlandse Kooikerhondjes"
[124] "Retrievers (Curly-Coated)"
[125] "Australian Terriers"
[126] "Cavalier King Charles Spaniels"
[127] "Retrievers (Flat-Coated)"
[128] "Mastiffs"
[129] "Shiba Inu"
[130] "Dalmatians"
[131] "Spaniels (American Water)"
[132] "Greyhounds"
[133] "Black Russian Terriers"
[134] "Salukis"
[135] "Spaniels (Sussex)"
[136] "Pharaoh Hounds"
[137] "Setters (English)"
[138] "Spaniels (English Cocker)"
[139] "Kuvaszok"
[140] "Cane Corso"
[141] "Pomeranians"
[142] "Great Pyrenees"
[143] "Schipperkes"
[144] "Papillons"
[145] "Finnish Spitz"
[146] "Tibetan Terriers"
[147] "Newfoundlands"
[148] "Neapolitan Mastiffs"
[149] "Samoyeds"
[150] "Keeshonden"
[151] "Setters (Irish Red and White)"
[152] "Greater Swiss Mountain Dogs"
[153] "Canaan Dogs"
[154] "St. Bernards"
[155] "Pulik"
[156] "Spinoni Italiani"
[157] "Old English Sheepdogs"
[158] "Tibetan Spaniels"
[159] "Japanese Chin"
[160] "Basset Hounds"
[161] "Havanese"
[162] "Wirehaired Pointing Griffons"
[163] "American Eskimo Dogs"
[164] "English Toy Spaniels"
[165] "Polish Lowland Sheepdogs"
[166] "Portuguese Water Dogs"
[167] "Irish Terriers"
[168] "Lhasa Apsos"
[169] "German Pinschers"
[170] "Border Terriers"
[171] "Komondorok"
[172] "Parson Russell Terriers"
[173] "Bouviers des Flandres"
[174] "Staffordshire Bull Terriers"
[175] "Norfolk Terriers"
[176] "Belgian Malinois"
[177] "Swedish Vallhunds"
[178] "Grand Basset Griffon Vendeens"
[179] "Brussels Griffons"
[180] "Pointers (German Shorthaired)"
[181] "Miniature American Shepherds"
[182] "Bloodhounds"
[183] "Australian Cattle Dogs"
[184] "Boerboels"
[185] "Dogo Argentinos"
[186] "Pekingese"
[187] "Bull Terriers"
[188] "Vizslas"
[189] "Standard Schnauzers"
[190] "Spaniels (English Springer)"
[191] "Airedale Terriers"
[192] "Spaniels (Boykin)"
[193] "Belgian Sheepdogs"
[194] "Manchester Terriers"
[195] "American Rearsniffer"
[196] "Boston Terriers"
[197] "American Hairless Terriers"
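Note that `str_replace()` returns a new vector and leaves its input alone, which is why the original name still shows up in the next exercise's output. A minimal sketch, using a hypothetical two-element `toy_breeds` vector, of printing versus saving the result:

```r
library(stringr)

# Hypothetical sample of breed names
toy_breeds <- c("English Buttdragger", "Pugs")

# Printing the result does not change toy_breeds
str_replace(toy_breeds, "English Buttdragger", "English Chaser")
toy_breeds[1]                  # "English Buttdragger"

# Assign the result to keep the change
fixed_breeds <- str_replace(toy_breeds, "English Buttdragger", "English Chaser")
fixed_breeds[1]                # "English Chaser"
```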
Replace all instances of “English” with “British” and then return the breeds that include “English” or “British” in them (to check our work).
str_replace(breeds, "English", "British") |>
  str_subset("English|British")
[1] "British Foxhounds" "British Buttdragger"
[3] "American British Coonhounds" "Setters (British)"
[5] "Spaniels (British Cocker)" "Old British Sheepdogs"
[7] "British Toy Spaniels" "Spaniels (British Springer)"
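One caveat: `str_replace()` swaps only the first match in each string, while `str_replace_all()` swaps every match. No breed name here contains "English" twice, so the two give the same result on this data. The sketch below uses a made-up string (`toy_string`) to show the difference:

```r
library(stringr)

# Hypothetical string with the pattern repeated
toy_string <- "English English Setters"

# Only the first "English" is replaced
first_only <- str_replace(toy_string, "English", "British")
first_only                     # "British English Setters"

# Every "English" is replaced
every_match <- str_replace_all(toy_string, "English", "British")
every_match                    # "British British Setters"
```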
Extract the Spaniels and then separate the breed names into different strings for each word and create a matrix out of it.
breeds |>
  str_subset("spaniel|Spaniel") |>
  str_split("\\s", simplify = TRUE)
[,1] [,2] [,3] [,4]
[1,] "Spaniels" "(Irish" "Water)" ""
[2,] "Spaniels" "(Cocker)" "" ""
[3,] "Spaniels" "(Clumber)" "" ""
[4,] "Spaniels" "(Field)" "" ""
[5,] "Spaniels" "(Welsh" "Springer)" ""
[6,] "Cavalier" "King" "Charles" "Spaniels"
[7,] "Spaniels" "(American" "Water)" ""
[8,] "Spaniels" "(Sussex)" "" ""
[9,] "Spaniels" "(English" "Cocker)" ""
[10,] "Tibetan" "Spaniels" "" ""
[11,] "English" "Toy" "Spaniels" ""
[12,] "Spaniels" "(English" "Springer)" ""
[13,] "Spaniels" "(Boykin)" "" ""
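The `simplify = TRUE` argument is what turns the result into a matrix. Without it, `str_split()` returns a list with one character vector per string. With it, shorter names are padded with empty strings so every row has the same number of columns. A minimal sketch on a hypothetical `toy_spaniels` vector:

```r
library(stringr)

# Hypothetical sample of spaniel names
toy_spaniels <- c("Spaniels (Cocker)", "English Toy Spaniels")

# Default: a list of character vectors of different lengths
word_list <- str_split(toy_spaniels, "\\s")
lengths(word_list)             # 2 3

# simplify = TRUE: a character matrix, padded with ""
word_matrix <- str_split(toy_spaniels, "\\s", simplify = TRUE)
dim(word_matrix)               # 2 3
word_matrix[1, 3]              # ""
```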