Matching patterns

Author

Jeffrey R. Stevens

Published

March 10, 2023

For these exercises, we’ll use the dog breed traits data set.

  1. Load tidyverse, import dog_breed_traits_clean.csv as traits, and create an object called breeds that holds a randomly shuffled copy of the breed column, using 12 as the seed for randomization.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
traits <- read_csv(here::here("data/dog_breed_traits_clean.csv"), show_col_types = FALSE)
set.seed(12)
breeds <- sample(traits$breed)
  2. View the breeds ending with the letter “s”.
str_view(breeds, "s$", match = NA)
 [1] │ English Foxhound<s>
 [2] │ Retrievers (Nova Scotia Duck Tolling)
 [3] │ Coton de Tulear
 [4] │ Norwegian Elkhound<s>
 [5] │ Spaniels (Irish Water)
 [6] │ Italian Greyhound<s>
 [7] │ Chihuahua<s>
 [8] │ Lakeland Terrier<s>
 [9] │ English Buttdragger
[10] │ American Staffordshire Terrier<s>
[11] │ Bearded Collie<s>
[12] │ Beauceron<s>
[13] │ Maltese
[14] │ Silky Terrier<s>
[15] │ Belgian Tervuren
[16] │ Otterhound<s>
[17] │ Yorkshire Terrier<s>
[18] │ Entlebucher Mountain Dog<s>
[19] │ Scottish Terrier<s>
[20] │ Russell Terrier<s>
... and 177 more
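The "$" in the pattern above is an anchor: it ties the match to the end of the string, just as "^" ties it to the start. As a minimal sketch of how the two anchors behave, here is a small made-up vector (toy is a hypothetical object, not part of the data set):

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("Beagles", "Maltese", "Salukis", "Shih Tzu")

# "s$" matches only when "s" is the final character of the string
ends_in_s <- str_detect(toy, "s$")
ends_in_s
# TRUE FALSE TRUE FALSE

# "^S" matches only when "S" is the first character of the string
starts_with_s <- str_detect(toy, "^S")
starts_with_s
# FALSE FALSE TRUE TRUE
```

Without the anchor, "s" would also match "Maltese", because that name contains an "s" somewhere in the middle.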
  3. Create a logical vector showing whether breeds have at least two words in their names.
str_detect(breeds, " ")
  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
 [13] FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
 [25] FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
 [37]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
 [49]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
 [61]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [73]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE
 [85]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
 [97]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[109] FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
[121]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE
[133]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE
[145]  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[157]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[169]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[181]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
[193]  TRUE  TRUE  TRUE  TRUE  TRUE
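A long logical vector like this is hard to read by eye, so it can help to summarize it. Because TRUE counts as 1 and FALSE as 0, sum() gives the number of matches and mean() gives the proportion. A minimal sketch on a made-up vector (toy is hypothetical):

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("Great Danes", "Pugs", "Chow Chows", "Boxers")

# TRUE wherever the name contains a space, i.e., has at least two words
has_space <- str_detect(toy, " ")

# Number of multi-word names
n_multiword <- sum(has_space)
n_multiword
# 2

# Proportion of multi-word names
prop_multiword <- mean(has_space)
prop_multiword
# 0.5
```

The same two calls should work on the full breeds vector.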
  4. Extract the hounds (but don’t release them). That is, return a vector of all breeds that include the string “hound” or “Hound”.
str_subset(breeds, "hound|Hound")
 [1] "English Foxhounds"           "Norwegian Elkhounds"        
 [3] "Italian Greyhounds"          "Otterhounds"                
 [5] "Black and Tan Coonhounds"    "Afghan Hounds"              
 [7] "Ibizan Hounds"               "Plott Hounds"               
 [9] "Redbone Coonhounds"          "Irish Wolfhounds"           
[11] "American English Coonhounds" "Treeing Walker Coonhounds"  
[13] "Bluetick Coonhounds"         "Scottish Deerhounds"        
[15] "American Foxhounds"          "Greyhounds"                 
[17] "Pharaoh Hounds"              "Basset Hounds"              
[19] "Bloodhounds"                
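The alternation "hound|Hound" is one way to handle the two capitalizations. Two other options that should give the same result here are a character class for the first letter and a case-insensitive pattern built with regex(). A minimal sketch on a made-up vector (toy is hypothetical):

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("Afghan Hounds", "Bloodhounds", "Pugs")

# Alternation: either "hound" or "Hound"
by_alternation <- str_subset(toy, "hound|Hound")

# Character class: "H" or "h", followed by "ound"
by_class <- str_subset(toy, "[Hh]ound")

# Case-insensitive match; note this would also match "HOUND"
by_ignore_case <- str_subset(toy, regex("hound", ignore_case = TRUE))

by_alternation
# "Afghan Hounds" "Bloodhounds"
```

All three return the same two names for this vector. The case-insensitive version is the most permissive of the three, so it is only equivalent when no other capitalizations occur in the data.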
  5. Extract the breeds that match the pattern “<wildcard>ep”, that is, any single character followed by “ep”.
str_subset(breeds, ".ep")
 [1] "Icelandic Sheepdogs"          "Shetland Sheepdogs"          
 [3] "Anatolian Shepherd Dogs"      "Australian Shepherds"        
 [5] "Pyrenean Shepherds"           "German Shepherd Dogs"        
 [7] "Bergamasco Sheepdogs"         "Old English Sheepdogs"       
 [9] "Polish Lowland Sheepdogs"     "Miniature American Shepherds"
[11] "Belgian Sheepdogs"           
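The "." wildcard stands for exactly one character, so ".ep" needs something in front of the "ep". A string that starts with "ep" therefore does not match. A minimal sketch on a made-up vector (toy is hypothetical) shows the difference:

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("Sheepdogs", "epic", "Shepherds", "Pugs")

# ".ep" requires one character before "ep", so "epic" is excluded
with_wildcard <- str_subset(toy, ".ep")
with_wildcard
# "Sheepdogs" "Shepherds"

# Plain "ep" matches anywhere, including at the start of the string
without_wildcard <- str_subset(toy, "ep")
without_wildcard
# "Sheepdogs" "epic" "Shepherds"
```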
  6. OK, maybe English Buttdragger isn’t the proper AKC name for this breed. Replace English Buttdragger with English Chaser.
str_replace(breeds, "English Buttdragger", "English Chaser")
  [1] "English Foxhounds"                    
  [2] "Retrievers (Nova Scotia Duck Tolling)"
  [3] "Coton de Tulear"                      
  [4] "Norwegian Elkhounds"                  
  [5] "Spaniels (Irish Water)"               
  [6] "Italian Greyhounds"                   
  [7] "Chihuahuas"                           
  [8] "Lakeland Terriers"                    
  [9] "English Chaser"                       
 [10] "American Staffordshire Terriers"      
 [11] "Bearded Collies"                      
 [12] "Beaucerons"                           
 [13] "Maltese"                              
 [14] "Silky Terriers"                       
 [15] "Belgian Tervuren"                     
 [16] "Otterhounds"                          
 [17] "Yorkshire Terriers"                   
 [18] "Entlebucher Mountain Dogs"            
 [19] "Scottish Terriers"                    
 [20] "Russell Terriers"                     
 [21] "Black and Tan Coonhounds"             
 [22] "Afghan Hounds"                        
 [23] "Ibizan Hounds"                        
 [24] "Azawakhs"                             
 [25] "Borzois"                              
 [26] "Spaniels (Cocker)"                    
 [27] "Finnish Lapphunds"                    
 [28] "Chinooks"                             
 [29] "Cesky Terriers"                       
 [30] "Plott Hounds"                         
 [31] "Dogues de Bordeaux"                   
 [32] "Icelandic Sheepdogs"                  
 [33] "Border Collies"                       
 [34] "Chow Chows"                           
 [35] "Sealyham Terriers"                    
 [36] "Miniature Schnauzers"                 
 [37] "Petits Bassets Griffons Vendeens"     
 [38] "Retrievers (Golden)"                  
 [39] "Bedlington Terriers"                  
 [40] "Welsh Terriers"                       
 [41] "Sloughis"                             
 [42] "Akitas"                               
 [43] "Norwegian Buhunds"                    
 [44] "Shetland Sheepdogs"                   
 [45] "Miniature Pinschers"                  
 [46] "Lowchen"                              
 [47] "Fox Terriers (Wire)"                  
 [48] "Kerry Blue Terriers"                  
 [49] "Redbone Coonhounds"                   
 [50] "Anatolian Shepherd Dogs"              
 [51] "Soft Coated Wheaten Terriers"         
 [52] "Dandie Dinmont Terriers"              
 [53] "Lagotti Romagnoli"                    
 [54] "Weimaraners"                          
 [55] "Brittanys"                            
 [56] "Collies"                              
 [57] "Great Danes"                          
 [58] "Berger Picards"                       
 [59] "Spaniels (Clumber)"                   
 [60] "Boxers"                               
 [61] "Irish Wolfhounds"                     
 [62] "Rhodesian Ridgebacks"                 
 [63] "Norwegian Lundehunds"                 
 [64] "Briards"                              
 [65] "Setters (Irish)"                      
 [66] "Bernese Mountain Dogs"                
 [67] "Giant Schnauzers"                     
 [68] "Pointers"                             
 [69] "Xoloitzcuintli"                       
 [70] "Bulldogs"                             
 [71] "Basenjis"                             
 [72] "Harriers"                             
 [73] "Siberian Huskies"                     
 [74] "Whippets"                             
 [75] "American English Coonhounds"          
 [76] "Doberman Pinschers"                   
 [77] "Cardigan Welsh Corgis"                
 [78] "Tibetan Mastiffs"                     
 [79] "Rat Terriers"                         
 [80] "Dachshunds"                           
 [81] "Retrievers (Chesapeake Bay)"          
 [82] "Chinese Crested"                      
 [83] "Poodles"                              
 [84] "Retrievers (Labrador)"                
 [85] "Fox Terriers (Smooth)"                
 [86] "Wirehaired Vizslas"                   
 [87] "Bichons Frises"                       
 [88] "West Highland White Terriers"         
 [89] "Miniature Bull Terriers"              
 [90] "Spaniels (Field)"                     
 [91] "Australian Shepherds"                 
 [92] "Bullmastiffs"                         
 [93] "Pyrenean Shepherds"                   
 [94] "Cirnechi dell Etna"                   
 [95] "Chinese Shar-Pei"                     
 [96] "Skye Terriers"                        
 [97] "Norwich Terriers"                     
 [98] "Treeing Walker Coonhounds"            
 [99] "Barbets"                              
[100] "Rottweilers"                          
[101] "Cairn Terriers"                       
[102] "Spanish Water Dogs"                   
[103] "Portuguese Podengo Pequenos"          
[104] "Bluetick Coonhounds"                  
[105] "Shih Tzu"                             
[106] "Toy Fox Terriers"                     
[107] "Scottish Deerhounds"                  
[108] "Spaniels (Welsh Springer)"            
[109] "Beagles"                              
[110] "German Shepherd Dogs"                 
[111] "Glen of Imaal Terriers"               
[112] "American Foxhounds"                   
[113] "Bergamasco Sheepdogs"                 
[114] "Pugs"                                 
[115] "Affenpinschers"                       
[116] "Pumik"                                
[117] "Setters (Gordon)"                     
[118] "French Bulldogs"                      
[119] "Leonbergers"                          
[120] "Pointers (German Wirehaired)"         
[121] "Alaskan Malamutes"                    
[122] "Pembroke Welsh Corgis"                
[123] "Nederlandse Kooikerhondjes"           
[124] "Retrievers (Curly-Coated)"            
[125] "Australian Terriers"                  
[126] "Cavalier King Charles Spaniels"       
[127] "Retrievers (Flat-Coated)"             
[128] "Mastiffs"                             
[129] "Shiba Inu"                            
[130] "Dalmatians"                           
[131] "Spaniels (American Water)"            
[132] "Greyhounds"                           
[133] "Black Russian Terriers"               
[134] "Salukis"                              
[135] "Spaniels (Sussex)"                    
[136] "Pharaoh Hounds"                       
[137] "Setters (English)"                    
[138] "Spaniels (English Cocker)"            
[139] "Kuvaszok"                             
[140] "Cane Corso"                           
[141] "Pomeranians"                          
[142] "Great Pyrenees"                       
[143] "Schipperkes"                          
[144] "Papillons"                            
[145] "Finnish Spitz"                        
[146] "Tibetan Terriers"                     
[147] "Newfoundlands"                        
[148] "Neapolitan Mastiffs"                  
[149] "Samoyeds"                             
[150] "Keeshonden"                           
[151] "Setters (Irish Red and White)"        
[152] "Greater Swiss Mountain Dogs"          
[153] "Canaan Dogs"                          
[154] "St. Bernards"                         
[155] "Pulik"                                
[156] "Spinoni Italiani"                     
[157] "Old English Sheepdogs"                
[158] "Tibetan Spaniels"                     
[159] "Japanese Chin"                        
[160] "Basset Hounds"                        
[161] "Havanese"                             
[162] "Wirehaired Pointing Griffons"         
[163] "American Eskimo Dogs"                 
[164] "English Toy Spaniels"                 
[165] "Polish Lowland Sheepdogs"             
[166] "Portuguese Water Dogs"                
[167] "Irish Terriers"                       
[168] "Lhasa Apsos"                          
[169] "German Pinschers"                     
[170] "Border Terriers"                      
[171] "Komondorok"                           
[172] "Parson Russell Terriers"              
[173] "Bouviers des Flandres"                
[174] "Staffordshire Bull Terriers"          
[175] "Norfolk Terriers"                     
[176] "Belgian Malinois"                     
[177] "Swedish Vallhunds"                    
[178] "Grand Basset Griffon Vendeens"        
[179] "Brussels Griffons"                    
[180] "Pointers (German Shorthaired)"        
[181] "Miniature American Shepherds"         
[182] "Bloodhounds"                          
[183] "Australian Cattle Dogs"               
[184] "Boerboels"                            
[185] "Dogo Argentinos"                      
[186] "Pekingese"                            
[187] "Bull Terriers"                        
[188] "Vizslas"                              
[189] "Standard Schnauzers"                  
[190] "Spaniels (English Springer)"          
[191] "Airedale Terriers"                    
[192] "Spaniels (Boykin)"                    
[193] "Belgian Sheepdogs"                    
[194] "Manchester Terriers"                  
[195] "American Rearsniffer"                 
[196] "Boston Terriers"                      
[197] "American Hairless Terriers"           
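Note that str_replace() returns a new vector and leaves its input untouched. The call above prints the corrected names but does not change breeds, which is presumably why the next exercise still shows the old name in its output. A minimal sketch on a made-up vector (toy and toy_fixed are hypothetical names):

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("English Buttdragger", "Pugs")

# Without assignment, the result is only printed and toy stays as it was
str_replace(toy, "English Buttdragger", "English Chaser")
toy
# "English Buttdragger" "Pugs"

# Assigning the result keeps the change
toy_fixed <- str_replace(toy, "English Buttdragger", "English Chaser")
toy_fixed
# "English Chaser" "Pugs"
```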
  7. Replace all instances of “English” with “British” and then return the breeds that include “English” or “British” in them (to check our work).
str_replace_all(breeds, "English", "British") |>
  str_subset("English|British")
[1] "British Foxhounds"           "British Buttdragger"        
[3] "American British Coonhounds" "Setters (British)"          
[5] "Spaniels (British Cocker)"   "Old British Sheepdogs"      
[7] "British Toy Spaniels"        "Spaniels (British Springer)"
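The difference between str_replace() and str_replace_all() only shows up when a pattern occurs more than once within a single string: str_replace() changes the first match, while str_replace_all() changes every match. No breed name here contains “English” twice, so a made-up string (toy is hypothetical) is needed to see it:

```r
library(stringr)

# Hypothetical string with the pattern repeated, not taken from the data set
toy <- "English Toy English Spaniels"

# Only the first "English" is replaced
first_only <- str_replace(toy, "English", "British")
first_only
# "British Toy English Spaniels"

# Every "English" is replaced
every_match <- str_replace_all(toy, "English", "British")
every_match
# "British Toy British Spaniels"
```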
  8. Extract the Spaniels, split each breed name into its individual words, and return the result as a character matrix.
breeds |> 
  str_subset("spaniel|Spaniel") |> 
  str_split("\\s", simplify = TRUE)
      [,1]       [,2]        [,3]        [,4]      
 [1,] "Spaniels" "(Irish"    "Water)"    ""        
 [2,] "Spaniels" "(Cocker)"  ""          ""        
 [3,] "Spaniels" "(Clumber)" ""          ""        
 [4,] "Spaniels" "(Field)"   ""          ""        
 [5,] "Spaniels" "(Welsh"    "Springer)" ""        
 [6,] "Cavalier" "King"      "Charles"   "Spaniels"
 [7,] "Spaniels" "(American" "Water)"    ""        
 [8,] "Spaniels" "(Sussex)"  ""          ""        
 [9,] "Spaniels" "(English"  "Cocker)"   ""        
[10,] "Tibetan"  "Spaniels"  ""          ""        
[11,] "English"  "Toy"       "Spaniels"  ""        
[12,] "Spaniels" "(English"  "Springer)" ""        
[13,] "Spaniels" "(Boykin)"  ""          ""
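The simplify = TRUE argument is what turns the result into a matrix. By default str_split() returns a list with one character vector per input string, and those vectors can have different lengths. With simplify = TRUE, shorter rows are padded with empty strings, which is where the "" entries above come from. A minimal sketch on a made-up vector (toy is hypothetical):

```r
library(stringr)

# Hypothetical toy vector, not taken from the data set
toy <- c("Tibetan Spaniels", "English Toy Spaniels")

# Default: a list whose elements have 2 and 3 words
as_list <- str_split(toy, "\\s")
lengths(as_list)
# 2 3

# simplify = TRUE: a 2 x 3 character matrix, padded with ""
as_matrix <- str_split(toy, "\\s", simplify = TRUE)
dim(as_matrix)
# 2 3
as_matrix[1, 3]
# ""
```

The list form is usually easier to work with when the pieces will be processed separately, and the matrix form is convenient when the words should line up in columns.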