Resources
Packages
To install all of the course packages:
If you are using Windows, first install RTools.
Then copy and paste the following line to install the packages needed for the course.
install.packages(c("here", "palmerpenguins", "remotes", "tidyverse", "knitr", "rmarkdown", "papaja", "tinytex", "dataReporter", "qualtRics", "readxl", "nycflights13", "lubridate", "ggthemes", "patchwork", "gt"))
Warning: it may take a while for these to install, so don’t start this if you need access to your R session (or open another R session to do this).
Click Full package list below to see all packages with links to their websites.
Full package list
Getting started
Literate programming
Data import/validation
Data processing
Plotting and tables
Glossary
If you want to find definitions of the terms that we use in the course, check out the PsyTeachR Glossary.
Function list
This is a list of all of the functions that we will be learning throughout the course. Note these may change as we progress through the course. Click Full function list below to see all functions with links to their websites.
Full function list
Packages
- install.packages(): install R packages
- library(): load R packages
- :: (double colon): accesses an exported function or object from a package without loading the package
Data types
- >, >=, <, <=, ==, !=, %in%: comparison operators that output the logical values TRUE or FALSE
- typeof(), class(), str(): outputs object type, class, and structure
- is.numeric(), is.character(), is.factor(): checks whether object is numeric, character, factor
- as.numeric(), as.character(), as.factor(): coerces (converts) object to numeric, character, factor
- is.na(): checks whether elements of object are NA and outputs logical
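To see how checking, coercing, and testing for missing values fit together, here is a minimal sketch. The vector x is made up for illustration and is not part of the course data.

```r
# Hypothetical example vector (not from the course materials)
x <- c("1", "2", "three", NA)

typeof(x)        # type of the object
is.character(x)  # checks whether x is a character vector

# Coerce to numeric; "three" cannot be converted, so it becomes NA.
# suppressWarnings() hides the coercion warning that R would otherwise print.
x_num <- suppressWarnings(as.numeric(x))

is.na(x_num)        # logical vector marking the missing values
x_num %in% c(1, 2)  # logical vector: is each element equal to 1 or 2?
```

Note that %in% returns FALSE rather than NA for missing values, which is one reason it can be safer than == when filtering.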
Data structures
- []: index elements in vector, matrix, data frame, tibble
- $: index column by name in data frame, tibble, list
- :, seq(), rep(): creates sequences and repetitions of numbers
- length(): outputs length of vector
- dim(), nrow(), ncol(): outputs dimensions, number of rows, number of columns of matrices, data frames, tibbles
- colnames(): outputs (and can assign) column names
- head(), tail(), dplyr::glimpse(): outputs compressed views of data frames, tibbles
- c(), list(), data.frame(), tibble::tibble(): creates vectors, lists, data frames, tibbles
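As a rough illustration of creating and indexing these structures, the sketch below uses only base R. The object names scores and df are invented for this example.

```r
# Hypothetical data for illustration
scores <- c(10, 20, 30)                      # vector
df <- data.frame(id = 1:3, score = scores)   # data frame

df[2, "score"]   # index by row and column: second row of the score column
df$score         # index a column by name
length(scores)   # number of elements in the vector
dim(df)          # number of rows and columns
colnames(df)     # column names

evens <- seq(2, 10, by = 4)          # sequence from 2 to 10 in steps of 4
ab <- rep(c("a", "b"), times = 2)    # repeat the vector twice
```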
Importing data
- here::here(): starts path at project directory
- read.csv(), write.csv(), readr::read_csv(), readr::write_csv(): imports and writes CSV files
- readxl::read_excel(): imports Excel files
Validating data
- range(), min(), max(): finds range, minimum, and maximum of vector
- unique(): returns vector of unique (not duplicated) elements
- duplicated(): returns logical vector of duplicated elements
- which(): returns indices of which elements of a logical vector are TRUE
- summary(): when applied to data, gives summary statistics
- skimr::skim(): outputs overview of data
- dataReporter::makeCodebook(): creates codebook of data
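The base R validation functions combine naturally when hunting for duplicate entries. This is a small sketch with an invented vector of participant IDs, not data from the course.

```r
# Hypothetical participant IDs with some repeated entries
ids <- c(101, 102, 102, 104, 104, 104)

range(ids)        # smallest and largest value
unique(ids)       # each distinct value once
duplicated(ids)   # TRUE for every repeat after the first occurrence

# Positions of the repeated entries
dup_positions <- which(duplicated(ids))

summary(ids)      # summary statistics for the vector
```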
Cleaning columns
- dplyr::select(): selects subset of columns from data frame, tibble
- dplyr::everything(), dplyr::contains(), dplyr::starts_with(), dplyr::ends_with(): helper functions for select()
- dplyr::relocate(), dplyr::rename(): moves and renames columns in data frame, tibble
- dplyr::mutate(), dplyr::transmute(): applies function to change existing column or create new column
- dplyr::across(): applies function across multiple columns inside mutate()
- dplyr::rowwise(): applies function to each row
- %>%: pipe operator that transfers output to the next command
- dplyr::pull(): creates a vector from a data frame/tibble column
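Here is one way these column functions might be chained with the pipe. The tibble dat and its column names are hypothetical, and the conversion from seconds to milliseconds is only there to give across() something to do.

```r
library(dplyr)

# Hypothetical reaction time data in seconds
dat <- tibble(
  id = 1:3,
  rt_a = c(0.512, 0.634, 0.498),
  rt_b = c(0.701, 0.655, 0.580)
)

cleaned <- dat %>%
  rename(participant = id) %>%                          # new_name = old_name
  mutate(across(starts_with("rt_"), ~ .x * 1000)) %>%   # convert both rt columns to ms
  mutate(rt_mean = (rt_a + rt_b) / 2) %>%               # create a new column
  select(participant, rt_mean, everything())            # reorder columns

# Extract one column as a plain vector
mean_rts <- pull(cleaned, rt_mean)
```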
Wrangling rows
- dplyr::filter(): filters subset of rows from data frame, tibble
- dplyr::if_any(): applies function to columns and returns TRUE if any values are TRUE
- tidyr::drop_na(): drops rows containing missing values
- dplyr::arrange(), dplyr::desc(): sorts rows by column variable (ascending by default), or in descending order with desc()
- dplyr::group_by(): groups data by column levels
- dplyr::summarise(): applies function over whole column or group
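A typical row-wrangling pipeline drops missing values, filters, groups, summarises, and sorts. The sketch below assumes a made-up tibble with a group column and a score column.

```r
library(dplyr)
library(tidyr)

# Hypothetical scores with one missing value
dat <- tibble(
  group = c("a", "a", "b", "b", "b"),
  score = c(3, NA, 5, 1, 6)
)

summary_tbl <- dat %>%
  drop_na(score) %>%                 # remove rows where score is missing
  filter(score > 1) %>%              # keep rows with score above 1
  group_by(group) %>%                # one group per level of group
  summarise(n = n(), mean_score = mean(score)) %>%
  arrange(desc(mean_score))          # largest mean first
```

In this example group "b" ends up first because its mean (5.5) is larger than the mean for group "a" (3).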
Tidy data
- tidyr::pivot_longer(), tidyr::pivot_wider(): reshapes data to be longer or wider
- tidyr::separate(), tidyr::unite(): separates or combines column data with separator
- dplyr::coalesce(): finds the first non-missing element
- tidyr::complete(), tidyr::expand(), tidyr::nesting(): finds all unique combinations of levels
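Reshaping is easier to picture with a tiny round trip from wide to long and back. The tibble wide and its pre/post columns are invented for this sketch.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per participant, one column per time point
wide <- tibble(id = 1:2, pre = c(10, 12), post = c(15, 11))

# Longer: one row per participant per time point
long <- pivot_longer(
  wide,
  cols = c(pre, post),
  names_to = "time",
  values_to = "score"
)

# Wider: back to one column per time point
back <- pivot_wider(long, names_from = time, values_from = score)
```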
Merging data
- dplyr::inner_join(), dplyr::left_join(), dplyr::right_join(), dplyr::full_join(): mutating joins that merge data frames
- dplyr::semi_join(), dplyr::anti_join(): filtering joins that filter data frame based on another data frame
- dplyr::join_by(): joins data frames with different names for key columns (requires {dplyr} v. 1.1.0 or higher)
- tibble::add_row(): manually adds rows of data
- dplyr::bind_rows(), dplyr::bind_cols(): binds rows or columns to data frame
- dplyr::intersect(), dplyr::setdiff(), dplyr::union(), dplyr::union_all(): set operations to find overlap, differences, and combinations of data sets
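The difference between mutating and filtering joins shows up quickly with two small tables. The tibbles participants and responses below are hypothetical and share an id key column.

```r
library(dplyr)

# Hypothetical tables sharing the key column id
participants <- tibble(id = c(1, 2, 3), age = c(21, 34, 28))
responses <- tibble(id = c(1, 1, 3, 4), rt = c(510, 480, 620, 555))

# Mutating joins add columns from the second table
left <- left_join(participants, responses, by = "id")    # keeps all participants
inner <- inner_join(participants, responses, by = "id")  # keeps only matched ids

# Filtering joins keep or drop rows without adding columns
no_response <- anti_join(participants, responses, by = "id")  # participants with no responses
```

Participant 2 has no responses, so that row appears in left with a missing rt, is absent from inner, and is the only row in no_response.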
Numbers
- dplyr::count(), dplyr::n(), dplyr::n_distinct(): count instances of group levels
- round(): round digits
- format(): format numbers
- cut(): bin numbers into ranges
Strings
- stringr::str_length(): finds the number of characters in a string
- stringr::str_sub(): extracts parts of strings based on character position
- stringr::str_to_lower(), stringr::str_to_upper(): converts all letters to lowercase or uppercase
- stringr::str_to_title(), stringr::str_to_sentence(): converts strings to title or sentence case
- stringr::str_c(): combines character vectors into single string
- stringr::str_glue(): combines strings with R output
- paste(), paste0(): combines strings with R output
- stringr::str_detect(), stringr::str_subset(), stringr::str_extract(): detects, subsets, and extracts strings
- stringr::str_replace(), stringr::str_replace_all(): replaces patterns with strings
- stringr::str_split(): splits strings based on separators
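A few of the string functions applied to the same vector make their differences clearer. The vector x is made up for this sketch.

```r
library(stringr)

# Hypothetical character vector
x <- c("apple pie", "banana split", "cherry tart")

n_chars <- str_length(x)              # number of characters, spaces included
first3 <- str_sub(x, 1, 3)            # characters 1 to 3 of each string
has_an <- str_detect(x, "an")         # does each string contain "an"?
snake <- str_replace(x, " ", "_")     # replace the first space in each string
joined <- str_c(x, collapse = "; ")   # collapse the vector into one string
words <- str_split(x[1], " ")         # split the first string at spaces (returns a list)
titled <- str_to_title(x)             # capitalize the first letter of each word
```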
Factors
- levels(): prints factor levels
- forcats::fct_inorder(), forcats::fct_rev(): orders levels by order in data or in reverse of current order
- forcats::fct_relevel(): manually reorders levels
- forcats::fct_reorder(): orders levels based on another variable
- forcats::fct_recode(): recodes level with new value
- forcats::fct_collapse(): recodes multiple levels into single new value
- forcats::fct_lump_n(), forcats::fct_lump_prop(), forcats::fct_lump_min(): lumps infrequent levels into level “Other”
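Because factor levels default to alphabetical order, reordering them is a common first step. This sketch uses an invented factor f to show a few of the {forcats} functions listed above.

```r
library(forcats)

# Hypothetical factor; default levels are alphabetical (high, low, medium)
f <- factor(c("low", "high", "medium", "low", "high", "low"))
levels(f)

in_data_order <- fct_inorder(f)                        # order of first appearance
manual_order <- fct_relevel(f, "low", "medium", "high") # specify the order by hand
recoded <- fct_recode(f, mid = "medium")               # new_level = "old_level"
collapsed <- fct_collapse(f, not_low = c("medium", "high"))
lumped <- fct_lump_n(f, n = 1)                         # keep most common level, lump the rest
```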
Dates and times
- lubridate::today(), lubridate::now(): returns today’s date or the current date-time
- lubridate::as_date(), lubridate::as_datetime(): creates date or date-time object
- lubridate::mdy(), lubridate::dmy(), lubridate::ymd(): converts various date formats to YYYY-MM-DD
- lubridate::hms(), lubridate::hm(): converts times to HH:MM:SS
- lubridate::mdy_hm(), lubridate::mdy_hms(): converts various date-time formats to YYYY-MM-DD HH:MM:SS
- lubridate::year(), lubridate::month(), lubridate::day(), lubridate::wday(): extracts year, month, day, or weekday from date
- lubridate::hour(), lubridate::minute(), lubridate::second(): extracts hour, minute, second from date-time
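The parsing functions are named after the order of the components in the input. Here is a brief sketch with made-up date strings; the week_start argument is set explicitly so the weekday number does not depend on local options.

```r
library(lubridate)

# The same hypothetical date written in two formats
d1 <- mdy("03/15/2024")   # month-day-year input
d2 <- dmy("15-03-2024")   # day-month-year input
d1 == d2                  # both parse to the same date

year(d1)
month(d1)
day(d1)
wday(d1, week_start = 7)  # weekday number, counting Sunday as 1

# Date-time parsing (time zone defaults to UTC)
dt <- mdy_hm("03/15/2024 14:30")
hour(dt)
minute(dt)
```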
Iteration
- for(): creates for loops
- purrr::map(), purrr::map_dbl(), purrr::map_chr(), purrr::map_df(): maps functions to vector, data frame, or list and returns list, numeric vector, character vector, or data frame
- split(): divides data frame into groups in a list
- dir(): returns files in a directory
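To compare a for loop with a map function, the sketch below splits an invented data frame by group and then computes one value per group both ways.

```r
library(purrr)

# Hypothetical data frame
dat <- data.frame(group = c("a", "a", "b"), score = c(1, 3, 10))

# Divide the data frame into a named list with one element per group
by_group <- split(dat, dat$group)

# map_dbl() applies the function to each element and returns a numeric vector
means <- map_dbl(by_group, ~ mean(.x$score))

# A for loop doing similar work, filling a pre-allocated vector
totals <- numeric(length(by_group))
for (i in seq_along(by_group)) {
  totals[i] <- sum(by_group[[i]]$score)
}
```

The map version keeps the group names on the result, while the loop version needs the output vector to be created before the loop starts.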
Grammar of graphics
- ggplot2::ggplot(): creates a ggplot
- +: adds layers and other components to a ggplot (plays the role of a pipe for ggplots)
- ggplot2::aes(): defines aesthetic properties of plot
- alpha, color, fill, linewidth, linetype, shape, size arguments: properties for geometric objects
- ggplot2::theme(): modifies components of a theme
- ggplot2::ggsave(): saves ggplot to file
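These pieces come together when a plot is built up with +. This is a minimal sketch using the built-in mtcars data rather than the course data; the file name in the commented ggsave() line is only a placeholder.

```r
library(ggplot2)

# Build the plot in steps: data and aesthetics, then a geom, then theme tweaks
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(alpha = 0.6, size = 2) +       # fixed properties go outside aes()
  theme(legend.position = "bottom")         # modify one component of the theme

# To write the plot to a file, something like the following should work
# (placeholder file name):
# ggsave("mtcars_scatter.png", plot = p, width = 6, height = 4)
```

Properties mapped inside aes() vary with the data (here, color by number of cylinders), whereas properties set as arguments to the geom apply to every point.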
Color
- ggplot2::scale_color_brewer(), ggplot2::scale_fill_brewer(): uses existing qualitative color scales for color and fill
- ggplot2::scale_color_manual(), ggplot2::scale_fill_manual(): sets manual colors for color and fill
- ggplot2::scale_color_gradient(), ggplot2::scale_fill_gradient(): sets sequential color gradient for color and fill
- ggplot2::scale_color_distiller(), ggplot2::scale_fill_distiller(): sets diverging color scale for color and fill
Visualizing distributions
- ggplot2::geom_histogram(): plots histograms
- ggplot2::geom_freqpoly(): plots frequency polygons
- ggplot2::geom_density(): plots density plot
- ggplot2::geom_boxplot(): plots boxplot
- ggplot2::geom_violin(): plots violin plot
- ggplot2::stat_summary(): plots summaries of data (e.g., means ± standard error)
Visualizing amounts and proportions
- dplyr::count(): calculates counts of data by variables
- ggplot2::geom_bar(): plots bar plot with raw data
- ggplot2::geom_col(): plots bar plot with counts
- position argument: controls whether data are stacked, dodged, jittered, nudged
- ggplot2::geom_point(): plots scatterplots
- ggplot2::coord_flip(): flips x and y coordinates
Visualizing x-y data
- ggplot2::geom_abline(): plots line with slope and intercept
- pairs(): plots correlation plots
- GGally::ggpairs(): plots correlation plots
- ggplot2::geom_tile(): plots tile plot
- ggcorrplot::ggcorrplot(): plots correlation heatmaps
- ggplot2::geom_line(): plots line plot
- ggplot2::geom_area(): plots area under curve or line plot
- ggplot2::geom_count(): plots overlapping points as size
- ggplot2::geom_smooth(): plots fitted lines and curves
- ggplot2::geom_rug(): plots rug plot
- ggplot2::geom_pointrange(): plots point and error bar
- ggplot2::geom_jitter(): plots jittered points
- ggbeeswarm::geom_beeswarm(): plots beeswarm plots
- gghalves::geom_half_violin(), gghalves::geom_half_dotplot(): plots raincloud plots
Finessing plots
- ggplot2::geom_jitter(): plots jittered scatterplot
- ggbeeswarm::geom_beeswarm(): plots beeswarm plot
- ggplot2::scale_x_discrete(), ggplot2::scale_y_discrete(): adjusts discrete scale properties (e.g., limits, ticks)
- ggplot2::scale_x_continuous(), ggplot2::scale_y_continuous(): adjusts continuous scale properties (e.g., limits, ticks)
- ggplot2::lims(), ggplot2::xlim(), ggplot2::ylim(): adjusts axis limits
- ggplot2::facet_wrap(), ggplot2::facet_grid(): creates facets based on discrete variables
Adorning plots
- ggplot2::labs(), ggplot2::xlab(), ggplot2::ylab(): replaces axis labels
- ggplot2::annotate(): annotates plot with text, segments, rectangles, etc.
- ggplot2::geom_text(): plots text as aesthetic property
- ggplot2::geom_hline(), ggplot2::geom_vline(): plots horizontal and vertical reference lines
- ggplot2::stat_ellipse(): plots ellipse around data
Tables
- knitr::kable(): creates table from data frame
- kableExtra::kable_styling(): styles table
- kableExtra::pack_rows(), kableExtra::add_header_above(): adds grouping variables to rows or columns
- kableExtra::footnote(): adds table note
- kableExtra::landscape(): rotates table to landscape orientation
- papaja::apa_table(): formats data frame to APA style table
- papaja::apa_print(): formats statistics to APA style
Flashcards
Flashcards can be a useful way to help you learn functions and their descriptions. I created a package called {flashr} that builds decks of HTML flashcards. You’re welcome to build your own decks of flashcards by installing the package and following the instructions for building decks. Or, you can use existing decks built for the course or for each of the chapters of R for Data Science (1st edition).
DPaViR flashcards
- Introduction (terms first) (definitions first)
- Coding and workflows (terms first) (definitions first)
- Data types (terms first) (definitions first)
- Data structures (terms first) (definitions first)
- Importing data (terms first) (definitions first)
- Validating data (terms first) (definitions first)
- Cleaning columns (terms first) (definitions first)
- Wrangling rows (terms first) (definitions first)
- Tidy data (terms first) (definitions first)
- Merging data (terms first) (definitions first)
- Numbers (terms first) (definitions first)
- Strings (terms first) (definitions first)
- Factors (terms first) (definitions first)
- Dates and times (terms first) (definitions first)
- Iteration (terms first) (definitions first)
R4DS flashcards
- Ch. 1 Introduction
- Ch. 3 Data visualization
- Ch. 5 Data transformation
- Ch. 7 Exploratory data analysis
- Ch. 8 Workflow: projects
- Ch. 10 Tibbles
- Ch. 11 Data import
- Ch. 12 Tidy data
- Ch. 13 Relational data
- Ch. 14 Strings
- Ch. 15 Factors
- Ch. 16 Dates and times
- Ch. 18 Pipes
- Ch. 20 Vectors
- Ch. 21 Iteration
- Ch. 23 Model basics
- Ch. 25 Many models
- Ch. 27 R Markdown
- Ch. 28 Graphics for communication
- Ch. 29 R Markdown formats
- Ch. 30 R Markdown workflow