Data structures

Jeff Stevens

2023-02-06

Review

Mental model of data types

Vectors

Actually, nearly every data object in R is a vector or is built from vectors

vector = atomic vector

elements with a single dimension of the same data type
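As a small sketch of this idea, even a single value behaves as a vector of length 1. The names `x` and `y` are hypothetical and used only for illustration.

```r
# x and y are hypothetical names used only for this illustration
x <- 5
is.vector(x)  # TRUE: a single number is a vector
length(x)     # 1

y <- "a"
is.atomic(y)  # TRUE: a single string is an atomic vector too
length(y)     # 1
```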

Create vectors with `c()`

Numeric vectors

(myvec1 <- c(1, 5, 3, 6))

[1] 1 5 3 6

(myvec2 <- c(11, 14, 18, 12))

[1] 11 14 18 12

c(myvec1, myvec2)

[1]  1  5  3  6 11 14 18 12

Create vectors with `c()`

Character vectors

(myvec3 <- c("a", "b", "c"))

[1] "a" "b" "c"

Create vectors with `c()`

Strain your brain

What do you think will happen if you combine myvec2 and myvec3?

myvec2

[1] 11 14 18 12

myvec3

[1] "a" "b" "c"

c(myvec2, myvec3)

[1] "11" "14" "18" "12" "a"  "b"  "c"

Numeric vector `myvec2` converts to character vector to combine with `myvec3`
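The same conversion can be checked with `typeof()`. This is a minimal sketch; `nums`, `chars`, `combined`, and `mixed` are hypothetical names, not objects from the slides.

```r
# Hypothetical vectors used only for illustration
nums <- c(11, 14)
chars <- c("a", "b")

# Combining numbers with characters converts everything to character
combined <- c(nums, chars)
typeof(combined)  # "character"

# Combining logicals with numbers converts TRUE/FALSE to 1/0
mixed <- c(TRUE, FALSE, 3)
mixed             # 1 0 3
typeof(mixed)     # "double"
```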

Create sequences with `seq()`

seq(from = 0, to = 20, by = 5)

[1]  0  5 10 15 20

seq(from = 20, to = 0, by = -5)

[1] 20 15 10  5  0

seq(0, 1, 0.2)

[1] 0.0 0.2 0.4 0.6 0.8 1.0
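Instead of a step size, `seq()` can also take the number of values to return via `length.out`. A brief sketch, with `by_step` and `by_count` as hypothetical names:

```r
# by = sets the step size
by_step <- seq(from = 0, to = 1, by = 0.25)
by_step   # 0.00 0.25 0.50 0.75 1.00

# length.out = sets how many evenly spaced values to return
by_count <- seq(from = 0, to = 1, length.out = 5)
by_count  # 0.00 0.25 0.50 0.75 1.00
```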

Create sequences with `:`

Sequences with increments of 1

4:9

[1] 4 5 6 7 8 9

9:4

[1] 9 8 7 6 5 4
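A short sketch of what `:` returns, assuming whole-number endpoints as in the examples above (`up` and `down` are hypothetical names):

```r
up <- 4:9
down <- 9:4

typeof(up)                # "integer"
length(up)                # 6

# Counting down gives the same values as reversing the upward sequence
identical(down, rev(up))  # TRUE
```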

Try it!

Make a sequence from 0 to 100 in steps of 10.

Create repetitions with `rep()`

Repeat single numbers

rep(0, times = 10)

 [1] 0 0 0 0 0 0 0 0 0 0

Create repetitions with `rep()`

Repeat vectors

rep(myvec3, times = 3)

[1] "a" "b" "c" "a" "b" "c" "a" "b" "c"

rep(c("d", "e", "f"), times = 3)

[1] "d" "e" "f" "d" "e" "f" "d" "e" "f"

Create repetitions with `rep()`

Repeat sequences

rep(1:4, times = 3)

 [1] 1 2 3 4 1 2 3 4 1 2 3 4

rep(1:4, each = 3)

 [1] 1 1 1 2 2 2 3 3 3 4 4 4
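The `times` and `each` arguments can also be combined, and `times` can take one count per element. A minimal sketch with hypothetical names `uneven` and `both`:

```r
# times can be a vector giving one count per element
uneven <- rep(c("x", "y"), times = c(3, 1))
uneven  # "x" "x" "x" "y"

# When both are given, each is applied first and then times
both <- rep(1:2, each = 2, times = 3)
both    # 1 1 2 2 1 1 2 2 1 1 2 2
```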

Try it!

Create a repetition of “yes” and “no” with 10 instances of each, alternating between the two. Then make one with 10 “yes” followed by 10 “no”.

Working with vectors

Find vector length with `length()`

myvec3

[1] "a" "b" "c"

length(myvec3)

[1] 3

Try it!

How long is the combined vector of myvec1 and myvec2?

Checking `typeof()` and `str()`

myvec2

[1] 11 14 18 12

typeof(myvec2)

[1] "double"

str(myvec2)

 num [1:4] 11 14 18 12

myvec3

[1] "a" "b" "c"

typeof(myvec3)

[1] "character"

str(myvec3)

 chr [1:3] "a" "b" "c"
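Numbers in R come in two types, which `typeof()` distinguishes. A short sketch with hypothetical names `from_colon`, `from_c`, and `with_L`:

```r
# Sequences made with : are stored as integers
from_colon <- 1:4
typeof(from_colon)  # "integer"

# Numbers typed directly into c() are stored as doubles
from_c <- c(1, 2, 3, 4)
typeof(from_c)      # "double"

# An L suffix requests an integer explicitly
with_L <- c(1L, 2L)
typeof(with_L)      # "integer"
```

`str()` abbreviates these types as `int` and `num`.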

Index with `[]`

Extracts the content of a specific element (positions start at 1)

myvec2

[1] 11 14 18 12

myvec2[2]

[1] 14

Allows subsetting

myvec2[2:4]

[1] 14 18 12

myvec2[c(4, 1, 3)]

[1] 12 11 18

Allows reassignment

myvec2[2] <- NA
myvec2

[1] 11 NA 18 12
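Two further ways to index with `[]`, sketched on a hypothetical vector `scores` so that `myvec2` is left untouched:

```r
# scores is a hypothetical vector used only for illustration
scores <- c(11, 14, 18, 12)

# A negative index drops that position
dropped <- scores[-2]
dropped  # 11 18 12

# A logical condition keeps the elements where it is TRUE
large <- scores[scores > 12]
large    # 14 18
```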

Lists, data frames, and tibbles

Lists

Recursive vectors (vectors of vectors) potentially with different data types

(mylist <- list(a = 1:4, b = c(4, 3, 8, 5), c = LETTERS[10:15], d = c("yes", "yes")))

$a
[1] 1 2 3 4

$b
[1] 4 3 8 5

$c
[1] "J" "K" "L" "M" "N" "O"

$d
[1] "yes" "yes"

Working with lists

typeof(mylist)

[1] "list"

typeof(mylist$b)

[1] "double"

str(mylist)

List of 4
 $ a: int [1:4] 1 2 3 4
 $ b: num [1:4] 4 3 8 5
 $ c: chr [1:6] "J" "K" "L" "M" ...
 $ d: chr [1:2] "yes" "yes"
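Elements of a list can be pulled out by name. A minimal sketch using a hypothetical, smaller list `demo_list`:

```r
# demo_list is a hypothetical list used only for illustration
demo_list <- list(a = 1:4, d = c("yes", "yes"))

# $ and [[ ]] return the element itself
demo_list$a               # 1 2 3 4
demo_list[["a"]]          # 1 2 3 4
typeof(demo_list[["a"]])  # "integer"

# Single brackets return a shorter list that still wraps the element
typeof(demo_list["a"])    # "list"
length(demo_list["a"])    # 1
```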

Data frames

List of named vectors of the same length (rectangular)

mydf <- data.frame(
  # as.Date() keeps only the date, so the time of day in each string is dropped
  datetime = as.Date(c("2021-04-21 11:56:12", "2021-04-21 14:57:44", "2021-04-22 03:09:56", "2021-04-22 12:39:22")),
  session_complete = as.logical(c("TRUE", "TRUE", "TRUE", "FALSE")),
  condition = as.factor(c("control", "control", "experimental", "experimental")),
  mean_response = c(17.53, 24.45, 19.82, NA),
  age = c(19, 20, 19, NA),
  comments = c("none", "Great study", "toooo long", NA)
  )

Data frames

List of named vectors of the same length (rectangular)

mydf

    datetime session_complete    condition mean_response age    comments
1 2021-04-21             TRUE      control         17.53  19        none
2 2021-04-21             TRUE      control         24.45  20 Great study
3 2021-04-22             TRUE experimental         19.82  19  toooo long
4 2021-04-22            FALSE experimental            NA  NA        <NA>

typeof(mydf)

[1] "list"

str(mydf)

'data.frame':   4 obs. of  6 variables:
 $ datetime        : Date, format: "2021-04-21" "2021-04-21" ...
 $ session_complete: logi  TRUE TRUE TRUE FALSE
 $ condition       : Factor w/ 2 levels "control","experimental": 1 1 2 2
 $ mean_response   : num  17.5 24.4 19.8 NA
 $ age             : num  19 20 19 NA
 $ comments        : chr  "none" "Great study" "toooo long" NA
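Because a data frame is a list of columns, list functions work on it. A short sketch with a hypothetical data frame `demo_df`:

```r
# demo_df is a hypothetical data frame used only for illustration
demo_df <- data.frame(subject = 1:3, response = c(8, 7, 6))

is.list(demo_df)  # TRUE
class(demo_df)    # "data.frame"
length(demo_df)   # 2: one list element per column
names(demo_df)    # "subject" "response"

# Columns whose lengths cannot be matched up are rejected
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```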

Creating data frames

Create new vectors

(mydf1 <- data.frame(subject = 1:3, 
                     response = 8:6))

  subject response
1       1        8
2       2        7
3       3        6

Combine existing vectors

var1 <- c(1:6)
var2 <- c(6:1)
var3 <- c(21:26)
mydf2 <- data.frame(var1, var2, 
                    resp = var3)
mydf2

  var1 var2 resp
1    1    6   21
2    2    5   22
3    3    4   23
4    4    3   24
5    5    2   25
6    6    1   26

Index with `[row, column]`

mydf1

  subject response
1       1        8
2       2        7
3       3        6

mydf1[2, 1]

[1] 2

mydf1[2, 1] <- 6
mydf1

  subject response
1       1        8
2       6        7
3       3        6

Index with `[row, column]`

Extract whole rows/columns

mydf1[2, ]

  subject response
2       6        7

mydf1[, 2]

[1] 8 7 6

Extract subsets

mydf1[2:3, 2]

[1] 7 6

mydf1[2:3, 1:2]

  subject response
2       6        7
3       3        6
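Note that extracting a single column returned a plain vector, while extracting two columns returned a data frame. As a sketch, the `drop = FALSE` argument keeps the data frame structure (`demo_df`, `as_vector`, and `as_df` are hypothetical names):

```r
# demo_df is a hypothetical data frame used only for illustration
demo_df <- data.frame(subject = 1:3, response = 8:6)

# A single column is simplified to a vector by default
as_vector <- demo_df[, 2]
is.data.frame(as_vector)  # FALSE

# drop = FALSE returns a one-column data frame instead
as_df <- demo_df[, 2, drop = FALSE]
is.data.frame(as_df)      # TRUE
dim(as_df)                # 3 1
```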

Working with data frames

But extract columns by name with `$`

mydf1$response

[1] 8 7 6

mydf1$response[2]

[1] 7

mydf1$response[2:3]

[1] 7 6
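Column names also work inside brackets, which helps when the name is stored in a variable. A minimal sketch with hypothetical names `demo_df` and `col_name`:

```r
# demo_df and col_name are hypothetical names used only for illustration
demo_df <- data.frame(subject = 1:3, response = 8:6)

demo_df$response         # 8 7 6
demo_df[["response"]]    # 8 7 6
demo_df[, "response"]    # 8 7 6

# [[ ]] accepts a name stored in a variable; $ does not
col_name <- "response"
demo_df[[col_name]]      # 8 7 6
```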

Strain your brain

Why should you use column names rather than numbers?

Working with data frames

View first rows with `head()`

head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Note

Add the argument n = 10 to head(mtcars). What does this do?

Working with data frames

View dimensions

dim(mtcars)

[1] 32 11

nrow(mtcars)

[1] 32

ncol(mtcars)

[1] 11
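The three functions are related: `dim()` returns the row count followed by the column count. A brief sketch using the built-in `mtcars` data (`dims` is a hypothetical name):

```r
dims <- dim(mtcars)

dims[1] == nrow(mtcars)  # TRUE: first value is the number of rows
dims[2] == ncol(mtcars)  # TRUE: second value is the number of columns
```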

Tibbles

Tibbles are just tidyverse versions of data frames

mydf2

  var1 var2 resp
1    1    6   21
2    2    5   22
3    3    4   23
4    4    3   24
5    5    2   25
6    6    1   26

(mytibble <- tibble::tibble(mydf2))

# A tibble: 6 × 3
   var1  var2  resp
  <int> <int> <int>
1     1     6    21
2     2     5    22
3     3     4    23
4     4     3    24
5     5     2    25
6     6     1    26
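One practical difference, sketched under the assumption that the tibble package is installed: selecting a single column with `[` keeps a tibble, while a data frame simplifies it to a vector. The names `demo_df` and `demo_tbl` are hypothetical.

```r
# Runs only if the tibble package is available
if (requireNamespace("tibble", quietly = TRUE)) {
  demo_df <- data.frame(var1 = 1:3, resp = 21:23)
  demo_tbl <- tibble::as_tibble(demo_df)

  # A tibble is still a data frame, with extra classes added
  print(class(demo_tbl))              # "tbl_df" "tbl" "data.frame"

  # Single-column selection: data frame gives a vector, tibble stays a tibble
  print(is.data.frame(demo_df[, 1]))   # FALSE
  print(is.data.frame(demo_tbl[, 1]))  # TRUE
}
```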

Mental model of data in R

Let’s code!

Data structures coding [Rmd]
