data1
# A tibble: 4 × 4
id cond1 cond2 cond3
<int> <dbl> <dbl> <dbl>
1 1 0.104 0.808 0.898
2 2 0.584 0.661 0.997
3 3 0.675 0.891 0.293
4 4 0.687 0.0970 0.436
2023-02-24
What’s different between these data sets?
What needs to happen to create data2 from data1?
data1
# A tibble: 4 × 4
id cond1 cond2 cond3
<int> <dbl> <dbl> <dbl>
1 1 0.104 0.808 0.898
2 2 0.584 0.661 0.997
3 3 0.675 0.891 0.293
4 4 0.687 0.0970 0.436
data2
id condition response
1 1 cond1 0.10362418
2 1 cond2 0.80773459
3 1 cond3 0.89752165
4 2 cond1 0.58419946
5 2 cond2 0.66144323
6 2 cond3 0.99686881
7 3 cond1 0.67497439
8 3 cond2 0.89147710
9 3 cond3 0.29255574
10 4 cond1 0.68710397
11 4 cond2 0.09700144
12 4 cond3 0.43590274
Each variable has its own column
Each observation has its own row
Each value has its own cell
table1
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
table2
# A tibble: 12 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
7 Brazil 2000 cases 80488
8 Brazil 2000 population 174504898
9 China 1999 cases 212258
10 China 1999 population 1272915272
11 China 2000 cases 213766
12 China 2000 population 1280428583
table3
# A tibble: 6 × 3
country year rate
<chr> <dbl> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
table4b
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 19987071 20595360
2 Brazil 172006362 174504898
3 China 1272915272 1280428583
Think about tidy data from a modeling perspective
Tidyverse assumes tidy data
Easier to analyze and plot tidy data
But sometimes easier to store non-tidy data
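As a rough illustration of why tidy data is easier to analyze, the sketch below works on table1 (the tidy version shown earlier, which ships with the tidyr package). The names table1_rate and cases_by_year and the per-10,000 scaling are assumptions made for this example, not part of the original slides.

```r
library(dplyr)
library(tidyr)  # provides the example data set table1

# Each variable is a column, so a new variable is a single mutate() call.
# The per-10,000 scaling is an arbitrary choice for illustration.
table1_rate <- table1 %>%
  mutate(rate = cases / population * 10000)

# Each observation is a row, so grouping and summarising need no reshaping.
cases_by_year <- table1 %>%
  group_by(year) %>%
  summarise(total_cases = sum(cases))

print(table1_rate)
print(cases_by_year)
```

The same calculations on table2 or table4a would first need a reshaping step, which is what pivot_wider() and pivot_longer() provide.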
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
table4a not tidy? Use pivot_longer()
pivot_longer(table4a, cols = c(`1999`, `2000`),
             names_to = "year", values_to = "cases")
# A tibble: 6 × 3
country year cases
<chr> <chr> <dbl>
1 Afghanistan 1999 745
2 Afghanistan 2000 2666
3 Brazil 1999 37737
4 Brazil 2000 80488
5 China 1999 212258
6 China 2000 213766
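Note that year comes out as <chr> in the result above, because the values were column names. One possible follow-up, sketched here under the assumption of a reasonably recent tidyr (the names_transform argument), converts year to an integer while pivoting and then combines the result with table4b. The names tidy4a, tidy4b, and combined are made up for this example.

```r
library(tidyr)
library(dplyr)

# Pivot the case counts; names_transform converts the former column
# names ("1999", "2000") to integers as part of the pivot.
tidy4a <- pivot_longer(table4a, cols = c(`1999`, `2000`),
                       names_to = "year", values_to = "cases",
                       names_transform = list(year = as.integer))

# Same reshaping for the population counts in table4b.
tidy4b <- pivot_longer(table4b, cols = c(`1999`, `2000`),
                       names_to = "year", values_to = "population",
                       names_transform = list(year = as.integer))

# Join the two long tables on their shared key columns.
combined <- left_join(tidy4a, tidy4b, by = c("country", "year"))
print(combined)
```

The joined table has the same layout as table1: one row per country and year, with cases and population as separate columns.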
table2
# A tibble: 12 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
7 Brazil 2000 cases 80488
8 Brazil 2000 population 174504898
9 China 1999 cases 212258
10 China 1999 population 1272915272
11 China 2000 cases 213766
12 China 2000 population 1280428583
table2 not tidy? Use pivot_wider()
pivot_wider(table2, id_cols = c("country", "year"),
            names_from = type, values_from = count)
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
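In the call above, id_cols is spelled out explicitly. As far as the tidyr documentation describes it, id_cols defaults to every column not used in names_from or values_from, so a shorter call should give the same table here. The sketch below compares the two; wide_explicit and wide_default are names invented for this example.

```r
library(tidyr)

# Explicit identifier columns, as in the call above.
wide_explicit <- pivot_wider(table2, id_cols = c("country", "year"),
                             names_from = type, values_from = count)

# Relying on the default: all columns except type and count identify a row.
wide_default <- pivot_wider(table2, names_from = type, values_from = count)

# Compare the two results as plain data frames.
print(all.equal(as.data.frame(wide_explicit), as.data.frame(wide_default)))
```

Spelling out id_cols is still useful when the data has extra columns that should not be treated as identifiers.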
What code turns data1 into data2? And vice versa?
data1
# A tibble: 4 × 4
id cond1 cond2 cond3
<int> <dbl> <dbl> <dbl>
1 1 0.104 0.808 0.898
2 2 0.584 0.661 0.997
3 3 0.675 0.891 0.293
4 4 0.687 0.0970 0.436
data2
id condition response
1 1 cond1 0.10362418
2 1 cond2 0.80773459
3 1 cond3 0.89752165
4 2 cond1 0.58419946
5 2 cond2 0.66144323
6 2 cond3 0.99686881
7 3 cond1 0.67497439
8 3 cond2 0.89147710
9 3 cond3 0.29255574
10 4 cond1 0.68710397
11 4 cond2 0.09700144
12 4 cond3 0.43590274
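One possible answer, as a minimal sketch: the slides do not show how data1 was created, so it is rebuilt here from the full-precision values printed in data2. The name data1_again is made up for this example. The original data2 prints like a plain data frame, whereas pivot_longer() returns a tibble, so the printed output will look slightly different.

```r
library(tidyr)
library(tibble)

# Assumption: data1 reconstructed from the values printed in data2.
data1 <- tibble(
  id    = 1:4,
  cond1 = c(0.10362418, 0.58419946, 0.67497439, 0.68710397),
  cond2 = c(0.80773459, 0.66144323, 0.89147710, 0.09700144),
  cond3 = c(0.89752165, 0.99686881, 0.29255574, 0.43590274)
)

# data1 -> data2: gather the three condition columns into
# one name column (condition) and one value column (response).
data2 <- pivot_longer(data1, cols = cond1:cond3,
                      names_to = "condition", values_to = "response")

# data2 -> data1: spread the conditions back out into columns.
data1_again <- pivot_wider(data2, names_from = condition,
                           values_from = response)

print(data2)
print(data1_again)
```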