# Validating data

Author

Jeffrey R. Stevens

Published

February 10, 2023

For these exercises, we’ll use the `mtcars` data set built into base R.

1. What are the dimensions of `mtcars`?
``dim(mtcars)``
``[1] 32 11``
1. In one line of code, view the data types for all of the columns in `mtcars`.
``str(mtcars)``
   ```
   'data.frame':   32 obs. of  11 variables:
    $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
    $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
    $ disp: num  160 160 108 258 360 ...
    $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
    $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
    $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
    $ qsec: num  16.5 17 18.6 19.4 17 ...
    $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
    $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
    $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
    $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
   ```
1. What is the range of values for the `mpg` column?
``range(mtcars$mpg)``
``[1] 10.4 33.9``
1. What are all of the possible values used in `gear`?
``unique(mtcars$gear)``
``[1] 4 3 5``
1. Check whether the value 5 is found in the `carb` column.
``5 %in% mtcars$carb``
``[1] FALSE``
1. Do any columns have missing values?
``summary(mtcars)``
   ```
         mpg             cyl             disp             hp
    Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
    1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
    Median :19.20   Median :6.000   Median :196.3   Median :123.0
    Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
    3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
    Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
         drat             wt             qsec             vs
    Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
    1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
    Median :3.695   Median :3.325   Median :17.71   Median :0.0000
    Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
    3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
    Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
          am              gear            carb
    Min.   :0.0000   Min.   :3.000   Min.   :1.000
    1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
    Median :0.0000   Median :4.000   Median :2.000
    Mean   :0.4062   Mean   :3.688   Mean   :2.812
    3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
    Max.   :1.0000   Max.   :5.000   Max.   :8.000
   ```
1. What is the 3rd quartile for `mpg`?
``summary(mtcars$mpg)``
   ```
      Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     10.40   15.43   19.20   20.09   22.80   33.90
   ```
1. Check whether all horsepower (`hp`) values fall between 50 and 300. Which row numbers fall out of this range?
``which(mtcars$hp < 50)``
``integer(0)``
``which(mtcars$hp > 300)``
``[1] 31``
1. Make a codebook for `mtcars`.
``#dataReporter::makeCodebook(mtcars, replace = TRUE)``
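
The missing-value and range checks above can also be written more directly. The following is a minimal sketch using only base R, not necessarily the approach intended by the exercises; the variable names `na_counts`, `hp_min`, `hp_max`, and `out_of_range` are illustrative assumptions.

```r
# Count missing values per column; all zeros means no column contains NAs.
na_counts <- colSums(is.na(mtcars))
na_counts

# A single TRUE/FALSE answer for the whole data frame.
anyNA(mtcars)

# Range check on hp in one step: row numbers with values outside [50, 300].
hp_min <- 50   # bounds taken from the exercise text
hp_max <- 300
out_of_range <- which(mtcars$hp < hp_min | mtcars$hp > hp_max)
out_of_range

# Show the offending rows (with car names) rather than just their indices.
mtcars[out_of_range, "hp", drop = FALSE]
```

Combining both conditions with `|` returns every out-of-range row in a single call, which should match the row found by the two separate `which()` calls above. `summary()` only prints an `NA's` entry for columns that contain missing values, so `colSums(is.na(...))` may be easier to scan when a data frame has many columns.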