SQL Reading Group
4-5pm Thursdays
B60 CB3
Book
SQL for Data Scientists by Renee Teate
MySQL Installation and Farmer’s Market Database
- MySQL Documentation
- Windows
- MacOS
- MySQL Server (install version 8.0.42)
- MySQL Community Workbench
- Farmer’s Market Database
Schedule
Meeting | Chapters |
---|---|
June 19 | 1: Data sources 2: SELECT |
June 26 | 3: WHERE |
July 3 | NO MEETING |
July 10 | 4: CASE 5: JOINS |
July 17 | 6: Aggregating |
July 24 | 7: Window functions |
July 31 | 8: Date and time |
August 7 | 9: Exploratory data analysis |
August 14 | 10: Building datasets |
August 21 | 11: Advanced query structures |
Thought Questions/Exercises
Chapter 1: Data sources
- How is a relational database different from a spreadsheet? How are they similar?
- When would you want to store data in a relational database rather than a spreadsheet?
- What is the difference between a database and a schema?
- What is a primary key? What is a foreign key?
- What is the primary key for the
customer
table? What about themarket_date_info
table? What about thecustomer_purchases
table? - What are three foreign keys in the
customer_purchases
table?
Chapter 2: The SELECT Statement
- What are the two functions/statements required for all SQL queries?
- Are SQL functions/statements case sensitive? Try a query using
SELECT
,select
,Select
, andsElEcT
. Do they all work? - What is the role of spaces, line breaks, and tabs in SQL queries?
- What two pieces of information do you need to use the
FROM
statement? - What is the default sorting order for
ORDER BY
? - Which statement returns only first N rows, where the user can specify the number N?
- When you use
ROUND
to round a column of numbers, does that change the values in the original table? - What happens if you concatenate two columns without specifying a delimiter?
- Write a query that combines the three columns of
vendor_booth_assignments
separated by underscores to create a column that is a primary key. Name that columnkey
.
Chapter 3: The WHERE Clause
- Under what circumstances would you put quotes around a number?
- Let’s say your
WHERE
statement has three conditional statements. How many conditional statements have to be TRUE to keep a row when usingAND
? What about when usingOR
? - If you want to use a
WHERE
statement to subset rows for customer_ids 3 and 4, which logical operator should you use? - How do we specify the order of operations for evaluating conditional statements?
- Return a subset of the customer_purchases table for transactions involving more than 15 items purchased.
- Return a subset of the customer_purchases for customer ids 2-5 using three different techniques.
- Why does this statement not work?
WHERE customer_id = NULL
- What is a subquery?
Chapter 4: CASE Statements
- What is feature engineering and why is it called that?
- How do you return values that include specific character strings?
- What happens to values that don’t meet the conditions specified in
WHEN
statements if there is noELSE
provided? - What happens if a value in a column meets multiple
WHEN
conditional statements? - What column name is assigned if you don’t assign a new column name with
AS
? - How can you create dummy variables with
CASE
?
Chapter 5: SQL JOINs
- What does one-to-many mean in joins?
- What does
LEFT
refer to in aLEFT JOIN
? - How could we use
SELECT
to remove one of the duplicate column names for product_category_id in Figure 5.5? - When using a
LEFT JOIN
, what happens if a key value in the left table does not have a corresponding key value in the right table? - What is table aliasing, and how do you use it?
- How is an
INNER JOIN
different from aLEFT JOIN
andRIGHT JOIN
? - How many joins can you do at a time?
Chapter 6: Aggregating Results for Analysis
- Why was vendor_id not included in the query generating Figure 6.6?
- What is the difference between
COUNT
andCOUNT DISTINCT
? - Write a query that returns the earliest and latest transactions (using transaction_time) in the customer_purchases table.
- Write a query that returns the earliest and latest transactions (using transaction_time) for each vendor in the customer_purchases table.
- Write a query that returns the total number of different products for each vendor, along with the average quantity of items sold per vendor using the customer_purchases table.
- What is the difference between
WHERE
andHAVING
?
Chapter 7: Window Functions and Subqueries
- How is a window function different from
GROUP BY
? - Write a query that creates a column called daily_number that numbers all of the purchases in a day from customer_purchases in order from earliest to latest purchase.
- Write a query that uses the daily_number column to find the product information for the first purchase of every day.
- When there are subqueries, are the inner or outer queries processed first?
- How can you execute just a subquery part of a query?
- What is the difference between
ROW_NUMBER
andRANK
?
Chapter 8: Date and Time Functions
- How would you return only a year from a datetime column?
- How are datetimes stored in MySQL?
- What is the difference between
DATEDIFF
andTIMESTAMPDIFF
? - What does
CURDATE
do? - Write a query that finds the average duration of a market.
- Write a query that finds the average duration of a market for each market start time.
- Write a query that finds the time between first and last daily purchase for customers with more than one purchase in a day.