SQL Reading Group

4-5pm Thursdays
B60 CB3

Book

SQL for Data Scientists by Renee Teate

MySQL Installation and Farmer’s Market Database

Schedule

Meeting Chapters
June 19 1: Data sources
2: SELECT
June 26 3: WHERE
July 3 NO MEETING
July 10 4: CASE
5: JOINS
July 17 6: Aggregating
July 24 7: Window functions
July 31 8: Date and time
August 7 9: Exploratory data analysis
August 14 10: Building datasets
August 21 11: Advanced query structures

Thought Questions/Exercises

Chapter 1: Data sources
  1. How is a relational database different from a spreadsheet? How are they similar?
  2. When would you want to store data in a relational database rather than a spreadsheet?
  3. What is the difference between a database and a schema?
  4. What is a primary key? What is a foreign key?
  5. What is the primary key for the customer table? What about the market_date_info table? What about the customer_purchases table?
  6. What are three foreign keys in the customer_purchases table?
Chapter 2: The SELECT Statement
  1. What are the two functions/statements required for all SQL queries?
  2. Are SQL functions/statements case sensitive? Try a query using SELECT, select, Select, and sElEcT. Do they all work?
  3. What is the role of spaces, line breaks, and tabs in SQL queries?
  4. What two pieces of information do you need to use the FROM statement?
  5. What is the default sorting order for ORDER BY?
  6. Which statement returns only first N rows, where the user can specify the number N?
  7. When you use ROUND to round a column of numbers, does that change the values in the original table?
  8. What happens if you concatenate two columns without specifying a delimiter?
  9. Write a query that combines the three columns of vendor_booth_assignments separated by underscores to create a column that is a primary key. Name that column key.
Chapter 3: The WHERE Clause
  1. Under what circumstances would you put quotes around a number?
  2. Let’s say your WHERE statement has three conditional statements. How many conditional statements have to be TRUE to keep a row when using AND? What about when using OR?
  3. If you want to use a WHERE statement to subset rows for customer_ids 3 and 4, which logical operator should you use?
  4. How do we specify the order of operations for evaluating conditional statements?
  5. Return a subset of the customer_purchases table for transactions involving more than 15 items purchased.
  6. Return a subset of the customer_purchases for customer ids 2-5 using three different techniques.
  7. Why does this statement not work? WHERE customer_id = NULL
  8. What is a subquery?
Chapter 4: CASE Statements
  1. What is feature engineering and why is it called that?
  2. How do you return values that include specific character strings?
  3. What happens to values that don’t meet the conditions specified in WHEN statements if there is no ELSE provided?
  4. What happens if a value in a column meets multiple WHEN conditional statements?
  5. What column name is assigned if you don’t assign a new column name with AS?
  6. How can you create dummy variables with CASE?
Chapter 5: SQL JOINs
  1. What does one-to-many mean in joins?
  2. What does LEFT refer to in a LEFT JOIN?
  3. How could we use SELECT to remove one of the duplicate column names for product_category_id in Figure 5.5?
  4. When using a LEFT JOIN, what happens if a key value in the left table does not have a corresponding key value in the right table?
  5. What is table aliasing, and how do you use it?
  6. How is an INNER JOIN different from a LEFT JOIN and RIGHT JOIN?
  7. How many joins can you do at a time?
Chapter 6: Aggregating Results for Analysis
  1. Why was vendor_id not included in the query generating Figure 6.6?
  2. What is the difference between COUNT and COUNT DISTINCT?
  3. Write a query that returns the earliest and latest transactions (using transaction_time) in the customer_purchases table.
  4. Write a query that returns the earliest and latest transactions (using transaction_time) for each vendor in the customer_purchases table.
  5. Write a query that returns the total number of different products for each vendor, along with the average quantity of items sold per vendor using the customer_purchases table.
  6. What is the difference between WHERE and HAVING?
Chapter 7: Window Functions and Subqueries
  1. How is a window function different from GROUP BY?
  2. Write a query that creates a column called daily_number that numbers all of the purchases in a day from customer_purchases in order from earliest to latest purchase.
  3. Write a query that uses the daily_number column to find the product information for the first purchase of every day.
  4. When there are subqueries, are the inner or outer queries processed first?
  5. How can you execute just a subquery part of a query?
  6. What is the difference between ROW_NUMBER and RANK?
Chapter 8: Date and Time Functions
  1. How would you return only a year from a datetime column?
  2. How are datetimes stored in MySQL?
  3. What is the difference between DATEDIFF and TIMESTAMPDIFF?
  4. What does CURDATE do?
  5. Write a query that finds the average duration of a market.
  6. Write a query that finds the average duration of a market for each market start time.
  7. Write a query that finds the time between first and last daily purchase for customers with more than one purchase in a day.
Chapter 9: Exploratory Data Analysis with SQL
Chapter 10: Building SQL Datasets for Analytical Reporting
Chapter 11: More Advanced Query Structures
Chapter 12: Creating Machine Learning Datasets Using SQL
Chapter 13: Analytical Dataset Development Examples
Chapter 14: Storing and Modifying Data