CASE
An introduction to the CASE expression.
CASE is a general conditional expression that allows you to test data for
equality with a specified value, partition data into ranges, or perform any
other conditional check against the data. It is similar to an if-then-else
statement in standard programming languages. The CASE expression is comprised
of the keywords CASE, WHEN, THEN, ELSE, and END. CASE, WHEN,
THEN, and END are required, ELSE is optional and if omitted will return
a null value if none of the conditions are met. The data is tested in order
it’s presented in the query, and the first value that meets the conditions will
determine the result of the CASE expression. It is good practice always
include the ELSE keyword to clarify your intent. There are two forms of the
CASE expression: The simple form and the general form.
Simple CASE Expression
The simple CASE expression can be used to test data for equality or other
numeric comparisons and has the column against which the test is run listed
after the CASE keyword.
If you wanted to classify sales agents by coastal or central regions you could run the following query:
SELECT sales_agent,
CASE regional_office
WHEN "East" THEN "Coast"
WHEN "West" THEN "Coast"
WHEN "Central" THEN "Central"
ELSE "Unknown"
END AS region
FROM sales_teams
The first several rows of the resulting table would look like this:
| sales_agent | region |
|---|---|
| Anna Snelling | Central |
| Cecily Lampkin | Central |
| Mei-Mei Johns | Central |
| Violet Mclelland | Coast |
| Corliss Cosme | Coast |
| Rosie Papadopoulos | Coast |
| Garret Kinder | Coast |
| Wilburn Farren | Coast |
| Elizabeth Anderson | Coast |
If you wanted to label deals according to a size you could run the query:
SELECT DISTINCT sales_agent,
CASE close_value
WHEN > 30000 THEN "1"
WHEN > 20000 THEN "2"
WHEN > 10000 THEN "3"
WHEN > 5000 THEN "4"
WHEN > 1000 THEN "5"
WHEN > 500 THEN "6"
WHEN > 250 THEN "7"
ELSE "8"
END AS `sales tier`
FROM sales_pipeline
WHERE deal_stage = "Won"
The first several rows of the resulting table would look like:
| sales_agent | sales tier |
|---|---|
| Moses Frase | 5 |
| Darcel Schlecht | 5 |
| Anna Snelling | 8 |
| Violet Mclelland | 8 |
| Cassey Cress | 5 |
| Corliss Cosme | 4 |
| Daniell Hammack | 5 |
| Darcel Schlecht | 4 |
| Gladys Colclough | 5 |
CASE can also be used in an ORDER BY clause to sort your results. The same
query sorted would be:
SELECT DISTINCT sales_agent,
CASE close_value
WHEN > 30000 THEN "1"
WHEN > 20000 THEN "2"
WHEN > 10000 THEN "3"
WHEN > 5000 THEN "4"
WHEN > 1000 THEN "5"
WHEN > 500 THEN "6"
WHEN > 250 THEN "7"
ELSE "8"
END AS `sales tier`
FROM sales_pipeline
WHERE deal_stage = "Won"
ORDER BY `sales tier`
The first several rows of the results would look like:
| sales_agent | sales tier |
|---|---|
| Rosalina Dieter | 1 |
| Elease Gluck | 2 |
| Markita Hansen | 2 |
| Rosalina Dieter | 2 |
| Anna Snelling | 4 |
| Boris Faz | 4 |
| Cassey Cress | 4 |
| Cecily Lampkin | 4 |
General CASE Expression
The general CASE expression is more powerful than the simple CASE
expression and can be used in all the instances where the simple CASE
expression can be used. Additionally it can contain arbitrary boolean
expressions including AND, OR, and NOT. In the general CASE expression,
the column names are located after the WHEN keyword.
We could write a query similar to the one above to show the ranking of sales agents into sales tiers using a range of values:
SELECT DISTINCT sales_agent,
CASE
WHEN close_value > 0 AND close_value < 250 THEN "bronze"
WHEN close_value >= 250 AND close_value < 500 THEN "silver"
WHEN close_value >= 500 AND close_value < 1000 THEN "gold"
WHEN close_value >= 1000 AND close_value < 5000 THEN "sapphire"
WHEN close_value >= 5000 AND close_value < 10000 THEN "ruby"
WHEN close_value >= 10000 AND close_value < 20000 THEN "emerald"
WHEN close_value >= 20000 AND close_value < 30000 THEN "diamond"
WHEN close_value > 30000 THEN "platinum"
ELSE NULL
END AS `sales tier`
FROM sales_pipeline
ORDER BY close_value DESC
The initial rows of the resulting table would be:
| sales_agent | sales tier |
|---|---|
| Rosalina Dieter | platinum |
| Markita Hansen | diamond |
| Elease Gluck | diamond |
| Rosalina Dieter | diamond |
| James Ascencio | ruby |
| Moses Frase | ruby |
| Daniell Hammack | ruby |
| Garret Kinder | ruby |
| Reed Clapper | ruby |
Exercises (Continued from previous section)
There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.
Exercise 30
Write a query against the observations_cleaned table using the general CASE expression to return the patient and description columns, the average of the value column, and unit column along with a new column labeled BMI category. The results are only for the records where the description is Body Mass Index. The entries in the BMI category column should be Underweight for values under 18.5, Healthy for values from 18.5 to 24.9, Overweight for values of 25-29.9, and Obese for values over 30.
Hint
SELECT the column patient, and the AVG of the (value) column relabeled AS `AVG BMI`. In the CASE WHEN the entry in the value column is < 18.5 THEN “Underweight” should be returned. WHEN the entry in the value column is >= 18.5 AND the entry in the value column is < 25 THEN “Healthy” should be returned. WHEN the entry in the value column is >= 25 AND the entry in the value column is < 30 THEN “Overweight” should be returned. WHEN the entry in the value column is >= 30 THEN “Obese” should be returned. The values returned should END AS entries in a column called `BMI Category`. This data should be selected FROM the observations_cleaned table for the records WHERE the description column = “Body Mass Index”. The results should be GROUPed BY patient, and `BMI category`.
An introduction to the expressions IN and NOT IN.