CASE
An introduction to the CASE
expression.
CASE
is a general conditional expression that allows you to test data for
equality with a specified value, partition data into ranges, or perform any
other conditional check against the data. It is similar to an if-then-else
statement in standard programming languages. The CASE
expression is comprised
of the keywords CASE
, WHEN
, THEN
, ELSE
, and END
. CASE
, WHEN
,
THEN
, and END
are required, ELSE
is optional and if omitted will return
a null value if none of the conditions are met. The data is tested in order
it’s presented in the query, and the first value that meets the conditions will
determine the result of the CASE
expression. It is good practice always
include the ELSE
keyword to clarify your intent. There are two forms of the
CASE
expression: The simple form and the general form.
Simple CASE
Expression
The simple CASE
expression can be used to test data for equality or other
numeric comparisons and has the column against which the test is run listed
after the CASE
keyword.
If you wanted to classify sales agents by coastal or central regions you could run the following query:
SELECT sales_agent,
CASE regional_office
WHEN "East" THEN "Coast"
WHEN "West" THEN "Coast"
WHEN "Central" THEN "Central"
ELSE "Unknown"
END AS region
FROM sales_teams
The first several rows of the resulting table would look like this:
sales_agent | region |
---|---|
Anna Snelling | Central |
Cecily Lampkin | Central |
Mei-Mei Johns | Central |
Violet Mclelland | Coast |
Corliss Cosme | Coast |
Rosie Papadopoulos | Coast |
Garret Kinder | Coast |
Wilburn Farren | Coast |
Elizabeth Anderson | Coast |
If you wanted to label deals according to a size you could run the query:
SELECT DISTINCT sales_agent,
CASE close_value
WHEN > 30000 THEN "1"
WHEN > 20000 THEN "2"
WHEN > 10000 THEN "3"
WHEN > 5000 THEN "4"
WHEN > 1000 THEN "5"
WHEN > 500 THEN "6"
WHEN > 250 THEN "7"
ELSE "8"
END AS `sales tier`
FROM sales_pipeline
WHERE deal_stage = "Won"
The first several rows of the resulting table would look like:
sales_agent | sales tier |
---|---|
Moses Frase | 5 |
Darcel Schlecht | 5 |
Anna Snelling | 8 |
Violet Mclelland | 8 |
Cassey Cress | 5 |
Corliss Cosme | 4 |
Daniell Hammack | 5 |
Darcel Schlecht | 4 |
Gladys Colclough | 5 |
CASE
can also be used in an ORDER BY
clause to sort your results. The same
query sorted would be:
SELECT DISTINCT sales_agent,
CASE close_value
WHEN > 30000 THEN "1"
WHEN > 20000 THEN "2"
WHEN > 10000 THEN "3"
WHEN > 5000 THEN "4"
WHEN > 1000 THEN "5"
WHEN > 500 THEN "6"
WHEN > 250 THEN "7"
ELSE "8"
END AS `sales tier`
FROM sales_pipeline
WHERE deal_stage = "Won"
ORDER BY `sales tier`
The first several rows of the results would look like:
sales_agent | sales tier |
---|---|
Rosalina Dieter | 1 |
Elease Gluck | 2 |
Markita Hansen | 2 |
Rosalina Dieter | 2 |
Anna Snelling | 4 |
Boris Faz | 4 |
Cassey Cress | 4 |
Cecily Lampkin | 4 |
General CASE
Expression
The general CASE
expression is more powerful than the simple CASE
expression and can be used in all the instances where the simple CASE
expression can be used. Additionally it can contain arbitrary boolean
expressions including AND
, OR
, and NOT
. In the general CASE
expression,
the column names are located after the WHEN
keyword.
We could write a query similar to the one above to show the ranking of sales agents into sales tiers using a range of values:
SELECT DISTINCT sales_agent,
CASE
WHEN close_value > 0 AND close_value < 250 THEN "bronze"
WHEN close_value >= 250 AND close_value < 500 THEN "silver"
WHEN close_value >= 500 AND close_value < 1000 THEN "gold"
WHEN close_value >= 1000 AND close_value < 5000 THEN "sapphire"
WHEN close_value >= 5000 AND close_value < 10000 THEN "ruby"
WHEN close_value >= 10000 AND close_value < 20000 THEN "emerald"
WHEN close_value >= 20000 AND close_value < 30000 THEN "diamond"
WHEN close_value > 30000 THEN "platinum"
ELSE NULL
END AS `sales tier`
FROM sales_pipeline
ORDER BY close_value DESC
The initial rows of the resulting table would be:
sales_agent | sales tier |
---|---|
Rosalina Dieter | platinum |
Markita Hansen | diamond |
Elease Gluck | diamond |
Rosalina Dieter | diamond |
James Ascencio | ruby |
Moses Frase | ruby |
Daniell Hammack | ruby |
Garret Kinder | ruby |
Reed Clapper | ruby |
Exercises (Continued from previous section)
There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.
Exercise 30
Write a query against the observations_cleaned table using the general CASE expression to return the patient and description columns, the average of the value column, and unit column along with a new column labeled BMI category. The results are only for the records where the description is Body Mass Index. The entries in the BMI category column should be Underweight for values under 18.5, Healthy for values from 18.5 to 24.9, Overweight for values of 25-29.9, and Obese for values over 30.
Hint
SELECT
the column patient,
and the AVG
of the (value)
column relabeled AS
`AVG BMI`
. In the CASE
WHEN
the entry in the value
column is < 18.5
THEN
“Underweight”
should be returned. WHEN
the entry in the value
column is >= 18.5
AND
the entry in the value
column is < 25
THEN
“Healthy”
should be returned. WHEN
the entry in the value
column is >= 25
AND
the entry in the value
column is < 30
THEN
“Overweight”
should be returned. WHEN
the entry in the value
column is >= 30
THEN
“Obese”
should be returned. The values returned should END
AS
entries in a column called `BMI Category`
. This data should be selected FROM
the observations_cleaned
table for the records WHERE
the description
column =
“Body Mass Index”
. The results should be GROUP
ed BY
patient,
and `BMI category`
.
An introduction to the expressions IN
and NOT IN
.