CASE

An introduction to the CASE expression.

CASE is a general conditional expression that allows you to test data for equality with a specified value, partition data into ranges, or perform any other conditional check against the data. It is similar to an if-then-else statement in standard programming languages. The CASE expression is comprised of the keywords CASE, WHEN, THEN, ELSE, and END. CASE, WHEN, THEN, and END are required, ELSE is optional and if omitted will return a null value if none of the conditions are met. The data is tested in order it’s presented in the query, and the first value that meets the conditions will determine the result of the CASE expression. It is good practice always include the ELSE keyword to clarify your intent. There are two forms of the CASE expression: The simple form and the general form.

Simple CASE Expression

The simple CASE expression can be used to test data for equality or other numeric comparisons and has the column against which the test is run listed after the CASE keyword.

If you wanted to classify sales agents by coastal or central regions you could run the following query:

SELECT sales_agent,
       CASE regional_office
         WHEN  "East" THEN "Coast"
         WHEN  "West" THEN "Coast"
         WHEN  "Central" THEN "Central"
         ELSE "Unknown"
       END AS region
  FROM sales_teams

The first several rows of the resulting table would look like this:

sales_agent region
Anna Snelling Central
Cecily Lampkin Central
Mei-Mei Johns Central
Violet Mclelland Coast
Corliss Cosme Coast
Rosie Papadopoulos Coast
Garret Kinder Coast
Wilburn Farren Coast
Elizabeth Anderson Coast

If you wanted to label deals according to a size you could run the query:

SELECT DISTINCT sales_agent,
       CASE close_value
         WHEN > 30000 THEN "1"
         WHEN > 20000 THEN "2"
         WHEN > 10000 THEN "3"
         WHEN > 5000 THEN "4"
         WHEN > 1000 THEN "5"
         WHEN > 500 THEN "6"
         WHEN > 250 THEN "7"
         ELSE "8"
       END AS `sales tier`
  FROM sales_pipeline
 WHERE deal_stage = "Won"

The first several rows of the resulting table would look like:

sales_agent sales tier
Moses Frase 5
Darcel Schlecht 5
Anna Snelling 8
Violet Mclelland 8
Cassey Cress 5
Corliss Cosme 4
Daniell Hammack 5
Darcel Schlecht 4
Gladys Colclough 5

CASE can also be used in an ORDER BY clause to sort your results. The same query sorted would be:

SELECT DISTINCT sales_agent,
       CASE close_value
         WHEN > 30000 THEN "1"
         WHEN > 20000 THEN "2"
         WHEN > 10000 THEN "3"
         WHEN > 5000 THEN "4"
         WHEN > 1000 THEN "5"
         WHEN > 500 THEN "6"
         WHEN > 250 THEN "7"
         ELSE "8"
       END AS `sales tier`
  FROM sales_pipeline
 WHERE deal_stage = "Won"
 ORDER BY `sales tier`

The first several rows of the results would look like:

sales_agent sales tier
Rosalina Dieter 1
Elease Gluck 2
Markita Hansen 2
Rosalina Dieter 2
Anna Snelling 4
Boris Faz 4
Cassey Cress 4
Cecily Lampkin 4

General CASE Expression

The general CASE expression is more powerful than the simple CASE expression and can be used in all the instances where the simple CASE expression can be used. Additionally it can contain arbitrary boolean expressions including AND, OR, and NOT. In the general CASE expression, the column names are located after the WHEN keyword.

We could write a query similar to the one above to show the ranking of sales agents into sales tiers using a range of values:

SELECT DISTINCT sales_agent,
       CASE
         WHEN close_value > 0 AND close_value < 250 THEN "bronze"
         WHEN close_value >= 250 AND close_value < 500 THEN "silver"
         WHEN close_value >= 500 AND close_value < 1000 THEN "gold"
         WHEN close_value >= 1000 AND close_value < 5000 THEN "sapphire"
         WHEN close_value >= 5000 AND close_value < 10000 THEN "ruby"
         WHEN close_value >= 10000 AND close_value < 20000 THEN "emerald"
         WHEN close_value >= 20000 AND close_value < 30000 THEN "diamond"
         WHEN close_value > 30000 THEN "platinum"
         ELSE NULL
       END AS `sales tier`
  FROM sales_pipeline
 ORDER BY close_value DESC

The initial rows of the resulting table would be:

sales_agent sales tier
Rosalina Dieter platinum
Markita Hansen diamond
Elease Gluck diamond
Rosalina Dieter diamond
James Ascencio ruby
Moses Frase ruby
Daniell Hammack ruby
Garret Kinder ruby
Reed Clapper ruby

Exercises (Continued from previous section)

There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.

Exercise 30

Write a query against the observations_cleaned table using the general CASE expression to return the patient and description columns, the average of the value column, and unit column along with a new column labeled BMI category. The results are only for the records where the description is Body Mass Index. The entries in the BMI category column should be Underweight for values under 18.5, Healthy for values from 18.5 to 24.9, Overweight for values of 25-29.9, and Obese for values over 30.

Try query

Hint

SELECT the column patient, and the AVG of the (value) column relabeled AS `AVG BMI`. In the CASE WHEN the entry in the value column is < 18.5 THEN “Underweight” should be returned. WHEN the entry in the value column is >= 18.5 AND the entry in the value column is < 25 THEN “Healthy” should be returned. WHEN the entry in the value column is >= 25 AND the entry in the value column is < 30 THEN “Overweight” should be returned. WHEN the entry in the value column is >= 30 THEN “Obese” should be returned. The values returned should END AS entries in a column called `BMI Category`. This data should be selected FROM the observations_cleaned table for the records WHERE the description column = “Body Mass Index”. The results should be GROUPed BY patient, and `BMI category`.

See solution