HAVING

An introduction to the HAVING clause.

HAVING is used with aggregations to filter out results returned by the aggregation. It is similar to WHERE except that WHERE removes values before the aggregation function is applied to the values, and HAVING removes values after aggregation has occurred.

In the last section we ran a query to count the number of deals won and group them by sales agent. But what if we only wanted to see the counts for those sales agents who had closed over 200 deals? We would have to run the aggregation first to get the counts for each, and then we would filter the aggregations using HAVING to see all of the sales agents who had more than 200 deals closed. The query would look like this:

SELECT sales_agent,
       COUNT(sales_pipeline.close_value) AS `number won`
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_pipeline.sales_agent
HAVING COUNT(sales_pipeline.close_value) > 200

The results table would look like this:

sales_agent number won
Vicki Laflamme 221
Anna Snelling 208
Kary Hendrixson 209
Darcel Schlecht 349

HAVING, GROUP BY, and FILTER can all be used in the same query to focus on results returned from very specific queries. For example, if we wanted to see a table with counts of deals won, counts of deals won where the value was over 1000, average value of all won deals, and average value of all deals won whose value was over 1000, and we only wanted to see this data for sales agents who had won more than 100 deals with a value of over 1000, we could write a query like this:

SELECT sales_agent,
       COUNT(sales_pipeline.close_value) AS `number won`,
       COUNT(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `number won > 1000`,
       AVG(sales_pipeline.close_value) AS `average of all`,
       AVG(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `avg > 1000`
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_pipeline.sales_agent
HAVING COUNT(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) > 100

We would get a very small set of data back that looks like this:

sales_agent number won number won > 1000 average of all avg > 1000
Darcel Schlecht 349 272 3304.3381088825213 4115.036764705882
Reed Clapper 155 110 2827.974193548387 3808.981818181818
Donn Cantrell 158 106 2821.8987341772154 3938.6132075471696
Vicki Laflamme 221 111 2164.68778280543 3985.108108108108
Cassey Cress 163 112 2763.7361963190183 3792.4821428571427

Exercises (Continued from previous section)

There are two ways to do these exercises. The first is to use the “Try query” links to test your queries without saving them. The second is to create a data.world project and save your queries to it. If you are reading this documentation and completing the exercises as a tutorial, you will need to create your own project to save your work. Details and instructions are in the SQL tutorial which has instructions for setting up your project and links to all the current exercises.

Exercise 29

For this exercise you’ll use your query result from exercise 28 and modify it so that only patients with an average BMI over 30 are included. Order the results by the maximum body mass index highest to lowest.

Try query

Hint

SELECT the column patient, the AVG of the (value) column relabeled AS `AVG BMI` , a COUNT of the (value) column relabeled AS `Number of Readings` , and the MAX of the (value) column. FILTER your query so that it only returns results (WHERE the value column is > 30). Label the filtered results from the value column AS `Max Obese BMI`. All of these columns should be selected FROM the observations_cleaned table. Restrict your results to only include records WHERE the description column = “Body Mass Index”. GROUP your results BY the patient column. Also restrict your results to those HAVING an AVG of the (value) column which is > 30. ORDER your results BY the MAX of the (value) column in DESCending order.

See solution