GROUP BY and FILTER

An introduction to the GROUP BY clause and FILTER modifier.

GROUP BY enables you to use aggregate functions on groups of data returned from a query. FILTER is a modifier used on an aggregate function to limit the values used in an aggregation. All the columns in the select statement that aren’t aggregated should be specified in a GROUP BY clause in the query.

GROUP BY

Returning to a previous section, when we were working with aggregations, we used the aggregate function AVG to find out the average deal size. If we wanted to know the average value of the deals won by each sales person from highest average to lowest, the query would look like:

SELECT sales_agent,
       AVG(close_value)
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_agent
 ORDER BY AVG(close_value) DESC
sales_agent avg
Elease Gluck 3614.9375
Darcel Schlecht 3304.3381088825213
Rosalina Dieter 3269.4861111111113
Daniell Hammack 3194.9912280701756
James Ascencio 3063.2074074074076
Rosie Papadopoulos 2950.8846153846152
Wilburn Farren 2866.181818181818
Reed Clapper 2827.974193548387
Donn Cantrell 2821.8987341772154

We could even ascertain the average value of deals aggregated by manager by running a query with a join like this:

SELECT sales_teams.manager,
       AVG(sales_pipeline.close_value)
  FROM sales_teams
       JOIN sales_pipeline ON (sales_teams.sales_agent = sales_pipeline.sales_agent)
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_teams.manager
manager avg
Dustin Brinkmann 1465.0107095046853
Summer Sewald 2372.886473429952
Melvin Marxen 2553.2086167800453
Cara Losch 2354.26875
Celia Rouche 2629.339344262295
Rocco Neubert 2837.257597684515
Though it’s not required by SQL, it is advisable to include all non-aggregated columns from your SELECT clause in your GROUP BY clause. If you don’t, there are cases where the query will return the desired results, there are also instances where a random value from the non-aggregated row will be used as the representative for all the values returned by the query.

For example, let’s say you wanted to know the average deal by sales agent for each of their customers. If you used the query:

SELECT sales_agent,
       account,
       SUM(close_value)
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_agent
 ORDER BY sales_agent

you would get back the following table which shows each sales agent one time and chooses a value at random from the accounts won by that sales person:

sales_agent account sum
Anna Snelling Bioholding 275,056
Boris Faz Mathtouch 261,631
Cassey Cress Sumace 450,489
Cecily Lampkin Funholding 229,800
Corliss Cosme Zencorporation 421,036
Daniell Hammack Zathunicon 364,229
Darcel Schlecht Isdom 1,153,214
Donn Cantrell Bluth Company 445,860
Elease Gluck Dalttechnology 289,195
Garret Kinder Konex 197,773

To get the average deal by sales agent for each account the query would look like this:

SELECT sales_agent,
       account,
       SUM(close_value)
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_agent, account
 ORDER BY sales_agent, account

The first several rows of the table returned would look like this:

sales_agent account sum
Anna Snelling Betatech 11,340
Anna Snelling Bioholding 12,382
Anna Snelling Cancity 1,496
Anna Snelling Codehow 22,479
Anna Snelling Condax 1,572
Anna Snelling Conecom 4,186
Anna Snelling Domzoom 6,639
Anna Snelling Dontechi 5,578
Anna Snelling Finhigh 1,762
Anna Snelling Funholding 5,654

FILTER

If you wanted to refine your query even more by running your aggregations against a limited set of the values in a column you could use the FILTER keyword. For example, if you wanted to know both the number of deals won by a sales agent and the number of those deals that had a value greater than 1000, you could use the query:

SELECT sales_agent,
       COUNT(sales_pipeline.close_value) AS total,
       COUNT(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `over 1000`
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_pipeline.sales_agent

The first several rows of the resulting table would look like this:

sales_agent total over 1000
Boris Faz 101 70
Maureen Marcano 149 96
Vicki Laflamme 221 111
Donn Cantrell 158 106
Jonathan Berthelot 171 74
Wilburn Farren 55 38
Elease Gluck 80 32
Cassey Cress 163 112
James Ascencio 135 88
Kami Bicknell 174 78
Anna Snelling 208 68
Violet Mclelland 122 33
As we saw in the aggregate functions section, WHERE also limits the values in a query against which an aggregate function is run. FILTER is more flexible than WHERE because you can use more than one FILTER modifier in an aggregate query while you can only use only one WHERE clause.

For example:

SELECT sales_agent,
       COUNT(sales_pipeline.close_value) AS `number won`,
       COUNT(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `number won > 1000`,
       AVG(sales_pipeline.close_value) AS `average of all`,
       AVG(sales_pipeline.close_value)
FILTER(WHERE sales_pipeline.close_value > 1000) AS `avg > 1000`
  FROM sales_pipeline
 WHERE sales_pipeline.deal_stage = "Won"
 GROUP BY sales_pipeline.sales_agent

The first several rows returned by the above query would look like:

sales_agent number won number > 1000 average of all avg > 1000
Rosalina Dieter 72 30 3,269.49 7,537.83
Daniell Hammack 114 91 3,194.999 3,869.47
Gladys Colclough 135 88 2,560.55 3,791.19
Rosie Papadopoulos 78 49 2,950.885 4,559.16
Kary Hendrixson 209 98 2,173.675 4,071.13
Cecily Lampkin 107 63 2,147.665 3,439.74
Lajuana Vencill 127 49 1,532.54 3,262.94
Markita Hansen 130 68 2,529.17 4,512.37
Moses Frase 129 51 1,606.067 3,441.45
Darcel Schlecht 349 272 3,304.34 4,115.04
Hayden Neloms 107 82 2,543.09 3,215.06
Reed Clapper 155 110 2,827.97 3,808.98
Zane Levy 161 99 2,671.23 4,038.43
Garret Kinder 75 46 2,636.97 4,041.37
Niesha Huffines 105 56 1,685.34 2,747.96