GROUP BY enables you to use aggregate functions on groups of data returned from a query.
FILTER is a modifier used on an aggregate function to limit the values used in an aggregation. All the columns in the select statement that aren’t aggregated should be specified in a
GROUP BY clause in the query.
Returning to a previous section, when we were working with aggregations, we used the aggregate function
AVG to find out the average deal size. If we wanted to know the average value of the deals won by each sales person from highest average to lowest, the query would look like:
SELECT sales_agent, AVG(close_value) FROM sales_pipeline WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_agent ORDER BY AVG(close_value) DESC
We could even ascertain the average value of deals aggregated by manager by running a query with a join like this:
SELECT sales_teams.manager, AVG(sales_pipeline.close_value) FROM sales_teams JOIN sales_pipeline ON (sales_teams.sales_agent = sales_pipeline.sales_agent) WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_teams.manager
SELECTclause in your
GROUP BYclause. If you don’t, there are cases where the query will return the desired results, there are also instances where a random value from the non-aggregated row will be used as the representative for all the values returned by the query.
For example, let’s say you wanted to know the average deal by sales agent for each of their customers. If you used the query:
SELECT sales_agent, account, SUM(close_value) FROM sales_pipeline WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_agent ORDER BY sales_agent
you would get back the following table which shows each sales agent one time and chooses a value at random from the accounts won by that sales person:
|Donn Cantrell||Bluth Company||445,860|
To get the average deal by sales agent for each account the query would look like this:
SELECT sales_agent, account, SUM(close_value) FROM sales_pipeline WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_agent, account ORDER BY sales_agent, account
The first several rows of the table returned would look like this:
If you wanted to refine your query even more by running your aggregations against a limited set of the values in a column you could use the
FILTER keyword. For example, if you wanted to know both the number of deals won by a sales agent and the number of those deals that had a value greater than 1000, you could use the query:
SELECT sales_agent, COUNT(sales_pipeline.close_value) AS total, COUNT(sales_pipeline.close_value) FILTER(WHERE sales_pipeline.close_value > 1000) AS `over 1000` FROM sales_pipeline WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_pipeline.sales_agent
The first several rows of the resulting table would look like this:
WHEREalso limits the values in a query against which an aggregate function is run.
FILTERis more flexible than
WHEREbecause you can use more than one
FILTERmodifier in an aggregate query while you can only use only one
SELECT sales_agent, COUNT(sales_pipeline.close_value) AS `number won`, COUNT(sales_pipeline.close_value) FILTER(WHERE sales_pipeline.close_value > 1000) AS `number won > 1000`, AVG(sales_pipeline.close_value) AS `average of all`, AVG(sales_pipeline.close_value) FILTER(WHERE sales_pipeline.close_value > 1000) AS `avg > 1000` FROM sales_pipeline WHERE sales_pipeline.deal_stage = "Won" GROUP BY sales_pipeline.sales_agent
The first several rows returned by the above query would look like:
|sales_agent||number won||number > 1000||average of all||avg > 1000|
An introduction to the