ROLLUP and CUBE
An introduction to ROLLUP
and CUBE
.
ROLLUP
and CUBE
make it possible for values to be aggregated on several
different axes satisfying more detailed reporting needs. ROLLUP
summarizes
against a hierarchy of columns used in the GROUP BY
clause. CUBE
groups by
all combinations of the values.
ROLLUP
If we wanted to see closed deals in the pipeline aggregated by sum for the total, the region, and the sales agent, we would write the query like this:
SELECT teams.regional_office,
pipeline.sales_agent,
SUM(pipeline.close_value)
FROM sales_pipeline AS pipeline, sales_teams AS teams
WHERE sales_pipeline.sales_agent = sales_teams.sales_agent
AND sales_pipeline.deal_stage = "Won"
GROUP BY ROLLUP(teams.regional_office, pipeline.sales_agent)
ORDER BY teams.regional_office, pipeline.sales_agent
And the returned data would look like this:
regional_office | sales_agent | sum |
---|---|---|
10,005,534 | ||
Central | 3,346,293 | |
Central | Anna Snelling | 275,056 |
Central | Cecily Lampkin | 229,800 |
Central | Darcel Schlecht | 1,153,214 |
Central | Gladys Colclough | 345,674 |
Central | Jonathan Berthelot | 284,886 |
Central | Lajuana Vencill | 194,632 |
Central | Marty Freudenburg | 291,195 |
Central | Moses Frase | 207,182 |
Central | Niesha Huffines | 176,961 |
Central | Versie Hillebrand | 187,693 |
East | 3,090,594 | |
East | Boris Faz | 261,631 |
East | Cassey Cress | 450,489 |
East | Corliss Cosme | 421,036 |
East | Daniell Hammack | 364,229 |
East | Donn Cantrell | 445,860 |
East | Garret Kinder | 197,773 |
East | Reed Clapper | 438,336 |
East | Rosie Papadopoulos | 230,169 |
East | Violet Mclelland | 123,431 |
East | Wilburn Farren | 157,640 |
West | 3,568,647 | |
West | Elease Gluck | 289,195 |
West | Hayden Neloms | 272,111 |
West | James Ascencio | 413,533 |
West | Kami Bicknell | 316,456 |
West | Kary Hendrixson | 454,298 |
West | Markita Hansen | 328,792 |
West | Maureen Marcano | 350,395 |
West | Rosalina Dieter | 235,403 |
West | Vicki Laflamme | 478,396 |
West | Zane Levy | 430,068 |
The row with no value in either of the first two columns is the sum of all the sales in all regions. The rows with data in the first column but not the second contain the sum for that region.
CUBE
The above example was great for ROLLUP
because there was an easy hierarchy with the entire pipeline at the top, then sales divided into regions, then the regions further divided into sales agents. However if we wanted to compare the sales numbers for products and sales agents, there isn’t a logical hierarchy: product is not a subcategory of sales agent, and sales agent is not a subcategory of product. A better result combination then might be to return all possible combinations of of the sums of products and sales agents. We would get those combinations using CUBE
:
SELECT product,
sales_agent,
SUM(close_value)
FROM sales_pipeline
WHERE sales_pipeline.deal_stage = "Won"
GROUP BY CUBE(product, sales_agent)
ORDER BY product, sales_agent
Scroll down the results table to see the format of the combinations.
product | sales_agent | sum |
---|---|---|
10005534 | ||
Anna Snelling | 275056 | |
Boris Faz | 261631 | |
Cassey Cress | 450489 | |
Cecily Lampkin | 229800 | |
Corliss Cosme | 421036 | |
Daniell Hammack | 364229 | |
Darcel Schlecht | 1153214 | |
Donn Cantrell | 445860 | |
Elease Gluck | 289195 | |
Garret Kinder | 197773 | |
Gladys Colclough | 345674 | |
Hayden Neloms | 272111 | |
James Ascencio | 413533 | |
Jonathan Berthelot | 284886 | |
Kami Bicknell | 316456 | |
Kary Hendrixson | 454298 | |
Lajuana Vencill | 194632 | |
Markita Hansen | 328792 | |
Marty Freudenburg | 291195 | |
Maureen Marcano | 350395 | |
Moses Frase | 207182 | |
Niesha Huffines | 176961 | |
Reed Clapper | 438336 | |
Rosalina Dieter | 235403 | |
Rosie Papadopoulos | 230169 | |
Versie Hillebrand | 187693 | |
Vicki Laflamme | 478396 | |
Violet Mclelland | 123431 | |
Wilburn Farren | 157640 | |
Zane Levy | 430068 | |
GTK 500 | 400612 | |
GTK 500 | Elease Gluck | 184632 |
GTK 500 | Markita Hansen | 83528 |
GTK 500 | Rosalina Dieter | 132452 |
GTX Basic | 499263 | |
GTX Basic | Anna Snelling | 17437 |
GTX Basic | Boris Faz | 9041 |
GTX Basic | Cassey Cress | 19699 |
GTX Basic | Cecily Lampkin | 8756 |
GTX Basic | Corliss Cosme | 19970 |
GTX Basic | Daniell Hammack | 6030 |
GTX Basic | Darcel Schlecht | 20124 |
GTX Basic | Donn Cantrell | 24462 |
GTX Basic | Elease Gluck | 7095 |
GTX Basic | Garret Kinder | 8532 |
GTX Basic | Gladys Colclough | 7721 |
GTX Basic | James Ascencio | 17599 |
GTX Basic | Jonathan Berthelot | 39204 |
GTX Basic | Kami Bicknell | 38543 |
GTX Basic | Kary Hendrixson | 44671 |
GTX Basic | Lajuana Vencill | 29903 |
The data returned from this query shows the sum of all the sales, followed by all sales by sales agent. Then the product info is folded in and the sum of the first product sales is given, followed by the the sum of that product’s sales by sales agent. The product sum followed by product sum by sales agent pattern continues for the rest of the results table.
If one of the columns that you are rolling up contains nulls then you may see ambiguous rows where you cannot tell if the null in that column is a value in a rows, or is due to the rollup itself. In that case you can use the GROUPING aggregation to disambiguate. Add a GROUPING as another column, referencing the column you are grouping by. A one in the GROUPING column means the value is due to rollup. Otherwise a zero will be shown:
query:
SELECT sales_agent,
account,
GROUPING(sales_agent),
GROUPING(account),
SUM(order_value)
FROM apr_2017_orders
GROUP BY ROLLUP(sales_agent, account)
ORDER BY sales_agent, account
result: |sales_agent|account|grouping|grouping_2|sum| |-|-|-|-|-| |sales_agent|account|grouping|grouping_2|sum| |-|-|-|-|-| |Boris Faz|Zencorporation|0|0|560| |Boris Faz|Zotware|0|0|59| |Boris Faz||0|1|18957| |Cassey Cress|Donquadtech|0|0|59| |Cassey Cress|Gekko & Co|0|0|1114| |Cassey Cress|Initech|0|0|3275|
An introduction to PIVOT
.