Windowed Aggregations

An introduction to windowed aggregations.

Windowed aggregations partition the results from a SQL query into groups in order to perform calculations across adjacent rows of the query result. Currently windowed aggregations cannot be combined in the same SELECT statement with GROUP BY, HAVING, or any other aggregations. They are placed before an ORDER BY clause if one is used. To invoke a WINDOW function you use a special syntax with the OVER clause to specify the WINDOW. There are three parts in a windowed aggregation:

PARTITION BY - The partition specification works similarly to the GROUP BY clause and indicates how the query results are divided into groups. It is always optional.
ORDER BY - The ordering specification determines in what order the aggregations will be applied. It is required for the new functions and optional when using standard aggregations as windowed functions.
ROWS BETWEENor RANGE BETWEEN - The window frame designates a subset of consecutive rows adjacent to the current row in the window that will be evaluated as a group. The difference between ROWS BETWEEN and RANGE BETWEEN is that ROWS BETWEEN specifies the distance in number of rows and RANGE BETWEEN specifies the distance in the values. Options for both are:
- N PRECEDING AND M FOLLOWING - where N and M are positive integers.
- N PRECEDING AND M PRECEDING - legal but uncommon
- N FOLLOWING AND M FOLLOWING - legal but uncommon
- CURRENT - can replace either a PRECEDING value or a FOLLOWING value as in ROWS BETWEEN CURRENT AND 7 FOLLOWING or ROWS BETWEEN 7 PRECEDING AND CURRENT
- UNBOUNDED- can replace either a PRECEDING value or a FOLLOWING value as in ROWS BETWEEN 7 PRECEDING AND UNBOUNDED or ROWS BETWEEN UNBOUNDED and CURRENT

If not specified, the default range is the beginning of the partition through the current row. It is always optional.

Per the SQL standard, the newly supported windowed functions do NOT work with WINDOW frames. WINDOW frames are only for other aggregate functions.

Windowed functions also take WINDOW clauses which are similar to WITH clauses elsewhere in SQL. They allow you to define a WINDOW in the clause and then finish it’s definition in the aggregation, but you can’t actually override anything.That is to say, you can specify one or two of the three parts of the aggregation (PARTITION BY, ORDER BY, or ROWS BETWEEN) but they can’t also be specified in the aggregation. WINDOW clauses can also be nested. In other words you can define one, call it from another, and then use the second one in the aggregation. Though there are many ways you can use WINDOW clauses, the most common one is to fully define a window within the clause and then use it multiple times in different aggregations.

Following are some examples of common uses of windowed functions:

`AVG OVER`

If we wanted to know the average order value for a customer ordered from oldest to newest and grouped into threes (an average of the current order, the preceding order, and the following order) the query would look like this:

SELECT account, create_date, AVG(order_value) OVER (PARTITION BY account ORDER BY create_date ROWS BETWEEN 1 preceding and 1 following)
  FROM jun_2017_orders

sparklesquad/windowed-aggregations Run query Copy code

And a subset of the results would look like this:

account	create_date	avg
Betasoloin	6/3/17	3993.5
Betasoloin	6/4/17	4159.666667
Betasoloin	6/6/17	4187
Betasoloin	6/9/17	3570
Betasoloin	6/24/17	3109
Gekko & Co	6/2/17	3189.5
Gekko & Co	6/11/17	2480.333333
Gekko & Co	6/12/17	878
Gekko & Co	6/14/17	542

`ROW_NUMBER`

This is a useful function for knowing what row you are on within your query results:

SELECT name, regional_office, ROW_NUMBER() OVER ( PARTITION BY regional_office ORDER BY name)
  FROM employees

sparklesquad/windowed-aggregations Run query Copy code

Results show the row numbers based on the order by constraint:

name	regional_office	row_number
Carl Lin	West	1
Carol Thompson	West	2
Elease Gluck	West	3
Hayden Neloms	West	4
James Ascencio	West	5
Kami Bicknell	West	6
Kary Hendrixson	West	7
Markita Hansen	West	8
Maureen Marcano	West	9

`RANK`

The are a few things to note about rank on a range of values in a window partition:

If ranked values are the same they get the same rank.
A row that doesn’t have the same value as the preceding row has a rank equal to its row number.
Because of the way it handles ties, RANK can have gaps in the sequence.

Here is an example of a query that returns the home run ranking of baseball players in 2000 from the Lahman Sabermetrics dataset:

SELECT playerid, HR, RANK() OVER ( ORDER BY HR DESC)
  FROM batting
 WHERE yearId = 2000
 ORDER BY HR DESC, playerid

sparklesquad/windowed-aggregations Run query Copy code

And here are the results:

playerid	HR	rank
sosasa01	50	1
bondsba01	49	2
bagweje01	47	3
glaustr01	47	3
guerrvl01	44	5
hidalri01	44	5
giambja01	43	7
sheffga01	43	7
thomafr04	43	7
edmonji01	42	10
heltoto01	42	10
batisto01	41	12
delgaca01	41	12
rodrial01	41	12
griffke02	40	15

`FIRST_VALUE`

The FIRST_VALUE function returns the first value for each partition, as in this query which returns the first order value for each account in the jun_2017_ orders table:

SELECT account, create_date, order_value, FIRST_VALUE(order_value) OVER (PARTITION BY account ORDER BY create_date)
  FROM jun_2017_orders

sparklesquad/windowed-aggregations Run query Copy code

account	create_date	order_value	first_value
Betasoloin	6/3/17	5100	5100
Betasoloin	6/4/17	2887	5100
Betasoloin	6/9/17	5182	5100
Betasoloin	6/24/17	1036	5100
Gekko & Co	6/2/17	5304	5304
Gekko & Co	6/11/17	1075	5304
Gekko & Co	6/12/17	1062	5304
Gekko & Co	6/14/17	497	5304
Gekko & Co	6/19/17	67	5304
Gekko & Co	6/25/17	5031	5304
Gekko & Co	6/26/17	5622	5304
Plexzap	6/11/17	3850	3850

`LEAD`

If you want to know something like both the current order value and the next order value for a customer you could use the LEAD function. LEAD takes three arguments:

The name of the column to evaluate
The number of rows to offset
The value to put if there are no more rows that meet the partition requirement

Here is an example of a query that aggregates the results by account and either returns the next row’s value for the company or returns -1 if it’s the last order for the account:

SELECT account, create_date, order_value, LEAD(order_value, 1, -1) OVER (PARTITION BY account ORDER BY create_date)
  FROM jun_2017_orders

sparklesquad/windowed-aggregations Run query Copy code

account	create_date	order_value	lead
Betasoloin	6/3/17	5100	2887
Betasoloin	6/4/17	2887	4492
Betasoloin	6/6/17	4492	5182
Betasoloin	6/9/17	5182	1036
Betasoloin	6/24/17	1036	-1
Gekko & Co	6/2/17	5304	1075
Gekko & Co	6/11/17	1075	1062
Gekko & Co	6/12/17	1062	497

`LAG`

The LAG function works like the LEAD function except that it returns a value from any preceding row:

SELECT account, create_date, order_value, LAG(order_value, 1, -1) OVER (PARTITION BY account ORDER BY create_date)
  FROM jun_2017_orders

sparklesquad/windowed-aggregations Run query Copy code

count	create_date	order_value	lag
Betatech	2017-06-02	3666	-1
Betatech	2017-06-05	460	3666
Betatech	2017-06-06	41	460
Betatech	2017-06-11	1067	41
Betatech	2017-06-12	5370	1067
Betatech	2017-06-17	3822	5370
Betatech	2017-06-24	4410	3822
Betatech	2017-06-25	619	4410
Ganjaflex	2017-06-02	3780	-1

Here is a list of all the new windowed functions with links to their reference pages:

Next up: ROLLUP and CUBE

An introduction to ROLLUP and CUBE.