Working With Arrays

An introduction to working with arrays.

Traditional SQL databases store data as one value per field. More modern SQL databases can store multiple, indexed values of the same data type in a single field called an array. There is no standard set of SQL functions for arrays. Instead, different flavors of SQL each have their own functions and aggregations. The ones we support in our SQL implementation include: ARRAY, ARRAY_AGG, ARRAY_APPEND, ARRAY_CONCAT, ARRAY_JOIN, ARRAY_LENGTH, ARRAY_PREPEND, ARRAY_CONTAINS, and ELEMENT_AT.

For a full list of array functions see the reference section.

Arrays are currently only supported on tables from a live connection to an Athena database. You cannot use them with other databases or with tables derived from an Athena database.

Other features of our array implementation include:

  • We use a comma as an array separator. Note that (for the time being) commas in array values will result in an incorrect count of array items as we do not differentiate between commas in strings and the comma separator.

  • Array functions can be used with joins to other data sources, but at present the array functions must occur before joining or federation.

  • Indexing is one-based (the index starts with 1, not 0), and negative indexing is supported (See ELEMENT_AT below for more details).

The following examples are run against a small table called city_array that looks like this:

country_id cities
UK London, Stretford
SG Singapore
US Southlake, South San Francisco, South Brunswick, Seattle
CN Beijing
CH Geneva, Bern
BR Sao Paulo
NL Utrechtv
IN Bombayv
IT Roma, Venicev
AU Sydney
CA Toronto, Whitehorse
JP Tokyo, Hiroshima
DE Munich

ARRAY

The ARRAY command allows you to create an array from a group of values. The values can be field values, calculated values, literals, or a combination thereof. The only requirement is that all the values are of the same data type. The following is an example of creating an array using the city_array table and literals:

SELECT ARRAY(country_id, "kind", "of", "town")
  FROM city_array

A partial set of the results would be:

array
UK, kind, of, town
SG, kind, of, town
US, kind, of, town
CN, kind, of, town
CH, kind, of, town
BR, kind, of, town

ARRAY_AGG

The sole aggregation for arrays takes all the values in a column and aggregates them into one field. For example

SELECT ARRAY_AGG(country_id)
  FROM city_array

returns

array_agg
UK, SG, US, CN, CH, BR, NL, IN, IT, AU, CA, JP, DE

ARRAY_APPEND

The ARRAY_APPEND function is used to append values to an array. As with all array functions the values can be field values, calculated values, literals, or a combination thereof. The only requirement is that all the values are of the same data type. The following is an example of a query using ARRAY_APPEND:

SELECT country_id,
       ARRAY_APPEND(cities, "foo")
  FROM city_array
In the syntax of the command the array must be listed first followed by the appended value. `ARRAY_APPEND("foo", cities)` would not run.
country_id array_append
UK London, Stretford, foo
SG Singapore, foo
US Southlake, South San Francisco, South Brunswick, Seattle, foo
CN Beijing, foo
CH Geneva, Bern, foo

ARRAY_CONCAT

The ARRAY_CONCAT function combines multiple arrays into a single array:

SELECT country_id,
       ARRAY_CONCAT(cities, ARRAY("my", "kind", "of", "town"))
  FROM city_array
country_id array_concat
UK London, Stretford, my, kind, of, town
SG Singapore, my, kind, of, town
US Southlake, South San Francisco, South Brunswick, Seattle, my, kind, of, town
CN Beijing, my, kind, of, town

ARRAY_JOIN

This is the function to use if you want to concatenate all the values in an array field into one string value. You can specify an optional argument as a separator, and it can be any string. If you do not specify a separator, there will be nothing aded between the values. In this example we chose to use a forward slash:

SELECT country_id,
       ARRAY_JOIN(cities, "/")
  FROM city_array
country_id array_concat
UK London/Stretford
SG Singapore
US Southlake/South San Francisco/South Brunswick/Seattle
CN Beijing
CH Geneva/Bern
BR Sao Paulo

ARRAY_LENGTH

If you would like to know the number of items in an array you can use ARRAY_LENGTH:

SELECT country_id,
       cities,
       ARRAY_LENGTH(cities)
  FROM city_array
country_id cities array_length
UK London, Stretford 2
SG Singapore 1
US Southlake, South San Francisco, South Brunswick, Seattle 4
CN Beijing 1
CH Geneva, Bern 2
BR Sao Paulo 1

ARRAY_PREPEND

Similar to ARRAY_APPEND, ARRAY_PREPEND adds values to an array. Because the array always has to come first in the command, this function allows you to specify the output order:

SELECT country_id,
       ARRAY_PREPEND(cities, "foo")
  FROM city_array
country_id array_prepend
UK foo, London, Stretford
SG foo, Singapore
US foo, Southlake, South San Francisco, South Brunswick, Seattle
CN foo, Beijing
CH foo, Geneva, Bern
BR foo, Sao Paulo
NL foo, Utrecht

ARRAY_CONTAINS

The ARRAY_CONTAINS function evaluates a column for a specific value and returns true if the value exists in a row and false if it does not:

SELECT country_id,
       ARRAY_CONTAINS(cities, "London")
  FROM city_array

returns:

country_id array_contains
UK true
SG false
US false
CN false
CH false

ELEMENT_AT

Because the values in arrays are indexed, structured values (not strings as they can appear in a tabular format) you can find the values at different positions in the array. If you want to count from the beginning of the array, you use a positive integer If you want to count from the end of the array, use a negative integer:

SELECT country_id,
       ELEMENT_AT(cities, 1)
  FROM city_array

returns the first value in the array:

country_id element_at
UK London
SG Singapore
US Southlake
CN Beijing
CH Geneva

While

SELECT country_id,
       ELEMENT_AT(cities, -1)
  FROM city_array

returns the last value:

country_id element_at
UK Stretford
SG Singapore
US Seattle
CN Beijing
CH Bern

The ELEMENT_AT function comes in both a long version and a syntactically sugared light version. Both work equally well. The sugared versions are written like this:

SELECT country_id,
       cities[1]
  FROM city_array
SELECT country_id,
       cities[-1]
  FROM city_array