Docs portal

Querying data

After you have added data to your project, create queries to extract information from the data sources and make it consumable for users. All data files are normalized so they are immediately queryable and joinable, whether they are similar formats or not. This lets you jump right into analyzing and querying to perform calculations, produce summaries, and manipulate data across many different formats and locations.

data.world supports SQL and SPARQL query languages. While data.world is built on semantic web technologies that are best queried using SPARQL, we recognize that most people are more familiar with SQL. For that reason, we have developed our own version of SQL to easily query data.world datasets.

  • SQL: If you are new to SQL, we have documentation that will walk you through everything from What is a database to How do I join all these different tables together so I can query them all at the same time. There is also a tutorial based on the exercises in the documentation.

  • SPARQL: SPARQL (pronounced "Sparkle") is a powerful query language to retrieve, modify and make the best use out of linked data. It is recognized as one of the key technologies of the semantic web due to its flexibility as well as ease of joining complex data structures and detecting intricate patterns in data. Its also the query language upon which the data.world platform is based. If you are new to SPARQL and would like to learn more, see our Getting Started with SPARQL tutorial to jump into the basics.

SQL Query - native versus emulated

Whenever possible, data.world translates your query into the proper syntax to run on the target system. There are cases, however, where a query cannot be fully translated. When that happens, part of the query will run natively on the target system, and part will run on data.world as an emulation. As a result, the total query processing time will frequently be greater than if the entire query ran on the target system.

The most up-to-date information about the support for aggregations and functions is in our dataset Function Support Matrix.

If you are comfortable with SQL you can write your own query to search for support for aggregations in a specific system:

query_for_Athena_agg_support.png

Tables for the aggregation and function support for each system Is in the ReferenceAggregation and function support section.

Creating SQL queries

Best practices for optimizing your queries

The federation capabilities and the query optimizer available in the application make it possible to join tables from disparate data sources without having to worry about the source data formats. Use the following guidelines for joins to make your queries more efficient.

  • Be careful when creating joins to ensure that all applicable join conditions are specified.

  • Use inner joins rather than outer joins, where possible.

  • Be wary of joins on calculated conditions, including especially joins which require casting.

  • When possible, group together joined tables that are virtualized from the same sources.

To create a SQL query:

  1. Start creating a query from one of the following places in the application:

    1. On the Project details page, on the Overview tab, click the New Query button.

    2. From the Project workspace, click the Add button and select SQL Query.

    3. On the Datasets details page, on the Overview tab, click the Three dot menu next to a dataset file that allows query creation and click the Query option.

  2. The query editor opens with the Project schema available on the right side for reference. If the project schema is not visible, click the Show sidebar button to open it. Use the Project schema available on the page to refer to more details about the available columns. Click the More info icon to view the details and click the column name or table name to copy it and paste in the Query editor.

    query_view_project_schema.png
  3. Start typing the query in the Query editor. As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Press Enter to auto-complete the highlighted suggestion in the list and use the Up and Down arrows to choose another option.

    query_type.png
  4. As you type the Function names, you can hover over them to view the available help text for the SQL functions and aggregations.

    query_function_help.png
  5. Once you have written a query, it is best practice to format the query to make it more readable. Use the keyboard shortcut to auto-format the query - on Mac - command + option + L , on Windows - ctrl + alt + L).

    Note

    To access the list of all available shortcuts, use the following shortcut - on Mac -cmd + Option + /, on Windows - ctrl + alt + /

  6. Click the  Run query button to run the query. If the query runs successfully, the results are displayed. If an error occurs, the details of the error are displayed. For details, see troubleshooting query errors.

    query_results_view.png
  7. Click the Save button to save the query. In the Save a copy of this SQL query window, set the following and click the Save button.

    • Name: Provide a name of the query.

    • Description: Provide a brief description.

    • Where will this query be saved: If you started creating a query from a project page, it shows the name of the project. If you started from a dataset page, you get to either save the query to a new project or to the dataset itself.

      Save queries to datasets if you want to clean up the data, join tables and preserve the lineage of the original tables for reference, or just use the query in multiple projects without having to rewrite it. Note that queries saved to a dataset do not show up in the queries list of any project the dataset is used in. Instead they display under the connected dataset when the dataset is opened in the project workspace.

    • Who can see this query: Specify if the query is available to everyone who has access to the project or only the user creating the project. This option is only available while saving queries to projects.

      project_save_query.png
Auto-complete of terms and objects

As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Hitting 'enter' will autocomplete the highlighted suggestion in the list. Arrow down or up to choose another option:

Image_2019-09-18_at_11.48.00_AM.png
Click-to-copy for columns and tables

You have the option of typing in the table and column names for the data you are querying, but you can also use the click-to-copy feature on the project schema to copy the names for you so you don't have to worry about typos.

Image_2019-09-18_at_2.31.00_PM.png
Pop-up help text

Help text is provided for all SQL functions and aggregations when you hover over the function name:

Image_2019-09-18_at_11.52.24_AM.png
How to auto-format queries

While you can type on one line type your SQL query all on one line--it is perfectly legal and will run--it's good practice to split it up based on clauses and punctuation. The line breaks make it easier to see where you may have forgotten a comma or some other element of the query, and it also allows us to give you better error messages if there are problems with your query.

Here's an example of a query before and after running it through our auto-formatter:

Image_2019-09-18_at_2.04.12_PM.png
Image_2019-09-18_at_2.04.55_PM.png

To auto-format your query, use the keyboard shortcut Cmd + Option + L . If you forget the command there is a dropdown menu to the right of the Run query button which has a link to a list of all the keyboard shortcuts.

Image_2019-09-18_at_1.14.15_PM.png

From the dropdown menu you can also manage your query (rename, copy, etc.) as well as get to our SQL tutorial for additional help.

Image_2019-09-18_at_1.13.59_PM.png
Troubleshooting query errors

To troubleshoot query errors:

  1. If an error occurs while running a query, an error message is shown in context that points the issue and can help you solve the issue. For example, in the following query the error is in line 4. The value won is expected to be in double quotes.

    project_query_error.png
  2. Every time you run a query, query logs are also generated which capture all the stages the query execution goes through and any errors that are encountered. Click the View log link to browse to the query logs. If you need to email data.world support for help with the query, click the Contact support link in the window. A copy of the details of the query execution are automatically added to the email.

    window_view_query_logs.png
Saving queries

When your query is complete, save it for later use by selecting Save to the left of the Run query button. You'll be prompted to name the query, add a description, determine where to save it, and indicate who can see it.

It's good practice to give the query a short descriptive name for both your own later use and for use by others working on your project. Adding a description also makes your query more useable as the description shows up on the resource card returned from using search on data.world.

Saving a query to a dataset

Queries are saved to projects by default. The logic behind this structure is that datasets are for storing files and tables, and projects are for querying and analyzing those files and tables. A dataset is meant to be reused in multiple projects, and if queries are saved to it instead of to the projects using it then the dataset can rapidly fill with irrelevant queries making it difficult to use. However if the queries specific to a project are all stored in that project, the linked dataset remains clean and ready for reuse.

The reasoning above covers 80% of the use cases, but what about the times you really do want to save a query to a dataset? Maybe you want to clean up the data, join tables and preserve the lineage of the original tables for reference, or just use the query in multiple projects without having to rewrite it (you might even want to parameterize it). In those cases it is useful to be able to save your query to the dataset, and you can do that. After running your query, to save it to the dataset select the Save link and click the drop-down link to the right of the + New Project option. In addition to New project you'll also see the name of the dataset. Select it and the query will be saved to the dataset and you'll still be in an untitled, unsaved project:

Image_2019-10-16_at_5.10.12_PM.png

One thing about saving queries to the dataset instead of to the project is that queries saved to a dataset won't show up in the queries list of any project the dataset is used in. Instead they'll be displayed under the connected datasets info:

Image_2019-10-16_at_5.17.38_PM.png
Ways to use a query

Queries against your data are the building blocks for an analysis of the data. We covered the basics of writing your own queries in the article on creating a query. In this article we'll look at all the things you can do with an existing query.

Use other people's queries

As you work in projects with other people, you might find a query that you want to use and edit, but it's not your query so you don't have permissions to save changes to it. The easiest thing to do is to make a copy of the query by selecting the three dots to the right of the query name in the project workspace and selecting Save a copy:

You'll be prompted to name the query and write a description. For tracking lineage it's a good idea to include the name of the original query in the description. Other save options include where--the current project or the original dataset--and who can see the new query (hidden by the location selection in the image below:

Image_2019-10-08_at_2.07.00_PM.png
Copy a query to another dataset or project

Normally the choice would be to save the new query in the current project because then it would show up in the list of all the queries and would be limited to contributors on the project. However if the query would be useful to others who create projects from that dataset, it might make more sense to save it directly to the dataset. Saving it to the dataset enables anyone with access to the dataset to use it (unless you also set it to share with no one). It's also the easiest way to save the query to another project that is linked to the same dataset. One note: queries saved to datasets show up in the Connected datasets section of the Project directory in the left sidebar--not under Queries: Alternatively you could copy and paste the query into a new query window in the other project and save it directly there.

Make a template from a query

Another way you might want to modify a query is to make a template from it so you can change the input strings to various parts of it. For example we could turn the query we just saved as Animal breed query into a query template by selecting the New template link at the top right of the query panel:

Image_2019-10-08_at_3.58.53_PM.png

A dialog window will pop up giving you basic instructions to how query templates work and linking you to the SQL documentation for query templates:

Image_2019-10-18_at_10.38.12_AM.png

Select Add a sample statement a sample of the DECLARE statement used in query templates, a sample comment, and a link to the documentation will be inserted in the top of your query window:

Image_2019-10-08_at_4.05.53_PM.png

With just a couple of modifications to the SQL you can have a query where other users can input values for animal types without ever having to know SQL themselves. Here we moved the comment to the top line so it would show up as a prompt in the input area, and we indicated where in the original query the input value would go (in the WHERE clause):

Image_2019-10-08_at_4.15.41_PM.png

You can preview the query to see what other users will see and to test the query yourself by selecting the Preview query template link in the upper right corner of the query panel. Then fill in a value and hit Run query:

Image_2019-10-08_at_4.21.45_PM.png

For more detailed information on query templates see the article on using query templates.

Download query results

Another way to use a query is to download the results from it in either a csv or an xlsx file. After running your query, click the Download link below the query window and select the option you want:

Image_2019-10-17_at_11.01.42_AM.png
Save to project or dataset

Another option for working with your query results is to save them as a tabular file in your current project, a different project, in an existing dataset, or in a new dataset:

Image_2019-10-17_at_11.11.21_AM.png

For more information on why this feature is important, see the article Save the results of a query as a new table.

Copy URL or embed code

Two other ways you can use the results of a query are to make them available for download from a URL, or to embed them on a Markdown page. Both options are found in the Download menu under Copy URL or embed code. The URL is a download link for a csv file of the results and the embed code option renders the results table in a Markdown page.

Work with third-party apps

There are many applications you can use to analyze and display the results of your queries, and we make it easy to work with your favorites. You can find out about the applications we integrate with on our integrations page. If you don't have any integrations configured yet you can still make visualizations of your results with our built-in Chart Builder app:

Image_2019-10-18_at_9.03.33_AM.png

For instructions on how to use Chart Builder see one of the following articles on it:

If you already have an integration configured that you want to use or want to integrate a new application, the name of the first application in your list of integrations will show up as a menu to the right of the Download button:

Image_2019-10-18_at_9.06.22_AM.png

You can find out what integrations we currently offer and how to use them on our Integrations page.

Save the results of a query as a new table

When you run a query in data.world, one of the things you can do with the results is to save them as a new table in a dataset or project. The main reasons you might want to save your query as a new data file are security and ease of analysis. Saving the query allows you to:

  • Present a subset of the data from the original file

  • Reorder the data

  • Eliminate the need for joins or subqueries in future queries

  • Work with columns as rows and vice versa without having to include either a PIVOT or an UNPIVOT clause in your queries

Present a subset of the data

When data files are streamed into data.world they include all the columns in the original file. Sometimes there are columns which are irrelevant to your analysis or which contain substandard or incomplete data. Some columns might contain sensitive information that can't be shared outside of the original audience of the dataset but also have data that would be useful to a wider group. While you could manually clean up the data by downloading, editing, and re-uploading it, you would lose the ability for it to automatically update from the original source if that is how the initial data was set up in data.world. However you can both clean the data and maintain automatic updates by running a query against the original data file and saving the results to another dataset or project.

For example, the data file USDA Honey Census data in the dataset USDA bee colony inventories and loss is automatically brought in and synced daily through a URL. Because the original dataset contains information on much more than bees and honey, there are many columns in the table that contain either identical data for all the rows or no data at all.

A quick look at the data pulled in from the URL looks like this:

Screen_Shot_2019-03-14_at_12.29.05_PM.png

All columns displayed above except the second one contain duplicate data. Scrolling through the rest of the columns reveals eight more which have either no values or all the same value. To make this data more useful we can run the following query:

Screen_Shot_2019-03-14_at_1.23.37_PM.png

and save it to another dataset (we could also have saved it to the current dataset). To save the query select Save to dataset or project from the Download link:

Screen_Shot_2019-05-22_at_4.31.19_PM.png

There you will be prompted to either select a dataset or project from your recently viewed items , type the name of a different dataset or project and the system will autofill items for you, or save the data as a new dataset:

Image_2019-10-04_at_11.59.32_AM.png

When saving the query results as a table in a brand new dataset, you create what is called a derviative dataset. The new dataset updates from the original dataset which updates from the source, but it only pulls in the data you want to see:

Screen_Shot_2019-03-14_at_1.44.34_PM.png

To keep the file up-to-date you can either manually sync it (using the Sync now button on the right of the screen shown above), or by going to the dataset/project overview page and setting the Autosync on:

Screen_Shot_2019-03-14_at_2.01.27_PM.png

Note that you have the option upon saving to either allow the query underlying the table to update based on changes to the original table or not. If you allow the query to update then changes to the original table will be reflected in your results (e.g., column name changes, columns added, columns deleted, etc.).

Reorder the data

When you use the SQL SELECT * clause to return data from a dataset, the columns in the resulting table are presented in the order in which they are found in the original data source. If you would always like to see them in a different order, you can accomplish this by running a query to order them as you would like to see them and then saving the query as a table in the current dataset. This query would be similar to the one shown above for presenting a subset of the data and the steps to save it would be the same.

Eliminate the need for joins or subqueries

Sometimes all the data you need to analyze is not stored in one table. While you could still access it using a join or a subquery, it's easier and more convenient to be able to query one table. Another reason for saving the data into one table is that queries against large amounts of data run faster if there is no subquery. Finally, it makes it easier for the members of your team who are not well-versed in SQL to do their own analyses of the data.

For this example we'll use the dataset SQL CRM Example Data. With the way the dataset is structured we can see a lot of information about what’s in the pipeline for sales agents (in the table sales_pipeline), but we can’t easily see the connection between their managers (information stored in the table sales-teams) and the pipeline. To see which managers had the most closed deals and for which accounts, we could run a query like this:

Screen_Shot_2019-03-25_at_5.53.19_PM.png

And that's nice, but if someone else wanted different information the query would have to be modified and the person running the query would have to be comfortable enough with SQL to know how to do it. Far easier would be to to write another query with the WITH clause and to create a new table from it. Here is an example of such a query:

Screen_Shot_2019-03-25_at_5.35.38_PM.png

The results would look like this:

Screen_Shot_2019-03-25_at_5.50.37_PM.png

By saving the query as a new table it could be queried directly without having to use the join. The first query in this example would now look like this:

Screen_Shot_2019-05-15_at_11.08.38_AM.png
Work with columns as rows and vice versa

SQL has two clauses that allow you to rearrange the data in a table: PIVOT and UNPIVOT. PIVOT lets you see rows as columns, and UNPIVOT shows columns as rows. These are both complicated clauses and do not allow certain other clauses to follow them (e.g., WHERE, GROUP BY, and HAVING). if you find yourself using PIVOT or UNPIVOT frequently, or if you'd like to be able to use one of the incompatible clauses, it would be far better to write the query using PIVOT or UNPIVOT once and then save the resulting table for further analysis. See our SQL documentation on PIVOT and UNPIVOT for more information.

About query templates

A query template creates a graphical interface that simplifies running a query with different parameters. This allows an end-user to easily change the values that a query uses without needing to access or understand the underlying SQL or SPARQL code.

As an example, let us take a look at the Data Professional Salary Survey and examine the most popular databases from that survey in 2018. To do that, you could use the following SQL query.

Screen_Shot_2019-07-18_at_3.38.20_PM.png

For those with a bit of SQL knowledge, modifying this query to get the results for the year 2017 is trivial. One would just need to change the year value from 2018 to 2017. But when a query involves JOINs, UNIONs, nested queries, and aliases, the number of lines in the query could go from 5 lines to 50 or more and finding the desired value buried in the query becomes far more challenging.

Creating a query template makes it easy for end users to access the information without worrying about editing a SQL or SPARQL query.

query-template-02.png
Creating query templates

Creating query templates is possible for both SQL and SPARQL queries. In both cases, you need to create a variable to turn a standard query into a parameterized query (though creating a such a variable requires a different process in each). Then it is just one more click to create that graphical query template form.

Creating SQL query templates

The steps in this section are explained using a sample dataset and query. This example only uses one variable but you can extend your query to use multiple variables making your query templates more roburst.

Steps include: Create a query, declare a variable, use the variable in the query, add comments to give instructions on using the template, and save the template.

To create a query template:

  1. Create a new SQL query.

    Screen_Shot_2019-07-18_at_3.38.20_PM.png
  2. Click the New template button in the query editor. It opens the Create a query template window with sample code. Click the Add a sample statement button to add the sample code in the query.

    window_create_query_template.png
  3. Modify the variable for use in the query or create your own variable. In this example, year is the variable name, INTEGER is the data type, and 2018 is the optional default value. If no default value is specified and the query is run without entering a value manually, the query will return an error. Allowed data types are: STRING, INTEGER, DECIMAL, BOOLEAN, DATE, TIME, and DATETIME.

    query-template-03.png
  4. Use the variable in the query. Choose where the variable is to be used in the query. This will often be in the WHERE clause of the query , although it can also be used in ORDER BY, LIMIT, or other clauses. In our example, we'll use it in the WHERE clause.

    Note

    This example is a bit complicated because the survey_year has the data type YEAR in the database, but we cannot declare a variable with that type. Instead, we declare our variable as a INTEGER and then used the CAST function to convert the value of survey_year from YEAR to INTEGER to make the comparison on Line 5.

    query-template-04.png
  5. Add comments above the Declare statement to provide instructions and present valid input values. Denote comments by adding the # or -- characters before the text. Comments added before the DECLARE statement display in the final query template form.

    query-template-05.png
  6. Click the Preview query template button to view and use your query template.

    query-template-06.png
  7. Click Save to save the query template in a project.

  8. Any time you want to make changes to the query or the template, click the Edit query link.

Using the variable in the query

Now that the variable exists, choose where the variable will be used in the query - this will often be in the WHERE clause of the query - though it could also appear in an ORDER BY, LIMIT, or other clause . In our example, we'll use it in the WHERE clause on line 5:

query-template-04.png

Note: This example is a bit complicated because the survey_year has the data type YEAR in our database, but we can't declare a variable with that type. Instead, we declare our variable as a INTEGER and then used the CAST function to convert the value of survey_year from YEAR to INTEGER to make the comparison on Line 5.

Adding Comments

Comments added before the DECLARE statement will display in the final query template form. The creator of the query can provide instructions and present valid input values.

Denote comments by adding the # or -- characters before the desired text:

query-template-05.png
Creating the query template

After declaring and referencing the variable(s), click the "Preview query template" link that now shows above the query editor:

query-template-07.png

This will create the graphical form where an end-user can enter the desired value for the variable. We'll now see that friendly form originally shown in the Overview section:

query-template-06.png

Creating this query template will lock the query from further edits. To unlock it to make a change, just click the Edit query link above. If another user owns the query, then you will need to save a copy of it after making any edits.

That completes SQL query template - but keep in mind that you can create additional variables to make a more powerful, flexible query.

Creating SPARQL query templates

Steps include: Create a query, declare a variable, use the variable in the query, add comments to give instructions on using the template, and save the template.

To create a template for SPARQL query:

  1. Create a SPARQL query. Check out the SPARQL tutorial to learn about how to write SPARQL queries.

    query-template-08.png
  2. Declare a variable in the query. The SPARQL query contains atleast one variable by default. To turn that into a parameter that can be used in a query template, change the character in front of the variable name from to $. Make sure to perform this change for every occurrence of the variable within the query. Alternatively, declare a new variable by starting its name with with  the character.

    In our example, we will create a new variable $year_of_interest and replacing the value 2018 in the FILTER function.

    query-template-09.png
  3. To add a default value for a variable, add a conditional BIND statement on the first line of the WHERE clause.

    The general format of that statement is:

    BIND(IF(BOUND($variable), $variable, DEFAULT_VALUE) AS $variable)

    In our example, add a new line on line 7:

    BIND(IF(BOUND($year_of_interest), $year_of_interest, 2018) AS $year_of_interest)

    query-template-10.png

    Although this sets the default value, that value is not shown in the textbox in the query template form. Even though the value is not visible, the query will now use 2018 as the default year until another value is entered.

    Note

    Since the query template does not show any default values, it is recommended you add this information in the comment for the query so that users using the template are aware of it.

  4. Add comments to the query by using the # character. The query template shows all comments before the first SPARQL statement (typically a PREFIX statement).

    query-template-11.png
  5. Click the Preview query template link above to create the template.

    Note

    Note that the Preview query template link appears only when the SPARQL query is created properly with the variables required to create a template.

    query-template-12.png
  6. Click Save to save the query template in a project.

  7. Any time you want to make changes to the query or the template, click the Edit query link.

Declaring a variable

Due to the syntax of SPARQL, a query will contain at least one variable by default.

To turn that into a parameter that can be used in a query template, change the character in front of the variable name from ? to $. Make sure to perform this change for every occurrence of the variable within the query.

A new variable can also be declared by starting its name with with $ character.

In our query, we'll do the latter by creating a new variable $year_of_interest and replacing the value 2018 in the FILTER function on line 9.

query-template-09.png
Setting default values

In the form we just created, you may have noticed we did not see a default value populate.

Setting an optional default value for a variable is more complicated in SPARQL than SQL. To do that, we need to add a conditional BIND statement on the first line of the WHERE clause. The general format of that statement is:

BIND(IF(BOUND($variable), $variable, <DEFAULT_VALUE>) AS $variable)

In our case, let's go back into the editor and add a new line on line 7:

BIND(IF(BOUND($year_of_interest), $year_of_interest, 2018) AS $year_of_interest)
query-template-10.png

Although this will set the default value, that value will not be shown in the textbox in the query template form as it does in SQL. Instead it will be hidden from the end-user. Even though that value is not visible, the query will now use 2018 as the default year until another value is entered.

Adding comments

Comments can be added to a SPARQL query template as well. Only the # character (and not the -- characters that are available in SQL) can be used to specify a comment. The query template will show all comments before the first SPARQL statement (typically a PREFIX statement).

Since the query template won't show any default values, making a note of such values in the comments is a great idea.

query-template-11.png
Creating the Query Template

Clicking the Preview query template link above once more will generate our final template:

query-template-12.png

Just like the SQL version, this query will be locked from edits until the "Edit Query" link above is clicked.

API

Parameterized queries can also be made through the data.world public API. Note that non-parameterized queries can be executed with a GET request to the same endpoint, but in order to specify the parameters, a POST request must be made with the parameters specified in the body of the request.

Example curl command:

curl -X POST \

https://api.data.world/v0/queries/${DW_QUERY_ID}/results \

-H 'Accept: application/json' \

-H 'Authorization: Bearer ${DW_API_TOKEN}' \

-H 'Content-Type: application/json' \

-d '{

"parameters" : {

"parameterName1" : "parameter value 1",

"parameterName2" : "parameter value 2" }

}'

See the API docs for more information.

Sharing

Queries that are executed with a new parameter value may be shared with others by simply copying the URL and sending it to another user. If parameters have been filled in, then these parameters will be encoded into the URL for sharing.

SQL queries - native vs emulated

Whenever possible, data.world will translate your query into the proper syntax to run on the target system. There are cases, however, where a query cannot be fully translated. When that happens, part of the query will run natively on the target system, and part will run on data.world as an emulation. As a result, the total query processing time will frequently be greater than if the entire query ran on the target system.

The most up-to-date information about the support for aggregations and functions is in our dataset Function Support Matrix.

function_support_matrix.png

Select Explore the dataset in the upper right of the screen and you'll be taken to a project workspace for the dataset. There, if you select Aggregations pivoted you see columns with the aggregations, the urls to our support docs for them, and the support status for each system:

agg_support_pivot_query.png

If you are comfortable with SQL you can write your own query to search for support for aggregations in a specific system:

query_for_Athena_agg_support.png

Tables for the aggregation and function support for each system Is in the Reference section.