Skip to main content

Query your data

Note

This tutorial is part of the basic tutorial series for the data.world platform. See the article overview of basic tutorials for more information.

In this tutorial we introduce writing SQL queries in the query panel of the workspace. A query in the context of data is a request for information written in a query language like SQL or SPARQL. Queries use a specified set of terms in a specific order with prescribed syntax. In addition to this tutorial we have many resources to help you learn about about querying on data.world including articles on query basics, working with existing queries, and using query templates. We also have complete documentation for SQL and SPARQL -- the query languages used on data.world.

Objectives

After working through the tutorial you should be able to:

  • Tell the difference between datasets and projects

  • Create a SQL query

  • Use auto-complete of terms and objects in your query

  • Use click-to-copy for columns and tables

  • Understand the pop-up help text in queries

  • Auto-format queries

  • Troubleshoot queries

  • Save queries

To complete this tutorial you need to have:

  • A data.world login (available for free here if you don't have one).

  • Your own tutorial project (you must create this yourself--it cannot be downloaded)

  • The Bee Colony Statistics dataset linked to your project

If you need help creating the project or linking the dataset to it, detailed instructions are in the tutorial Create a project to work with data.

Introduction

Note

If you prefer to go straight to the exercises, click here.

Where are queries stored

As was discussed in the Intro to the data.world tutorial, a dataset is where data is stored for use in projects. It contains only the basic resources that pertain to the majority of the projects that use it. Queries are generally specific to a project as they pull only the data from the dataset which is needed for analysis for that project. As such, queries are stored, with rare exceptions, in a project. See our overview of the project workspace for more information.

How to create a query

To write a SQL query against data in a project go to the upper left corner of the project workspace and select SQL Query from the + Add menu:

Screen_Shot_2020-02-02_at_12.41.18_PM.png

When the query editor comes up in the center panel of the screen, the Project Schema also comes up in the right side panel. If the project schema isn't showing, select the arrow to the right of the Run query button to expand the sidebar:

Screen_Shot_2020-02-02_at_12.43.49_PM.png

Auto-complete of terms and objects

Click next in the query editor window to begin entering your query text. As you type your query, the editor suggests SQL terms--operators, aggregations, and functions--for you. Arrow down or up to choose another option:

Auto-complete.png

Hitting 'enter' will auto-complete the highlighted suggestion in the list.

Click-to-copy for columns and tables

You have the option of typing in the table and column names for the data you are querying, but you can also use the click-to-copy feature on the project schema to copy the names for you so you don't have to worry about typos.

Screen_Shot_2020-02-02_at_12.47.38_PM.png

Pop-up help text

Help text is provided for all SQL functions and aggregations when you hover over the function name:

Screen_Shot_2020-02-02_at_1.21.09_PM.png

Auto-formatting queries

While you can type your SQL query all on one line--it is perfectly legal and will run--it's good practice to split it up based on clauses and punctuation. The line breaks make it easier to see where you may have forgotten a comma or some other element of the query, and it also allows us to give you better error messages if there are problems with your query.

Here's an example of a query before and after running it through our auto-formatter:

Screen_Shot_2020-02-02_at_1.42.45_PM.png
Screen_Shot_2020-02-02_at_1.43.37_PM.png

To auto-format your query, use the keyboard shortcut Cmd + Option + L . If you forget the command there is a dropdown menu to the right of the Run query button which has a link to a list of all the keyboard shortcuts.

auto-format_1.png

From the dropdown menu you can also manage your query (rename, copy, etc.) as well as get to our SQL tutorial for additional help.

auto-format_2.png

The article on query editor shortcuts also contains a full list of the shortcuts for the query editor.

Troubleshooting queries

If you try to auto-format your query and it doesn't work, that's a sign that you have an error in the query. If you run it you'll get an error message telling you what the problem is and where to find it:

Screen_Shot_2020-02-02_at_2.09.32_PM.png

In the above example the query was written to search for the number of colonies recorded by year just for the state of Montana. However the way the query is written it looks like it is searching for records where the value in the bee_colony_census_data_by_state.state column matches the value in another column named Montana. For Montana to be recognized as a value, it has to be surrounded by double-quotes. The error message returned from running the query tells you that the problem is in the 3rd line and the 47th character position (column). The 47th character should have been the opening ":

Screen_Shot_2020-02-02_at_2.09.56_PM.png

Saving queries

When your query is complete, save it for later use by selecting Save to the left of the Run query button. You'll be prompted to name the query, add a description, determine where to save it, and indicate who can see it:

saving_queries.png

It's good practice to give the query a short descriptive name for both your own later use and for use by others working on your project. Adding a description also makes your query more usable as the description shows up on the resource card returned from using search on data.world.

Exercises

  • Open your tutorial project.

  • Select SQL query from the +Add button in the upper left.

  • Write the following query in the query editor pane and run it:

SELECT *
FROM bee_colony_census_data_by_county
WHERE bee_colony_census_data_by_county.state = "MONTANA"
  • Save the query to your project

Conclusion

There is much more to learn about queries and query languages than will fit into one tutorial, and this one just scratches the surface of how to write and save queries. If you're conversant with SQL or SPARQL, your best next step is use your project to practice writing queries on data.world. If you need more information, both our SQL and SPARQL documentation are great resources. If you don't know either SQL or SPARQL and want to query data.world datasets we recommend you work through our SQL tutorial. More information about queries and how to use them is covered in the tutorial Advanced work with queries.

References