Community docs

What is a project?

Projects bring datasets together with documentation and analysis. This is where work and collaboration happen. A project, as the name implies, likely has a beginning and an end. Data in it is shared and analyzed, and insights are derived from the analysis and written up in the project.

The biggest difference between a dataset and a project is that datasets can be linked to and included in projects, but projects cannot be linked to or included in other projects or datasets, and neither can the files that are added directly to a project. With a project you can run queries against the data, analyze it, share it, and create charts and visualizations from it. However, if you start right away with a project and add your data files to it, neither you nor anyone else can link those data files to another project. The only way to reuse the data in another project is to download it file by file and re-upload it into a dataset or directly into another project. There are times when you'll want to download and re-upload files instead of linking to them, but if you start by adding new data files directly to a project you won't have a choice. One disadvantage of re-uploading is that you have to recreate all the metadata for the files (the descriptions and the data dictionary), which is a very cumbersome process!
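The linking rules described above can be summarized in a small conceptual model. This is only an illustrative Python sketch and not the data.world API: the class names `Dataset` and `Project`, the methods `link_dataset`, `add_file`, and `available_files`, and the sample file names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A reusable collection of files; it can be linked to many projects."""
    name: str
    files: list[str] = field(default_factory=list)


@dataclass
class Project:
    """A workspace that links datasets and may also hold its own files."""
    name: str
    linked_datasets: list[Dataset] = field(default_factory=list)
    project_files: list[str] = field(default_factory=list)

    def link_dataset(self, dataset: Dataset) -> None:
        # Only datasets can be linked. Projects, and files added directly
        # to a project, cannot be linked anywhere else.
        if not isinstance(dataset, Dataset):
            raise TypeError("only datasets can be linked to a project")
        self.linked_datasets.append(dataset)

    def add_file(self, filename: str) -> None:
        # Files added here belong to this project alone.
        self.project_files.append(filename)

    def available_files(self) -> list[str]:
        # Everything usable in the project: its own files plus the files
        # of every linked dataset.
        linked = [f for d in self.linked_datasets for f in d.files]
        return self.project_files + linked


census = Dataset("census", ["population.csv"])  # hypothetical sample data
study_a = Project("study-a")
study_b = Project("study-b")

study_a.link_dataset(census)   # the same dataset can be reused...
study_b.link_dataset(census)   # ...in more than one project
study_a.add_file("notes.csv")  # this file stays in study-a only

print(study_a.available_files())
print(study_b.available_files())
```

In this model `notes.csv` is reachable only from `study-a`, which mirrors why data you expect to reuse is better uploaded to a dataset first and then linked.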

Start with a question or task

Through hundreds of interviews with people who work with data, we have found that most work stems from a question rather than a particular set of data. It's the question that drives the search for relevant data, generating insights, and presenting reproducible findings, yet there hasn't been a great way to keep all your work in one place--not to mention collaborate easily with others on the project.

That's what we hope to address with Projects on data.world. Projects help you to capture and share the most important aspects of your work as the project unfolds, even across multiple datasets, from question to conclusion.

With data projects you can:

Check out some of our favorite projects below, or jump straight into creating your own!

Once you've explored projects, we'd love to hear from you! Please contact us with any questions or feedback, or join our Slack community to connect with the data.world team and its members.

Page layout

When you land on a project workspace you can tell what's showing in the main area of the screen by what's highlighted in the Project directory in the left sidebar:

Image_2019-09-09_at_1.14.02_PM.png

The project workspace has five main parts:

  • The left sidebar is the Project directory

  • Underneath the header in the center are the tabs and actions for the current object

  • Beneath the tabs in the center is the object viewer

  • In the right sidebar (when visible) on the top is the About section with information about the currently selected object

  • For some objects, the bottom of the right sidebar contains the Project schema

Image_2019-09-16_at_10.47.02_AM.png

Project directory

Image_2019-09-09_at_1.19.13_PM.png

The Project directory is the navigation area of the workspace. At the top is the + Add dropdown, where you can add files to the project, link datasets to it, add posts or insights, and add new SQL or SPARQL queries.

Screen_Shot_2019-08-15_at_10.09.58_PM.png

Below the + Add button are a link to the Home tab and the main project files (Project summary and Data dictionary). Information about the project summary and the data dictionary can be found in the articles Description and Summary, and Data dictionary.

The next section of the Project directory is for project files. Project files are any data resources uploaded or saved directly to the Project--not to a dataset.

Below the project files are connected datasets. The datasets used in this project may also be used in and linked to other projects. When there are changes to the underlying dataset, all projects using the dataset also update. All files and queries associated with a dataset are linked to the project and can be used in it:

Screen_Shot_2019-09-09_at_2.12.24_PM.png

More information on how to create a dataset is in the article on creating a dataset.

The last two sections of the Project directory, queries and insights, are also specific to the project. The Queries section contains all the stored queries in the project. Learn how to write and use queries in the article Query data. We also have a SQL tutorial and a SPARQL tutorial to help you learn or improve your query language skills.

Insights are findings, conclusions, or interesting points for discussion about your project. They allow you to capture conclusions from your work, packaging them up in a way that quickly communicates a nugget of information, while giving the viewer the tools they need to dig down into your methods and sources. See Posting insights for details on insights.

Tabs and actions

Screen_Shot_2019-08-16_at_9.21.28_AM.png

Whenever you select an object or the + Add button from the Project directory, a new tab opens under the header bar with a preview of that object. Underneath the tabs are an icon indicating what type of object is shown, the name of the object, and whether it is shared with all users of the project or private to you. To the right of the name are any actions that can be taken with that object. Further to the right is a link to create a new query template, or parameterized query. You can learn more about parameterized queries in the article Using query templates. The last arrow icon either expands or collapses the About and Project schema panel. For more information about query-specific actions, see the article Query data.

Object viewer

The central area of the workspace is devoted to viewing or interacting with the object selected in the Project directory. The viewer renders previews of most file types (for a complete list of files supported by preview, see the article Supported file types). When the object is a query, the viewer is split into two parts: the query editor at the top and the query results at the bottom. The query editor is where you compose and run queries against your data. In addition to accepting the text you type, data.world's query editor provides auto-complete and auto-format of commands, columns, and tables:

Screen_Shot_2019-08-15_at_1.16.45_PM.png

The bottom center of the screen is used to display results of queries or error messages when queries are not written correctly:

Screen_Shot_2019-08-15_at_1.25.58_PM.png
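To make the two outcomes described above concrete (rows in the results area for a valid query, an error message for an invalid one), here is a minimal local sketch. It uses Python's built-in `sqlite3` module as a stand-in and does not connect to data.world; the `orders` table, its placeholder values, and the `run_query` helper are hypothetical, and SQLite's SQL dialect and error wording may differ from what the data.world query editor shows.

```python
import sqlite3

# In-memory SQLite database standing in for a project's data.
# The table and its values are placeholders for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10), ("south", 25), ("north", 5)],
)


def run_query(sql: str):
    """Return ("ok", rows) on success or ("error", message) on failure."""
    try:
        return "ok", conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return "error", str(exc)


# A valid query produces rows, as the results area would show.
good = run_query(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
)
print(good)

# A misspelled keyword produces an error message instead of rows.
bad = run_query("SELEC region FROM orders")
print(bad)
```

Returning a status alongside the payload keeps the two cases explicit, much as the results area shows either a result table or an error message.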

About

In the upper right corner of the project workspace is a pane with information about the object highlighted in the Project directory. It appears for queries and files and gives you additional details that may be relevant to using them:

Image_2019-09-18_at_9.27.52_AM.png

Project schema

The project schema shows up at the bottom of the right sidebar whenever the object displayed in the main section is a query. It contains a list of the queryable entities in the project (datasets, tables, and columns). Items can be expanded or collapsed, and selecting either a column or a table copies its name to the clipboard for easy pasting into the query editor:

Screen_Shot_2019-08-15_at_1.58.54_PM.png
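The dataset, table, and column hierarchy shown in the project schema can be pictured as a small tree. The sketch below is a conceptual illustration only, not how data.world stores or exposes the schema; the dataset, table, and column names and the helpers `list_entities` and `find_column` are hypothetical.

```python
# Hypothetical schema: dataset -> table -> list of column names.
schema = {
    "sales-data": {
        "orders": ["order_id", "region", "amount"],
        "customers": ["customer_id", "name"],
    },
    "reference-data": {
        "regions": ["region", "manager"],
    },
}


def list_entities(schema):
    """Flatten the tree into (dataset, table, column) tuples.

    Datasets and tables are visited in alphabetical order; columns keep
    the order in which they are listed.
    """
    return [
        (dataset, table, column)
        for dataset, tables in sorted(schema.items())
        for table, columns in sorted(tables.items())
        for column in columns
    ]


def find_column(schema, column_name):
    """Return every (dataset, table) pair that contains the given column."""
    return [(d, t) for d, t, c in list_entities(schema) if c == column_name]


print(find_column(schema, "region"))
```

A lookup like `find_column` is one way to think about what the schema panel helps with: locating which tables hold a column before writing it into a query.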