Work with data: Projects

What is a project?

Projects bring datasets together with documentation and analysis. This is where work and collaboration happen. A project, as the name implies, likely has a beginning and an end. Data in it is shared and analyzed, and insights are derived from the analysis and written up in the project.

The biggest difference between a dataset and a project is that datasets can be linked to and included in projects, but projects cannot be linked to or included in other projects or datasets--nor can the files that are added directly to a project. With a project you can run queries against the data, analyze it, share it, and create charts and visualizations from it. However, if you decide to start right away with a project and add your data files to it, neither you nor anyone else can link those data files to another project. The only way to reuse the data in another project is to download it file by file and re-upload it into a dataset or directly into another project. While there are times you'll want to download and re-upload files instead of just linking to them, you won't have a choice if you start by adding new data files directly to a project. One disadvantage of re-uploading is that you have to recreate all the metadata for the files (descriptions and the data dictionary), which is a very cumbersome process.
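The linking rules described above can be pictured with a small toy model. This is only an illustrative sketch, not how data.world is implemented: the `Dataset` and `Project` classes, their fields, and the sample names are all hypothetical. It shows that a dataset can be linked to several projects, that changes to the dataset are visible in every project linking it, and that a project cannot be linked into another project.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset: files plus their descriptions."""
    name: str
    files: dict = field(default_factory=dict)  # file name -> description


@dataclass
class Project:
    """Hypothetical stand-in for a project.

    It owns the files uploaded directly to it and holds references
    to any datasets linked to it.
    """
    name: str
    project_files: dict = field(default_factory=dict)  # direct uploads, not linkable
    linked_datasets: list = field(default_factory=list)

    def link(self, resource):
        # Only datasets can be linked; projects and their files cannot.
        if not isinstance(resource, Dataset):
            raise TypeError("only datasets can be linked to a project")
        self.linked_datasets.append(resource)

    def visible_files(self):
        """Return every file usable in the project, sorted by name."""
        names = set(self.project_files)
        for dataset in self.linked_datasets:
            names.update(dataset.files)
        return sorted(names)


sales = Dataset("sales", {"orders.csv": "One row per order"})
q1 = Project("q1-review")
q2 = Project("q2-review", project_files={"notes.csv": "Uploaded directly"})
q1.link(sales)
q2.link(sales)

# A change to the dataset shows up in every project that links it.
sales.files["returns.csv"] = "One row per returned order"
print(q1.visible_files())
print(q2.visible_files())

# Linking a project into another project is rejected.
try:
    q1.link(q2)
except TypeError as err:
    print(err)
```

In this sketch, `notes.csv` is reachable only through `q2`, which mirrors why files added directly to a project have to be downloaded and re-uploaded before they can be reused elsewhere.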

Start with a question or task

Through hundreds of interviews with people who work with data, we have found that most work stems from a question rather than a particular set of data. It's the question that drives the search for relevant data, the generation of insights, and the presentation of reproducible findings. Yet there hasn't been a great way to keep all your work in one place--not to mention collaborate easily with others on the project.

That's what we hope to address with Projects on data.world. Projects help you to capture and share the most important aspects of your work as the project unfolds, even across multiple datasets, from question to conclusion.

With data projects you can:

Check out some of our favorite projects below, or jump straight into creating your own!

Once you've explored projects, we'd love to hear from you! Please contact us with any questions or feedback, or join our Slack community to connect with the data.world team and its members.

Page layout

When you land on a project workspace you can tell what's showing in the main area of the screen by what's highlighted in the Project directory in the left sidebar:

Image_2019-09-09_at_1.14.02_PM.png

The project workspace has five main parts:

  • The left sidebar is the Project directory

  • Underneath the header in the center are the tabs and actions for the current object

  • Beneath the tabs in the center is the object viewer

  • In the right sidebar (when visible) on the top is the About section with information about the currently selected object

  • For some objects, the bottom of the right sidebar contains the Project schema

Image_2019-09-16_at_10.47.02_AM.png

Project directory

Image_2019-09-09_at_1.19.13_PM.png

The Project directory is the navigation area of the workspace. At the top is the + Add dropdown, where you can add files to the project, link datasets to it, add posts or insights, and add new SQL or SPARQL queries.

Screen_Shot_2019-08-15_at_10.09.58_PM.png

Below the + Add button are a link to the Home tab and the main project files (Project summary and Data dictionary). Information about the project summary and the data dictionary can be found in the articles Description and Summary, and Data dictionary.

The next section of the Project directory is for project files. Project files are any data resources uploaded or saved directly to the Project--not to a dataset.

Below the project files are connected datasets. The datasets used in this project may also be used in and linked to other projects. When there are changes to the underlying dataset, all projects using the dataset also update. All files and queries associated with a dataset are linked to the project and can be used in it:

Screen_Shot_2019-09-09_at_2.12.24_PM.png

More information on how to create a dataset is in the article on creating a dataset.

The last two sections of the project directory, queries and insights, are also specific to the project. The Queries section contains all the stored queries in the project. Learn how to write and use queries in the article Query data. We also have a SQL tutorial and a SPARQL tutorial to help you learn or improve your query language skills.

Insights are findings, conclusions, or interesting points for discussion about your project. They allow you to capture conclusions from your work, packaging them up in a way that quickly communicates a nugget of information, while giving the viewer the tools they need to dig down into your methods and sources. See Posting insights for details on insights.

Tabs and actions

Screen_Shot_2019-08-16_at_9.21.28_AM.png

Whenever you select an object or the + Add button from the Project directory, a new tab opens under the header bar with a preview of that object. Underneath the tabs are an icon indicating what type of object is shown and the name of the object. To the right of the name are any actions that can be taken with that object and an indication of whether it is shared with all users of the project or private to you. Further to the right is a link to create a new query template, or parameterized query. You can learn more about parameterized queries from the article Using query templates. The last arrow icon is used to either expand or collapse the About and Project schema panel. For more information about query-specific actions, see the article Query data.

Object viewer

The central area of the workspace is devoted to viewing or interacting with the object selected in the project directory. The viewer renders previews of most file types (for a complete list of files supported by preview, see the article Supported file types). When the object is a query, the viewer is split into two parts: the query editor at the top and the query results at the bottom. The query editor is where you compose and run queries against your data. In addition to accepting typed text, data.world's query editor provides auto-complete and auto-format of commands, columns, and tables:

Screen_Shot_2019-08-15_at_1.16.45_PM.png

The bottom center of the screen is used to display results of queries or error messages when queries are not written correctly:

Screen_Shot_2019-08-15_at_1.25.58_PM.png

About

In the upper right corner of the project workspace is a pane with information about the object highlighted in the project directory. For queries and files, it shows additional details that might be relevant to their use:

Image_2019-09-18_at_9.27.52_AM.png

Project schema

The project schema shows up at the bottom of the right sidebar whenever the object displayed in the main section is a query. It contains a list of the queryable entities in the project (datasets, tables, and columns). Items can be expanded or collapsed, and selecting either a column or table copies its name to the clipboard for easily pasting into the query editor:

Screen_Shot_2019-08-15_at_1.58.54_PM.png

Sharing within an organization

Suppose you are part of an organization on data.world and would like to create a new project or dataset. You can choose to create the resource in your own account or within the organization's account.

In both cases, you will maintain control over the dataset or project. You can invite contributors, change visibility from private to public, and edit it to your heart's content.

Individuals can share a dataset with an organization exactly the same way they share with other individuals. Just go to the Contributors tab on a dataset or People tab on a project, and use the Invite button to add the organization.

Once added, the organization's administrators will receive a notification they can accept or reject. If they accept the invitation, the dataset will then be shared across the organization's top-level members using the same permissions the organization received.

When you create a dataset or project you can choose whether to share it with the entire public, just an organization, or with no one at all.

Grant_access_1.png

To change the permission levels later, go to the Settings tab from the overview page and select Access and ownership.

Grant_access_1.png

From here you can grant or remove access to users by username, full name, or email address. Organizations can also be added using the organization name:

Grant_access_2.png

Documenting your project

Once you have created a project and added your files to it, you can make it easier to find and more useful to others by describing, or documenting, it. Documenting consists of creating the metadata for your dataset or project, and it helps others to trust your data and work. Searches on data.world also look at titles, descriptions, summaries, and tags to match search strings, so the more completely you describe your data, the better its chance of being found.

Posting insights

When you have findings, conclusions, or interesting points for discussion about your project, you can create an Insight to display them prominently and invite discussion with other contributors to the project. Insights allow you to capture the conclusions from your work and present them in a way that quickly communicates a nugget of information, while giving the viewer the tools they need to dig down into your methods and sources. Insights balance efficiency of communication with reproducibility--two concepts that are often at odds in this phase of data work.

You can use insights to capture the results and analysis of your work and synthesize them so they are understandable and accessible to stakeholders at all levels in the project. Insights can be created by the project owner and any contributors, and are the first thing that displays on the main overview page. They also have their own tab in the project.

There are several places you can add an insight to your project:

  • On the Overview tab with the New insight button

Screen_Shot_2019-08-16_at_10.47.46_AM.png

  • On the Insights tab with the Add a new insight button

Screen_Shot_2019-08-16_at_10.49.16_AM.png

  • From the project workspace after clicking the + Add button

Screen_Shot_2019-08-16_at_10.50.45_AM.png

NOTE: Insight titles are a main search field in data.world. Make sure you use a descriptive title so the insight can be easily found with search. To find out more about how search works, see the articles Using Search and Advanced search.

When you create a new Insight, you will enter the Simple Editor, the same editor used for dataset and project summaries. You can switch to Markdown from the link next to the Done button if you prefer. For more information on using the Simple Editor, see:

For more information on Markdown see our Markdown syntax reference.

Citing data

When citing and sharing data found on data.world, please cite the original data source and the data.world URL from which you retrieved the data.

A dataset citation should include the same components that any other citation would include:

  • author

  • title

  • year of publication

  • publisher (for data this is often where it is housed, i.e. data.world in this case)

  • edition or version

  • access information (a URL or other persistent identifier, i.e. the dataset URL)

It is very important to note that some datasets may have special instructions on how to cite or use their data. Be sure to check the dataset summary for any additional requirements or guidelines.

To learn more about licensing, see:

How to cite using the APA Style Guide.

Format:

Author/Rightsholder. (Year). Title of data set (Version number) [Description of form]. Retrieved from http://data.world/[accountname]/[dataset]

Example - Citing the Federal government awards in Q2 published by the Treasury Department’s account @usaspending

USAspending.gov. (2017). Federal government awards in Q2. Retrieved from https://data.world/usaspending/federal-government-awards-in-q-2.

Example - Citing Ride Austin’s ride data

Ride Austin. (2017). Ride-Austin-june6-april13 [Data file and code book]. Retrieved from https://data.world/ride-austin/ride-austin-june-6-april-13
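If you cite datasets often, the format above can be assembled programmatically. The following is a minimal sketch, assuming the components listed earlier; the function name `format_citation` and its parameters are hypothetical and not part of any data.world tool. The version and description of form are optional, since the examples above omit them when they do not apply.

```python
def format_citation(author, year, title, url, version=None, form=None):
    """Build a citation string in the format shown above.

    Hypothetical helper: `version` and `form` are optional and are
    left out of the result when not supplied.
    """
    parts = [f"{author}.", f"({year}).", title]
    if version:
        parts.append(f"({version})")
    if form:
        parts.append(f"[{form}]")
    # The period closes the title group, after any version or form note.
    return " ".join(parts) + f". Retrieved from {url}"


print(format_citation(
    "USAspending.gov",
    2017,
    "Federal government awards in Q2",
    "https://data.world/usaspending/federal-government-awards-in-q-2",
))
print(format_citation(
    "Ride Austin",
    2017,
    "Ride-Austin-june6-april13",
    "https://data.world/ride-austin/ride-austin-june-6-april-13",
    form="Data file and code book",
))
```

Remember to check the dataset summary for any special citation instructions before relying on a generated string.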

Citation Resources