Start a data project

In data.world, projects are where all querying, analysis, and discussion of data take place. Data in different datasets can be used for many different projects, but each project contains all and only the data that is relevant for that project. The information in a project can come from datasets, files attached directly to the project, insights written by the project's team members about the data and the project, and discussions about the project.

The idea behind a project is that all work and communication about that project done by any member of the project team is stored in the project, always accessible by all team members, and always the most up-to-date information.

To create a project, click on the + New icon in the top right of your menu bar and select New project.

Screen_Shot_2019-04-15_at_10.46.46_AM.png

In the Create a new project window, you are prompted to choose the owner of the project, enter the project name, and set who can see the project. By default, if you are in an organization (also called a team), your organization is set as the owner of the project. You can choose another organization from the dropdown if you belong to more than one, or you can switch to your personal account if you want to be the owner as well as the creator of the project. The project can be visible to no one else, to the other members of your team (if the team is set as the owner), or to the public data.world community:

Screen_Shot_2019-04-15_at_10.48.49_AM.png

Once your project has been created you are prompted to add an objective for it and given the opportunity to drag and drop files to the project or connect to a data source:

Screen_Shot_2019-04-15_at_11.04.14_AM.png
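If you script project setup instead of clicking through the dialog, the same choices (owner, name, visibility, and objective) could be assembled into an API request. The sketch below only builds the request and does not send it. The base URL, the `/projects/{owner}` path, the field names `title`, `visibility`, and `objective`, and the `OPEN`/`PRIVATE` values are assumptions modeled on data.world's public REST API and should be checked against the current API reference. `build_create_project_request` is a hypothetical helper name, and team-only sharing is not covered here.

```python
import json

API_BASE = "https://api.data.world/v0"  # assumed base URL; verify against the API reference


def build_create_project_request(owner, title, public=False, objective=None):
    """Return (method, url, headers, body) for a hypothetical create-project call.

    Nothing is sent; the caller decides how to perform the request and
    must add its own authentication header.
    """
    if not owner or not title:
        raise ValueError("owner and title are required")
    payload = {
        "title": title,
        # Assumed values: OPEN = public to the community, PRIVATE = restricted.
        "visibility": "OPEN" if public else "PRIVATE",
    }
    if objective:
        payload["objective"] = objective
    url = f"{API_BASE}/projects/{owner}"
    headers = {"Content-Type": "application/json"}
    return "POST", url, headers, json.dumps(payload)


method, url, headers, body = build_create_project_request(
    "my-org", "Sales analysis", objective="Explore Q1 sales"
)
print(method, url)
print(body)
```

Keeping the request builder separate from the code that sends it makes it easy to review the payload before anything is created under an organization's account.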

To link existing datasets to your project, select Add data. This opens the Add data from anywhere dialog:

Image_2019-10-09_at_9.33.56_AM.png

After you have selected Link a data.world dataset, you can search for the dataset you want to add using the search bar, or you can scroll through your datasets, your bookmarked datasets, and community results:

Image_2019-10-09_at_9.40.09_AM.png

More information about adding data to your project can be found in the article Connect data to your project.

If you have data that might be used in other data projects, we recommend adding the raw data to a dataset and then linking that dataset to the project where you will do further analysis. This allows you to link to and access the dataset from multiple projects without having to import multiple copies of the data. For more information on making a new dataset, see Create a dataset.

To create a new untitled project and go straight to the project workspace, you can click on the Explore this dataset button from the dataset overview page.

If you'd like to create a new project based on the dataset and go through the initial project creation steps (e.g., giving the project a name and permission level and optionally adding a description and other data), click the dropdown arrow on the right of the Explore this dataset button and choose Create a new project.

You can also connect the dataset to an existing project from this overview page. To do that, click on the dropdown arrow on the right of the Explore this dataset button and choose Connect to existing projects to select from a list of available projects.

connect-data-to-your-project-02.png

You can add data to a project in a few ways, either when you create the project or at any time afterward.

Linking datasets to an existing project

From the project overview page, click the Connect Datasets link within the How do projects work? section, if available. Alternatively, below the project description and summary section, choose Add data and then data.world dataset.

connect-data-04.png

You can also link datasets from the project workspace. From within the workspace, click the Add > Dataset button on the top left. Or, from the home tab, click the Drag and drop, upload files or connect to a data source box.

connect-data-05.png

Linking existing datasets

You can link datasets to a project when creating the project or later if you require additional data. For projects that already exist, you can link datasets from the project overview page and the project workspace.

Linking datasets during project creation

When creating a new project, you can click on the big Add Data prompt to connect any sort of data - including a linked dataset.

connect-data-01.png

You will then have the option to Link a data.world dataset in the new window that opens:

connect-data-02.png

After selecting that option, you can use the search bar and the tabs below it to find datasets in your resources, among your bookmarks, or from the data.world community at large. Click the question mark icon for some hints, or see Using search for advanced search tips.

connect-data-07.png

Click the Link button to the right of the dataset you would like to add. You can link as many datasets as you'd like. If you accidentally link one that you don't want, hover the mouse over the Linked button. It will change to an Unlink button, which you can click to remove the dataset from the project.
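The Link and Unlink buttons could also be mirrored in a script when many datasets need to be attached at once. The sketch below only builds the method and URL for each call and sends nothing. The base URL and the `linkedDatasets` path are assumptions modeled on data.world's public REST API and should be verified against the current API reference. `linked_dataset_request` and `split_ref` are hypothetical helper names, and the owner and dataset identifiers are made up for illustration.

```python
API_BASE = "https://api.data.world/v0"  # assumed base URL; verify before use


def split_ref(ref):
    """Split an 'owner/id' reference into its two parts."""
    parts = ref.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'owner/id', got {ref!r}")
    return parts[0], parts[1]


def linked_dataset_request(project, dataset, unlink=False):
    """Return (method, url) for linking a dataset to a project, or unlinking it.

    Assumed convention: PUT creates the link and DELETE removes it.
    """
    project_owner, project_id = split_ref(project)
    dataset_owner, dataset_id = split_ref(dataset)
    url = (
        f"{API_BASE}/projects/{project_owner}/{project_id}"
        f"/linkedDatasets/{dataset_owner}/{dataset_id}"
    )
    return ("DELETE" if unlink else "PUT"), url


# Hypothetical identifiers, for illustration only.
for ds in ["some-org/census-regions", "some-org/sales-2019"]:
    print(*linked_dataset_request("my-org/sales-analysis", ds))
```

As in the UI, unlinking in this sketch targets the same project and dataset pair as linking, so a mistaken link can be undone by repeating the call with `unlink=True`.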

There are many great projects and datasets on data.world, and it's likely that at some point you are going to want to use data from them in your own work. There are two different ways to reuse data on data.world: linking, and downloading and re-uploading. Which option you choose depends on a few factors:

  • Is the source data in a project or a dataset?

  • How well does the source data meet your needs?

  • Is the data either streamed or regularly updated?

If the data is in a dataset (as opposed to a project) and is well documented, concise, and clean, you may well want to link to it. However, if you need to make changes to it, you'll need to download it, edit it, and re-upload it.

Some reasons you might choose to link to the dataset include:

  • You don't need to make any additions to the source dataset (e.g., adding extra columns with data.world linked-data fact tables)

  • The source dataset is really clean so you don't need to go in and clean it up

  • The dataset is well-documented with a good dataset summary, references to the original source, and a complete data dictionary

  • The dataset is automatically updated from an external source

Some reasons you might choose to download and re-upload data include:

  • You want to add columns from data.world linked-data fact tables (e.g., US census region, currencies, or ICD10 medical codes)

  • You only want to use a subset of the files in a dataset and don't want the rest of the files adding unnecessary complexity to your dataset or project

  • The data files would benefit from cleaning for clarity (e.g., removing blank columns, removing columns containing a single value, or changing file or column names)

  • The data files only exist in a project and not in a dataset

  • The data dictionary and/or dataset summary are incomplete and you do not have write privileges to the dataset
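The two lists above amount to a simple rule: link unless at least one of the re-upload reasons applies. A minimal sketch of that rule follows. The `SourceData` fields and the `choose_reuse_method` function are hypothetical names introduced for illustration, not part of data.world.

```python
from dataclasses import dataclass


@dataclass
class SourceData:
    in_dataset: bool = True                 # False when the files only exist in a project
    needs_fact_table_columns: bool = False  # you want columns from linked-data fact tables
    needs_subset_of_files: bool = False     # you only want some of the dataset's files
    needs_cleaning: bool = False            # e.g. blank columns, unclear names
    metadata_incomplete: bool = False       # data dictionary or summary is lacking
    can_edit_metadata: bool = False         # you have write privileges to the dataset


def choose_reuse_method(src):
    """Return ('link' or 'reupload', list of reasons that forced a re-upload)."""
    reasons = []
    if not src.in_dataset:
        reasons.append("files only exist in a project")
    if src.needs_fact_table_columns:
        reasons.append("need to add linked-data fact table columns")
    if src.needs_subset_of_files:
        reasons.append("only a subset of the files is wanted")
    if src.needs_cleaning:
        reasons.append("files need cleaning")
    if src.metadata_incomplete and not src.can_edit_metadata:
        reasons.append("metadata is incomplete and cannot be edited")
    # With no reason to modify the data, linking keeps it in sync with the source.
    return ("reupload", reasons) if reasons else ("link", reasons)


print(choose_reuse_method(SourceData()))
print(choose_reuse_method(SourceData(needs_cleaning=True)))
```

Returning the reasons alongside the decision makes the trade-off visible: a re-uploaded copy no longer updates automatically, so it helps to record why linking was not enough.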

The table below summarizes the differences between linking a dataset and downloading and re-uploading its files:

Linked vs Reimported Data

Linked   Reimported   Characteristic
X        X            Can add to a project
-        X            Can add to a dataset
-        X            Can extend data with data.world linked-data fact tables
-        X            Can edit data dictionary and dataset summary
-        X            Must recreate the dataset summary and the data dictionary for every file in the dataset
-        X            Do not have to use all of the files in a dataset
X        -            Can reuse data dictionary and other metadata
X        -            Automatically updated from original dataset
X        -            Must include all the files in a given dataset

Uploading files and data directly to your data project is great for materials specific to the project such as images, documentation, and code. You can also add data this way, but because you cannot link projects to other projects, data is generally best placed within datasets to make reuse easier. You can add files directly to your project by clicking the Add button next to your Project directory header and selecting Project file:

connect-data-06.png

Files added to the project are not visible anywhere else on data.world. If there are files that you want to access from more than one project, upload them to a dataset instead of a project. For information on other ways to add data to your project, see the article How to get your data into data.world.