Add data to data.world

Note

This tutorial is part of the basic tutorial series for the data.world platform. See the article overview of basic tutorials for more information.

There are many ways to find data on the web, and the data you find comes in many formats, from text to tables to images. In this tutorial you will find data in a public database on the web, download it, upload it to a new data.world dataset, and link that dataset to your project.

Objectives

After working through the tutorial you should be able to:

  • Find data on the web

  • Prepare a file for upload to data.world

  • Create a dataset

  • Add a new dataset to a project

Requirements

To complete this tutorial you need to have:

  • A data.world login (available for free here if you don't have one).

  • Your own tutorial project (you must create this yourself--it cannot be downloaded)

  • The Bee Colony Statistics dataset linked to your project

If you need help creating the project or linking the dataset to it, detailed instructions are in the tutorial Create a project to work with data.

Background

When the original Bee Colony Statistics dataset was created, the 2017 data from the United States Department of Agriculture (USDA) wasn't available. Now that it is, we can download it, create a new dataset from it, and add it to our project. Because the original dataset is well-documented, it's easy to look up the source of the original data (the Quick Stats database of the National Agricultural Statistics Service) so we can get the latest statistics.

Find data on the web

The original dataset has more than one table in it, but for this tutorial we'll be looking at just the bee colony census data by state. The link to the Quick Stats database is in the Summary of the Bee Colony Statistics dataset:

Screen_Shot_2020-01-14_at_5.04.23_PM.png

and the parameters used in it are shown in the file Search criteria for bee colony census by state.png:

Screen_Shot_2020-01-14_at_5.09.18_PM.png

However, to make getting the data a little easier, here is a link to the Quick Stats database with the parameters already filled in. All we have to do now is select the Get Data button at the bottom of the screen. The results should contain 50 rows; the row count is shown in the upper right corner of the window. If there aren't 50 rows, use the Back link at the bottom of the screen to return to the previous page and verify your parameters.

Screen_Shot_2020-01-13_at_9.09.34_PM.png

Once you have the results, you can download them to your desktop so that you can upload them to data.world. They will be in CSV format, so if you have Excel, Google Sheets, or another spreadsheet program you can open the file after you've downloaded it, but that isn't necessary for this activity. Select Spreadsheet (shown in the image above) to download the file.
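If you would rather confirm the row count from the downloaded file than from the results page, a short script can do it. The following is a minimal sketch, assuming the download is a UTF-8 CSV with a single header row; the function names are illustrative and not part of any data.world or USDA tooling.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path):
    """Return the number of non-empty rows that follow the header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be the first line)
        # Ignore rows that are empty or contain only whitespace.
        return sum(1 for row in reader if any(cell.strip() for cell in row))


def has_expected_rows(csv_path, expected=50):
    """Check the file against the 50 rows this tutorial expects."""
    return count_data_rows(csv_path) == expected
```

Calling has_expected_rows with the path to your download returns True when the file holds 50 data rows. If it returns False, go back to Quick Stats and check the search parameters before uploading.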

Prepare a file for upload to data.world

The filename from the USDA will be a series of letters and numbers with no informational content. Files keep the names they have when they are uploaded to data.world; a name cannot be changed afterward except by downloading the file, renaming it, re-uploading the renamed version, and deleting the original. To make your data more useful, rename the file Bee Colony Census 2017 by State.csv:

Screen_Shot_2020-01-15_at_1.52.45_PM.png
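The rename can also be scripted, which is handy if you repeat the download. This is a minimal sketch using Python's standard library; the function name is an illustrative choice, and the path you pass in is whatever name the USDA download received on your machine.

```python
from pathlib import Path

TARGET_NAME = "Bee Colony Census 2017 by State.csv"


def rename_download(downloaded_path, target_name=TARGET_NAME):
    """Rename the downloaded file in place and return its new path."""
    source = Path(downloaded_path)
    target = source.with_name(target_name)  # same folder, new filename
    if target.exists():
        # Refuse to overwrite a file that is already there.
        raise FileExistsError(f"{target} already exists")
    source.rename(target)
    return target
```

The function keeps the file in the same folder and stops if a file with the target name already exists, so an earlier download is not silently overwritten.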

When you upload a spreadsheet with multiple tabs, each tab is preserved as a separate table in data.world. Before uploading your file, it is a good idea to review the names on all the tabs, as each will show up as a table name.
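For a workbook with many tabs, you can list the tab names without opening a spreadsheet program. The sketch below assumes a standard .xlsx file, which is a zip archive whose xl/workbook.xml lists the sheets in the usual spreadsheetml namespace. It does not apply to CSV or legacy .xls files, and the function name is illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by workbook.xml in standard (transitional) .xlsx files.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(xlsx_path):
    """Return the tab names of an .xlsx workbook in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{MAIN_NS}sheet")]
```

Each name returned is one that would appear as a table name after upload, so this is a quick way to spot default names such as Sheet1 that are worth renaming first.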

While you can upload any type of file to data.world, you might get an error if the file is too large, if it's corrupt, or if there is another issue with the file. For a complete list of the errors you might encounter when uploading a file, see the article on file upload status messages.

Create a dataset

Creating a dataset is very similar to creating a project. From your homepage (or any page with a + New link in the header), select + New from the header and choose Create new dataset:

Screen_Shot_2020-01-15_at_1.54.54_PM.png

In the Create a new dataset dialog you can name the dataset, choose the owner, and set the access permissions. If you are in an organization, the organization's name will show as the owner by default. If you are not in an organization, you will be the default owner. From the dropdown on the Owner field you can change the owner, including proposing ownership to an organization that accepts ownership proposals:

Screen_Shot_2020-01-15_at_1.58.58_PM.png

After you have entered a title and set the ownership, you need to set the permissions. By default, permissions are set to share with no one. If you set the ownership of the dataset to an organization, the other options are to share with everyone in the organization or to make it public to the data.world community. If you set yourself as the owner, your only options are to share with no one or to make the dataset public to the entire data.world community. Once you've set the permissions, select Create dataset. You can then add a description, upload your data file, or continue on to the dataset overview.
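The same three choices (title, owner, and permissions) are what a scripted dataset creation would need to supply. The sketch below only builds an HTTP request and does not send it. The endpoint path, the title and visibility field names, the PRIVATE value, and the bearer-token header are assumptions about data.world's public REST API that you should verify against the API documentation; the function name is hypothetical.

```python
import json
import urllib.request

# Assumed base URL for the data.world REST API; verify before use.
API_BASE = "https://api.data.world/v0"


def build_create_dataset_request(owner, title, token, visibility="PRIVATE"):
    """Build (but do not send) a request that would create a dataset.

    owner      -- the user or organization id that will own the dataset
    title      -- the dataset name, e.g. "Bee Colony Census 2017"
    token      -- your API token (assumed to be sent as a bearer token)
    visibility -- assumed values: "PRIVATE" (share with no one) or "OPEN"
    """
    body = json.dumps({"title": title, "visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{owner}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The default of PRIVATE mirrors the dialog's default of sharing with no one. Passing the returned object to urllib.request.urlopen would send it, but only do that once you have confirmed the details above.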

Add a new dataset to a project

To add this dataset to a project, select the arrow next to Explore this dataset and choose Add to existing project:

Screen_Shot_2020-01-16_at_1.34.20_PM.png

Note: If you did not create a project in the prior exercise, you can do it now by selecting Create a new project. If you need help creating the project, see the tutorial article Create a project to work with data.

At this point you'll be presented with a dialog box showing the dataset on the left and a list of the projects to which you have write permissions, owned either by you or by an organization you belong to:

Screen_Shot_2020-01-16_at_1.35.59_PM.png

Make your selection, and after you click Save you can either go back to your dataset or to the project:

Screen_Shot_2020-01-16_at_1.39.52_PM.png

Exercises

This tutorial uses real-world, feral data--not a made-up, sanitized file. It begins with accessing a live, publicly accessible US government database on the web, running a query against it, and saving the query results to a file. The next step is to create a dataset and upload the file to it. If you prefer to skip right to creating the dataset and uploading the file, download the Bee Colony Census 2017 by State.csv file from the dataset Bee colony statistics and proceed directly to step 5 below.

  1. Go to the Quick Stats database for the National Agricultural Statistics Service on the United States Department of Agriculture website (the parameters will be pre-populated for you).

  2. Select Get Data

  3. Download the data as a spreadsheet

  4. Rename the downloaded file Bee Colony Census 2017 by State.csv

  5. Log in to your data.world account and create a dataset named Bee Colony Census 2017

  6. Upload and add the file to the dataset

  7. Add the new dataset to your project

Best Practices

It is very easy to create a dataset and add new data to data.world, and there are some things you can do to make that data easy to use, too. See our article Dataset best practices for more information.

Conclusion

Creating a dataset and creating a project are very similar activities, and both are intentionally structured to work together easily. You can create a dataset and add it to an existing project, or you can create a dataset and a project to work with it all at the same time. Uploading a file is only one way to add data to data.world. See the article on getting your data into data.world for information on other methods.

After you put your data onto data.world, there are many things you can do to make it easy for others to find and use. Our help center has additional articles on how to license, document, verify, label, and tag your data, and on how different file types are handled. We encourage you to make use of them to get the most out of your experience putting data on data.world.
