Skip to main content

What is a dataset?

Danger

data.world University!

Check out our Intro to Datasets and Projects video!

A dataset is the basic repository for data files and associated metadata, documentation, scripts, and any other supporting resources that should be stored alongside the data. Datasets are where all data is stored and documented for later sharing and use in projects. They are the building blocks for projects. They contain data and metadata related to a topic. The files and tabular data in a dataset are meant be used--queried and analyzed--in one or more projects. Datasets are meant to be reusable resources. They can be combined with other datasets in projects, or they can be a single source for querying and analysis in a project.

Datasets can be owned by an individual or an organization, and a dataset provides an additional layer of access permissions to the data in a project. Because permissions are assigned at both the dataset and the project level an individual can create a project available to the public, but if the individual adds any datasets owned by an organization to the project, they won't be visible to the public--only to the other people in the organization. Not only is the dataset not visible, but any queries in the project written against that dataset are also not visible except to members of the organization.

Because datasets are linked to projects, any changes to the data or the metadata in the dataset show up automatically in the linked project. Linking data to a project instead of copying it into the project means that everything is kept up to date throughout your organization.