Community docs

Create a dataset

You might create a dataset because you have a database or other tabular data that you want to analyze and share. But tabular data isn't the only kind of data you can put in a dataset: any file type can be saved there. See our article on supported file types for information about the various file types and how they are handled.

There are several ways datasets can be created:

  • Manually - we'll walk through that here

  • Via our API - instructions available in our API docs

  • Through super connectors like Stitch, KNIME, Knots, and Singer - instructions can be found in our integration documentation under super connectors

  • With Sparklebot - for data portals or enterprise companies, contact us to find out more about our tools for automating the creation and syncing of your data resources. This can be full data and metadata mirroring, or simply a catalog of your data sources with metadata and sample data where you'd like.
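For the API route listed above, a request might be assembled roughly as follows. This is a minimal sketch rather than the documented method: the base URL is a placeholder, and the path layout, the payload fields (`title`, `visibility`), the `OPEN`/`PRIVATE` values, and the helper name `build_create_dataset_request` are all assumptions made for illustration. The API docs remain the authority on the real contract. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder base URL (assumption); replace with the value from the API docs.
API_BASE = "https://api.example.com/v0"


def build_create_dataset_request(owner, title, visibility, token):
    """Build (but do not send) a POST request that would create a dataset.

    The field names and allowed visibility values are hypothetical.
    """
    if visibility not in ("OPEN", "PRIVATE"):
        raise ValueError("visibility must be 'OPEN' or 'PRIVATE'")
    payload = {"title": title, "visibility": visibility}
    return urllib.request.Request(
        url=f"{API_BASE}/datasets/{owner}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The owner segment of the URL mirrors the owner choice (yourself or an organization) made in the manual flow.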

+New dataset

While logged in, click +New in the upper right corner of your window. You'll be prompted to choose either a dataset or a project.


Choose Create new dataset and you'll be prompted to name the dataset and set its ownership and accessibility. If you are in one or more organizations, the owner field will default to the name of one of them. You can also set the owner to yourself or to any other organization you are in by using the dropdown on the owner field.


Dataset owner and permissions

If the dataset is intended to be used within an organization, it should typically be created with the organization as the owner. That way the dataset benefits from the organization's service tier, permissions can be set easily based on the organization's members, and the dataset remains available within the organization even as individuals and permissions change. Permissions on a dataset owned by an organization can be set either to No one or to everyone in the organization.


If you are not in any organizations, you will automatically be set as the owner of the dataset, and you can choose to keep the dataset private or to share it with the community:


The number of private datasets you are allowed is determined by your user license, but you can create as many public datasets as you like. More information on account types and pricing is available on our pricing page. There are several factors to consider when deciding whether to make your dataset public or private:

  • When you make a dataset public you allow others to use that dataset in their own projects and build from it. They can't change your dataset in any way or even save queries to it, but they can use and share it.

  • Data that is public can be downloaded and used externally. If your data is proprietary or sensitive, it shouldn't be shared.

  • Publicly shared datasets add to the amount of information that is available to everyone for analyzing, visualizing, and learning from.

More information on permissions can be found in the article Understanding permissions.

Setting permissions

Whatever permissions are set on the dataset also carry through to any projects that use it. So if the dataset is shared with no one, only you will be able to use it in a project, and even if the project that includes it is open to everyone, no one else will be able to see that dataset. Permissions can always be edited later. After you create your dataset you can document your objective for it, add data to it, or continue on to the overview.
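The pass-through rule can be pictured as a check against both the project and the dataset. The sketch below is an illustrative model only, not how the platform is implemented: the `EVERYONE` marker and the function names are hypothetical, and access is reduced to a simple set of user names.

```python
# Hypothetical marker meaning "open to everyone".
EVERYONE = "*"


def has_access(user, allowed):
    """True if `user` is in the allowed set or the set is open to everyone."""
    return EVERYONE in allowed or user in allowed


def visible_in_project(user, dataset_allowed, project_allowed):
    """A dataset shows up inside a project only for users who can access
    both the project and the dataset itself."""
    return has_access(user, project_allowed) and has_access(user, dataset_allowed)
```

With a dataset shared with no one (only the owner, `"you"`, is in its allowed set) and a project open to everyone, `visible_in_project("you", {"you"}, {EVERYONE})` is true while `visible_in_project("colleague", {"you"}, {EVERYONE})` is false, which matches the behavior described above.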


Crowdsourced datasets

In data.world, datasets and projects can be owned by individuals or organizations. They can be private, shared with an organization, or shared with the public. With the crowdsourcing feature, individuals can even set the ownership of a dataset or project to an organization that they don't belong to, as long as that organization is configured to accept ownership proposals. In this article we'll cover:

  • Configuring an organization you administer to accept dataset proposals

  • Setting ownership of a resource to an organization you don't belong to

  • What happens after an ownership proposal is made

Configuring an organization you administer to accept dataset proposals

On an organization's page there is a settings tab where administrators of the organization can set preferences for membership in the organization and choose whether it accepts datasets proposed by individuals outside the organization.


If the organization is configured to accept dataset proposals, any community member can create projects and datasets for the organization, subject to admin approval. Proposed resources count towards the organization’s resource limit. Once the organization hits the limit, users will not be able to submit proposals and will see a message saying, “This organization is not accepting datasets or projects right now.”
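The acceptance rules above can be summarized in a few lines of code. This is a sketch under stated assumptions, not the platform's implementation: the `Organization` fields, the `submit_proposal` helper, and the status strings other than the quoted limit message are hypothetical, and the text does not say what a user sees when an organization has proposals switched off.

```python
from dataclasses import dataclass

# Message quoted in the text for an organization that has hit its limit.
LIMIT_MESSAGE = "This organization is not accepting datasets or projects right now."


@dataclass
class Organization:
    accepts_proposals: bool  # set by an admin on the settings tab
    resource_limit: int      # hypothetical field: maximum resources allowed
    resource_count: int = 0  # resources owned, including proposed ones


def submit_proposal(org: Organization) -> str:
    """Return a status string for a proposal made by a non-member."""
    if not org.accepts_proposals:
        # Assumption: the source does not describe this case.
        return "not configured to accept proposals"
    if org.resource_count >= org.resource_limit:
        return LIMIT_MESSAGE
    # Proposed resources count towards the limit (assumed to apply at once).
    org.resource_count += 1
    return "pending admin approval"
```

For an organization created as `Organization(accepts_proposals=True, resource_limit=2, resource_count=1)`, the first proposal is accepted for review and the second returns the limit message.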

Setting ownership of a resource to an organization you don't belong to

When you create a new dataset or project, the ownership options you can choose from include any organization that is accepting proposals. On the Create a new dataset dialog there is an option to see which organizations are accepting proposals.


When you follow the link you will be able to search for an organization by name or select from a list:


After you select the owner it's a good idea to enter a description and upload or link data so that the organization admin

What happens after an ownership proposal is made

When you propose a dataset or project to an organization, it is visible only to you and the org admin(s) until the admin(s) approve and share it. If the admin(s) choose not to share it, it will remain visible only to you and them, and to no one else in the organization. You will also no longer be able to change the access privileges (make it discoverable or publicly available) or invite anyone else to contribute to it.