
What does it mean for a dataset to be discoverable?

When a dataset is set to be discoverable, it is listed in public search results, and its metadata (description, summary, creator, and license), contributors, and discussion can be viewed by anyone on data.world. For customers with a closed network, it is visible to everyone on that customer installation. The individual files and tables in the dataset are not visible, however, with one exception: files explicitly marked as ‘Preview’ files can be viewed by end-users who have not yet been granted read or write access to the dataset. The discoverable flag exposes the existence of a dataset to others who might have a use for it while maintaining control over who can access the data in it. It is a useful tool for making users in other groups aware of the dataset so they can be granted permissions to it on an individual or group basis.

Setting a dataset as discoverable

Only a dataset that is owned by an organization (as opposed to an individual user) and has its permissions set to private can be made discoverable. When a dataset is owned by an organization but set to private, it is not automatically shared with anyone else in the organization except the admins. Both sharing and the discoverable setting are changed on the dataset's Settings tab, so a dataset is made discoverable after it has been created:

Screen_Shot_2019-12-17_at_6.38.25_PM.png

When a dataset is discoverable, this is what it looks like to everyone who doesn't have explicit permissions to it. Notice the indication under the summary that the dataset contains files the viewer doesn't have permission to see:

Screen_Shot_2020-01-15_at_3.51.42_PM.png

When someone wants to see the rest of the dataset, they select the Request access button in the upper right of any tab on the dataset. The creator is notified of the request by email and can then either approve or deny it:

Screen_Shot_2019-12-17_at_7.21.26_PM.png

Protecting your data

With discoverable datasets we introduced the ability to make the existence of a dataset known without exposing any of the data in it. In this article we'll discuss how to extend this feature by adding sample data files that can also be viewed. Users can view the samples to determine whether they want to request access to the full dataset. A preview of the sample file is visible on the dataset overview page. If the user evaluating the data would like to see more than the preview, the file can be downloaded and viewed.

Considerations

There are different ways to create sample preview files. A sample may have all of the columns of the original file but not all of the rows. Or it may have only some of the rows and some of the columns, with the columns containing sensitive data removed.
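The two kinds of sample described above can be sketched outside data.world with Python's standard csv module. This is only an illustration under stated assumptions: the `make_sample` helper, the column names, and the data are all hypothetical, and it is not how data.world builds extracts.

```python
import csv
import io


def make_sample(csv_text, max_rows, drop_columns=()):
    """Return CSV text with the first max_rows data rows, minus drop_columns.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in drop_columns]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for index, row in enumerate(reader):
        if index >= max_rows:
            break
        writer.writerow(row)
    return out.getvalue()


# Hypothetical original file with one sensitive column (ssn).
original = (
    "id,name,ssn\n"
    "1,Ana,111-22-3333\n"
    "2,Ben,222-33-4444\n"
    "3,Cy,333-44-5555\n"
)

# All of the columns, but only some of the rows.
rows_only = make_sample(original, max_rows=2)

# Some of the rows and some of the columns (sensitive column removed).
rows_and_columns = make_sample(original, max_rows=2, drop_columns=("ssn",))
```

Which variant is appropriate depends on whether any column is too sensitive to expose even in a handful of rows.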

Creating an extract sample from a table to use as a preview for the table

The quickest way to create an extract from a file to use as a preview in a discoverable dataset is to:

  1. Query the original table and limit the results

  2. Save the query results (the extract from the table) as a new file in the dataset

If there is sensitive data in the file that you would like to mask, you can either modify your query to exclude or mask the data, or use a custom_types.ttl file to set the column to a masked data type. See the linked articles for more information.
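As a rough sketch of the query-based option, the example below excludes one column and masks another. It runs against an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in: the `customers` table, its columns, and the masking format are assumptions, and the SQL functions available in the data.world query workspace may differ.

```python
import sqlite3

# In-memory SQLite database standing in for the original table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT, ssn TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "ana@example.com", "111-22-3333"),
        (2, "Ben", "ben@example.com", "222-33-4444"),
    ],
)

# Exclude: the email column is simply left out of the SELECT list.
# Mask: only the last four characters of ssn are kept.
masked_rows = conn.execute(
    "SELECT id, name, 'XXX-XX-' || substr(ssn, -4) AS ssn_masked "
    "FROM customers ORDER BY id"
).fetchall()
conn.close()
```

Excluding a column is the safer choice when no part of its values should be exposed; masking keeps the column in place so viewers can still see the shape of the data.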

To query the original table, select the Explore dataset button on the top right of the dataset overview page. Then select the table you would like to create the extract from in the list in the left sidebar and click Query:

Screen_Shot_2019-12-17_at_8.50.34_PM.png

A sample query will be presented at the top of the window; all you need to do is change the LIMIT clause to the number of rows you would like to be available in the preview (note that only five results will be previewed on the dataset Overview tab, but the rest are available for download):

Screen_Shot_2019-12-17_at_8.53.35_PM.png
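To illustrate what changing the LIMIT clause does, here is a minimal sketch that again uses an in-memory SQLite database in place of the data.world query workspace. The `orders` table, its contents, and the row count of 50 are hypothetical; the sample query data.world generates for your table will look different.

```python
import sqlite3

# In-memory SQLite database standing in for the original table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(n, n * 10.0) for n in range(1, 201)],  # 200 rows in the "original" table
)

# Number of rows to make available in the extract (assumed value).
PREVIEW_ROWS = 50

# LIMIT caps how many rows the query returns; those rows become the extract.
extract = conn.execute(
    "SELECT * FROM orders ORDER BY order_id LIMIT ?", (PREVIEW_ROWS,)
).fetchall()
conn.close()
```

Here the original table has 200 rows and the extract holds 50 of them. The ORDER BY is included only so the sketch returns the same rows on every run.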

Select the Run query button, then Download, and then Save to dataset or project:

Screen_Shot_2019-12-17_at_8.45.07_PM.png

You will be given the choice of using a live view or using a data extract.

CTttl_live_data_or_extract.png

If you want to be able to apply custom data types to the columns in the table, choose to use a data extract, because custom types cannot be applied to live data.

Name your dataset, and the current dataset name will automatically be populated in the Dataset/Project field:

Tip

It's a good idea to name the resulting table so that it's easily identified as a sample of a data file, not the file itself.

Screen_Shot_2019-12-18_at_3.00.06_PM.png

Enabling public preview access to a file in a discoverable dataset

Enabling a file on a discoverable dataset to be previewed is a quick and easy process:

  1. On the dataset overview tab, scroll down to the sample file and select the three dots on the right to edit its metadata

    CTttl_preview_edit_metadata.png
  2. Check the Preview box to make a random 5-line sample of the file visible from the overview page of the dataset, then save

    CTttl_set_preview_2.png

Files that have been made previewable are flagged with the Preview label on the overview tab in the creator's view:

CTttl_preview_flag.png

Caution

Even though the file cannot be accessed directly in data.world until permission has been granted by the creator, any file visible on the overview page can be downloaded by anyone who can see it.

Screen_Shot_2019-12-17_at_8.05.42_PM.png