What does it mean for a dataset to be discoverable?

When a dataset is set to be discoverable, it is listed in public search results, and its metadata (description, summary, creator, and license), contributors, and discussion can be viewed by anyone on data.world. For customers with a closed network, it is visible to everyone on that customer's installation. None of the individual files or tables in the dataset are visible, however. Until a user is granted read or write access to the dataset, the only files they can view are those explicitly marked as ‘Preview’ files. The discoverable flag exposes the existence of a dataset to others who might have a use for it while maintaining control over who can access the data in it. It is a useful way to make users in other groups aware of the dataset so that they can be granted permissions to it on an individual or group basis.

Setting a dataset as discoverable

Only a dataset that is owned by an organization (as opposed to an individual user) and has its permissions set to private can be made discoverable. When a dataset is owned by an organization but set to private, it is not automatically shared with anyone else in the organization except the admins. After the dataset has been created, you can share it or make it discoverable from the Settings tab on the dataset:

Screen_Shot_2019-12-17_at_6.38.25_PM.png

When a dataset is discoverable, this is what it looks like to everyone who doesn't have explicit permissions to it. Notice the indication under the summary that the dataset contains files the viewer doesn't have permission to see:

Screen_Shot_2020-01-15_at_3.51.42_PM.png

When someone wants to see the rest of the dataset, they select the Request access button in the upper right of any tab on the dataset. The creator receives notice of the request by email and can then either approve or deny it:

Screen_Shot_2019-12-17_at_7.21.26_PM.png

Protecting your data

With discoverable datasets, we introduced the ability to make the existence of a dataset known without exposing any of the data in it. In this article we'll discuss how to extend this feature even further by adding sample data files that can also be viewed. Users can view the samples to determine whether they want to request access to the full dataset. A preview of the sample file is visible on the dataset overview page. If the user evaluating the data would like to see more than the preview, the file can be downloaded and viewed.

Considerations

There are different ways to create sample preview files. A sample may have all of the columns of the original file but not all of the rows. Or it may have only some of the rows and some of the columns, with the columns containing sensitive data removed.

Creating a sample file with all the columns

Starting from the overview page of your discoverable dataset, the easiest way to create a sample with all of the columns and only some of the rows is to select the Explore dataset button on the top right of the dataset overview page. Then select the file you would like to preview from the list in the left sidebar and click Query:

Screen_Shot_2019-12-17_at_8.50.34_PM.png

All you need to add to the sample query you are presented with is a line at the end containing a LIMIT clause that sets the number of rows returned by the query to five, or however many you would like to be available. Note that only five rows will be previewed, but the rest are available for download:

Screen_Shot_2019-12-17_at_8.53.35_PM.png
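
In text form, the finished query might look roughly like the sketch below. It assumes the file is exposed as a table named sales_per_year, which is the table name used in the example later in this article; the sample query generated for your own file will use your file's table name and may be laid out differently.

```sql
SELECT *
FROM sales_per_year
LIMIT 5
```

The * wildcard returns every column, and LIMIT 5 caps the result at five rows. Raise the number if you want more rows to be available for download.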

Hit the Run query button, then Download and Save to dataset or project:

Screen_Shot_2019-12-17_at_8.45.07_PM.png

It's a good idea to save the resulting table under a name that makes it easily identifiable as a sample of a data file, not the file itself. The current dataset name will automatically be populated in the Dataset/Project field:

Screen_Shot_2019-12-18_at_3.00.06_PM.png

After you have added the file to your dataset you can go back to the dataset overview page by selecting View dataset from the left sidebar menu:

Screen_Shot_2019-12-18_at_3.16.43_PM.png

Then scroll down to the sample file and select the three dots on the right to edit its metadata:

Screen_Shot_2019-12-18_at_3.46.01_PM.png

Check the Preview option and save:

Screen_Shot_2019-12-18_at_3.01.58_PM.png

Creating a sample file with some columns removed

The process for creating this sample file is the same as for the previous file, with one exception: instead of using the * wildcard in your SQL query for "all columns", you list the columns you want to include and then finish with the LIMIT clause as before. Here is an example:

SELECT sales_per_year.base_product,
       sales_per_year.yearly_sales,
       sales_per_year.statusid
FROM sales_per_year
LIMIT 5

Each file can also be flagged as previewable from its file metadata, accessed from the three dots to the right of the file name on the Overview tab:

Screen_Shot_2019-12-17_at_7.53.25_PM.jpg

Check the Preview box to make a random 5-line sample of the file visible from the overview page of the dataset:

Screen_Shot_2019-12-17_at_7.53.57_PM.png

Files that have been made previewable are flagged with the Preview label on the overview tab in the creator's view:

Screen_Shot_2019-12-17_at_7.54.40_PM.png

Note: Even though a previewable file cannot be accessed in data.world until permission has been granted by the creator, it can still be downloaded by anyone. For this reason, it's best to create a sample file derived from the original data and flag that sample as previewable rather than make the original file previewable:

Screen_Shot_2019-12-17_at_8.05.42_PM.png