Security

Security is a paramount concern for enterprise customers evaluating new systems. The data.world service is not just cloud-first but also security-first. We’ve designed it from the ground up to support a unique combination of internal and external compliance needs. The following articles document how we keep both your data and your connections safe, and how we enable you to do the same.

How our data connections interact with your data

There are three main ways that data.world can interact with data from your environment:

  1. Metadata collection, including the optional Technical Lineage Server

  2. Virtual query capability

  3. Data extract/import capability

For the vast majority of security and compliance needs, (1) metadata collection and (2) virtual query are sufficient, and (3) data extract/import can be disabled. If only (1) and (2) are used, data.world does NOT store your data in our platform, only metadata.

Data extract/import

Virtual connections can be used to extract/import data into data.world. Imported data often benefits from improved query performance when it is in a small or medium-sized dataset (i.e., under 3 GB). We recommend that larger amounts of data be split into multiple datasets or accessed via virtualization instead. Data in data.world datasets is kept securely in encrypted data stores.

Note

Some customers choose to disable the ability to extract and import data to prevent any data from being persisted in data.world for compliance or other reasons. The decision to allow or disallow extract/import is fully in the hands of the customer.

Virtual query

Virtual datasets in data.world are built on top of connections, and no data is stored in them. When a user runs a query, a short-lived connection fetches only the query results. These results are NOT persisted, neither in storage nor in memory. Additionally, queries have a 5-minute timeout by default for performance and security reasons. Finally, all queries are tracked and audited via query audit logs that customers can access and monitor.
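The query behavior described above (fetch the results, enforce a timeout, record an audit entry) can be pictured with a small model. This is an illustrative sketch only, not data.world's implementation: `run_virtual_query`, `AuditEntry`, and the `fetch` callable are hypothetical names, and only the 5-minute default comes from the text.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Any, Callable, List

DEFAULT_TIMEOUT_SECONDS = 5 * 60  # the 5-minute default described above


@dataclass
class AuditEntry:
    """One record in a hypothetical query audit log."""
    user: str
    query: str
    status: str  # "ok", "timeout", or "error"
    elapsed_seconds: float


def run_virtual_query(
    user: str,
    query: str,
    fetch: Callable[[str], Any],  # hypothetical stand-in for the short-lived connection
    audit_log: List[AuditEntry],
    timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
) -> Any:
    """Run `fetch(query)` with a timeout and always append an audit entry."""
    started = time.monotonic()
    status = "error"  # overwritten below unless fetch raises something unexpected
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch, query)
    try:
        rows = future.result(timeout=timeout_seconds)
        status = "ok"
        return rows  # handed straight back to the caller; nothing is cached here
    except concurrent.futures.TimeoutError:
        status = "timeout"
        raise TimeoutError(f"query exceeded {timeout_seconds} seconds") from None
    finally:
        audit_log.append(
            AuditEntry(user, query, status, time.monotonic() - started)
        )
        executor.shutdown(wait=False)  # do not block on a query that timed out
```

In this model every call leaves exactly one audit entry, whether the query succeeds, times out, or fails, which mirrors the idea that auditing is independent of the query outcome.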

The customer deploys a virtual appliance (or, optionally, a hardware appliance or reverse SSH tunnel service) in its own environment. The virtual and hardware appliances are powered by our partner Trustgrid, a secure connection bridge technology vendor trusted by many of the largest banks and financial institutions. Using the Trustgrid technology, a network bridge is configured on the data.world side in conjunction with the customer. At that point, read-only system user credentials can be securely stored in data.world as connections. All connections in the system are encrypted and initiated as outbound only.

Understanding permissions

Permissions on a dataset or project are initially set when the resource is created. If an organization is set as the owner, then permission options are:

  • No one

  • Everyone in the organization

  • Public to the data.world community

Note

One safeguard against users accidentally publishing enterprise data to the wider community is our standard enterprise team publication configuration: by default, ‘Create public datasets’ is turned off for our Enterprise customers.
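The interaction between the three permission options and the ‘Create public datasets’ switch can be modeled as follows. This is a minimal sketch under stated assumptions: `Visibility`, `OrgSettings`, and `allowed_visibilities` are hypothetical names chosen for illustration and are not part of any data.world API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Visibility(Enum):
    """The three permission options for an organization-owned resource."""
    NO_ONE = "No one"
    ORGANIZATION = "Everyone in the organization"
    PUBLIC = "Public to the data.world community"


@dataclass
class OrgSettings:
    # Models the 'Create public datasets' switch, off by default for Enterprise.
    create_public_datasets: bool = False


def allowed_visibilities(settings: OrgSettings) -> List[Visibility]:
    """Return the options a user could pick when creating a resource."""
    options = [Visibility.NO_ONE, Visibility.ORGANIZATION]
    if settings.create_public_datasets:
        options.append(Visibility.PUBLIC)
    return options
```

With the default settings the public option is never offered, so publishing to the wider community requires an explicit configuration change rather than a single click.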

Owners of datasets and projects can invite specific users to contribute, or approve incoming requests from users who want to contribute. Either way, the owner controls what each contributor can do by granting three levels of permissions:

  • View only

  • View + edit

  • View + edit + manage

Datasets have another layer of access permission, as they can be flagged as Discoverable. More about this kind of access in the section Discoverable datasets.

Here's what each permission level will allow a contributor to do:

View only: primarily used for private datasets and projects, this level lets the contributor view the dataset or project. As part of that, the contributor can:

  • Download any of the files.

  • Query the data and export results.

  • View and comment in either public or private discussions.

  • Create new discussion topics.

View + edit: in addition to the view-only permissions, the contributor can:

  • Make edits to descriptions and summaries.

  • Add and remove tags.

  • Add and remove files.

  • Replace files by uploading new versions with the same name.

  • Modify file and column descriptions.

  • Modify license type.

  • Switch the dataset or project between open and private.

  • Publish queries for others to use.

View + edit + manage: the contributor has full admin control over the dataset or project. In addition to the view + edit permissions, they can:

  • Delete the dataset or project.

  • Add, remove, and modify contributors.
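Because each permission level includes everything granted by the level below it, the three levels can be modeled as an ordered enum with a set of actions added at each tier. The sketch below is illustrative only: the `Level` enum and the action names are hypothetical labels for the capabilities listed above, not identifiers from the data.world product.

```python
from enum import IntEnum
from typing import Dict, Set


class Level(IntEnum):
    """Contributor permission levels, ordered from least to most access."""
    VIEW = 1    # View only
    EDIT = 2    # View + edit
    MANAGE = 3  # View + edit + manage


# Actions that each level adds on top of the levels below it.
ACTIONS_ADDED: Dict[Level, Set[str]] = {
    Level.VIEW: {
        "download_files",
        "query_and_export",
        "comment_in_discussions",
        "create_discussion_topics",
    },
    Level.EDIT: {
        "edit_descriptions_and_summaries",
        "add_remove_tags",
        "add_remove_files",
        "replace_files",
        "modify_file_and_column_descriptions",
        "modify_license",
        "switch_open_private",
        "publish_queries",
    },
    Level.MANAGE: {
        "delete_resource",
        "manage_contributors",
    },
}


def allowed_actions(level: Level) -> Set[str]:
    """Union of the actions granted at `level` and at every lower level."""
    allowed: Set[str] = set()
    for tier, actions in ACTIONS_ADDED.items():
        if tier <= level:
            allowed |= actions
    return allowed


def can(level: Level, action: str) -> bool:
    """True if a contributor at `level` may perform `action`."""
    return action in allowed_actions(level)
```

For example, `can(Level.EDIT, "download_files")` is true because edit access includes view access, while `can(Level.EDIT, "delete_resource")` is false because deletion is reserved for the manage level.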

Manage your organizations, roles, and users

Organizations are the central “group” unit in data.world, and can be nested for fine-grained, flexible access control. An organization involves several types of users and access levels:

  • Organization admins are your data domain stewards, governance managers, or other personnel who should have direct management over all content in that org.

  • Individual resources can have owners assigned, who may be more specific, “ground-level” data stewards.

  • “Discover” level access provides high-level metadata, but requires a user to request access to the resource before seeing additional information or querying the data. Discoverable access is a very powerful feature and is what enables agile data governance; for more details, see the section Discoverable datasets.

Note

Organization admins can see all org-owned datasets regardless of sharing configuration.

How link sharing works

One of the powerful features of our platform is that results from queries in a project can be reused or embedded through shareable links. These links are not discoverable.

When a link to the results of a query is created, it is encoded with the token information of the user who originally ran the query. Every subsequent run from that link also uses the original user's permissions and token. As a further safeguard, however, even with the link, access is scoped and limited to the specific results of that query. Finally, in VPC deployments, share URLs expire after 12 hours.
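The three properties of a share link described above (it runs as the original user, it is scoped to one query, and it expires) can be sketched as a small model. This is a hypothetical illustration, not data.world's token format or API: `ShareLink` and `run_as` are invented names, and the 12-hour lifetime is the VPC figure from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

SHARE_LINK_TTL = timedelta(hours=12)  # expiry stated above for VPC deployments


@dataclass(frozen=True)
class ShareLink:
    """Hypothetical model of a link to one query's results."""
    owner: str            # user who originally ran the query
    query_id: str         # the only query this link gives access to
    created_at: datetime

    def is_expired(self, now: datetime) -> bool:
        return now - self.created_at >= SHARE_LINK_TTL


def run_as(link: ShareLink, requested_query_id: str, now: datetime) -> str:
    """Return the user whose permissions apply, or raise PermissionError."""
    if link.is_expired(now):
        raise PermissionError("share link has expired")
    if requested_query_id != link.query_id:
        raise PermissionError("share link is scoped to a single query")
    return link.owner


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
example = ShareLink(owner="alice", query_id="q1", created_at=created)
```

Here `run_as(example, "q1", created + timedelta(hours=1))` resolves to the original user, while requesting a different query, or using the link 12 or more hours after creation, raises `PermissionError`.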