How our data connections interact with your data

There are three main ways that data.world can interact with data from your environment:

  1. Metadata collection, including optional Technical Lineage Server

  2. Virtual query capability

  3. Data extract/import capability

For the vast majority of security and compliance needs, (1) metadata collection and (2) virtual query are sufficient, and (3) data extract/import can be disabled. If only (1) and (2) are used, data.world does NOT store your data in our platform, only metadata.

Data extract/import

Virtual connections can also be used to extract/import data into data.world. Imported data often benefits from improved query performance when it is in a small or medium-sized dataset (i.e., under 3 GB). We recommend that larger amounts of data be separated into multiple datasets or accessed via virtualization instead. Data in data.world datasets is kept securely in encrypted data stores.
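As a rough illustration of the sizing guidance above, the following sketch estimates how many datasets a source would need so that each one stays under the 3 GB guideline. The function name and the reading of "3 GB" as 3 x 10^9 bytes are assumptions made for this example; they are not part of any data.world API.

```python
SIZE_GUIDELINE_BYTES = 3 * 10**9  # assumption: "3 GB" read as 3 * 10^9 bytes


def datasets_needed(total_bytes, limit=SIZE_GUIDELINE_BYTES):
    """Return the smallest number of equal parts that are each strictly under `limit`."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Integer floor division plus one keeps every part strictly below the limit,
    # including the edge case where total_bytes is an exact multiple of it.
    return total_bytes // limit + 1
```

For example, a 10 GB source would be split into four datasets of 2.5 GB each under these assumptions. For sources that would need many datasets, virtualization may be the simpler choice.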

Note

Some customers choose to disable the ability to extract and import data to prevent any data from being persisted in data.world for compliance or other reasons. The decision to allow or disallow extract/import is fully in the hands of the customer.

Virtual query

With virtual query, datasets in data.world are built on top of connections, but no data is stored in them. When a user runs a query, a short-lived connection fetches only the query results. These results are NOT persisted, neither in storage nor in memory. Additionally, by default, queries have a 5-minute timeout for performance and security reasons. Finally, all queries are tracked in query audit logs that customers can access and monitor.
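The general pattern described above (a short-lived connection, a query timeout, and an audit record for every query) can be sketched as follows. This is only an illustration, not data.world's implementation: it uses Python's built-in sqlite3 module as a stand-in for a customer data source, and the names run_virtual_query and audit_log are hypothetical.

```python
import sqlite3
import time
from contextlib import closing

QUERY_TIMEOUT_SECONDS = 5 * 60  # mirrors the 5-minute default described above

audit_log = []  # hypothetical in-memory stand-in for a query audit log


def run_virtual_query(db_path, sql, user, timeout=QUERY_TIMEOUT_SECONDS):
    """Open a short-lived connection, fetch only the results, then close it."""
    deadline = time.monotonic() + timeout
    entry = {"user": user, "sql": sql, "status": "started"}
    audit_log.append(entry)  # every query is recorded, even if it later fails
    with closing(sqlite3.connect(db_path)) as conn:
        # sqlite3 aborts the running statement when the handler returns non-zero;
        # the handler is invoked every 1000 virtual machine instructions.
        conn.set_progress_handler(lambda: int(time.monotonic() >= deadline), 1000)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.OperationalError:
            entry["status"] = "failed_or_timed_out"
            raise
        entry["status"] = "ok"
        return rows  # the connection is closed on exit; nothing is written to disk
```

In this sketch the caller receives only the result rows, the connection does not outlive the call, and the audit entry records who ran which query and how it ended.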

A virtual appliance (or optionally a hardware appliance or reverse SSH tunnel service) is deployed in the customer environment by the customer. The virtual and hardware appliances are powered by our partner Trustgrid, a secure connection bridge technology vendor trusted by many of the largest banks and financial institutions. Using the Trustgrid technology, a network bridge is configured on the data.world side in conjunction with the customer. At that point, read-only system user credentials can be securely stored in data.world as connections. All connections are encrypted and initiated as outbound only.