How our data connections interact with your data

There are three main ways that data.world can interact with data from your environment:

  1. Metadata collection, including the optional Technical Lineage Server

  2. Virtual query capability

  3. Data extract/import capability

For the vast majority of security and compliance needs, (1) metadata collection and (2) virtual query are sufficient, and (3) data extract/import can optionally be disabled. If only (1) and (2) are used, data.world does NOT store your data on our platform; it stores only metadata.

For metadata collection, Docker microservices (or a Java application, in the case of the Technical Lineage Server) are deployed by the customer in the customer's environment. These services transmit only metadata to data.world:

  • Source system descriptive information

  • Schema information such as tables and columns

  • Object-level descriptive information, such as the titles or names of dashboards and reports and when they were created

This information is transmitted over HTTPS, with optional support for custom SSL certificates. It is non-sensitive in nature and can be managed by data stewards, data governance, and data security professionals in your organization for further curation and access control.
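The metadata-only boundary can be illustrated with a minimal Python sketch. It assumes a hypothetical payload shape (`MetadataRecord`, `TableInfo`, `ColumnInfo`) and a hypothetical ingestion endpoint; it is not data.world's actual collector format or API. What it shows is that the record carries source, schema, and object descriptions but no row values, and that it is only ever addressed to an HTTPS endpoint, optionally verified against a custom CA bundle.

```python
import json
import ssl
import urllib.request
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ColumnInfo:
    name: str
    data_type: str


@dataclass
class TableInfo:
    name: str
    columns: list[ColumnInfo] = field(default_factory=list)


@dataclass
class MetadataRecord:
    # Source system descriptive information
    source_system: str
    source_type: str
    # Schema information: table and column names/types, never row values
    tables: list[TableInfo] = field(default_factory=list)
    # Object-level descriptions, e.g. dashboard or report titles and creation dates
    objects: list[dict] = field(default_factory=list)


def build_request(record: MetadataRecord, endpoint: str) -> urllib.request.Request:
    """Serialize a metadata-only record into an HTTPS POST request (not sent here)."""
    if not endpoint.startswith("https://"):
        raise ValueError("metadata must be sent over HTTPS")
    body = json.dumps(asdict(record)).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def make_tls_context(custom_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verify the server certificate, optionally against a custom CA bundle."""
    # create_default_context enables certificate and hostname verification;
    # cafile=None falls back to the system trust store.
    return ssl.create_default_context(cafile=custom_ca_file)
```

To actually send the request, you would pass it to `urllib.request.urlopen(req, context=make_tls_context("/path/to/custom-ca.pem"))`, where the CA path is a placeholder for your own certificate bundle.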

For data extract/import, virtual connections can be used to extract and import data into data.world. Imported data often benefits from faster query performance when it is in a small or medium-sized dataset (that is, under 3 GB). We recommend splitting larger amounts of data into multiple datasets or accessing them through virtualization instead. Data in data.world datasets is kept securely in encrypted data stores.

Note

Some customers choose to disable the ability to extract and import data to prevent any data from being persisted in data.world for compliance or other reasons. The decision to allow or disallow extract/import is fully in the hands of the customer.
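One way to apply the import sizing guidance is to group tables into datasets that each stay under the size limit, and to leave any single table that exceeds the limit for virtual access. The sketch below uses first-fit-decreasing grouping, a common bin-packing heuristic chosen here only for illustration. The function name `plan_imports` and the reading of the 3 GB guideline as 3 * 10**9 bytes are assumptions, not a data.world tool or setting.

```python
from dataclasses import dataclass

# Assumption: the 3 GB guideline is read as 3 * 10**9 bytes.
IMPORT_LIMIT_BYTES = 3 * 10**9


@dataclass
class Plan:
    datasets: list[list[str]]  # groups of tables to import together
    virtualize: list[str]      # tables too large to import on their own


def plan_imports(table_sizes: dict[str, int], limit: int = IMPORT_LIMIT_BYTES) -> Plan:
    """Hypothetical helper: first-fit-decreasing grouping of tables into datasets under `limit`."""
    datasets: list[list[str]] = []
    totals: list[int] = []
    virtualize: list[str] = []
    # Place the largest tables first so they claim their own datasets early.
    for name, size in sorted(table_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > limit:
            # A single table over the limit is better accessed via virtualization.
            virtualize.append(name)
            continue
        # Put the table in the first existing dataset that still has room.
        for i, total in enumerate(totals):
            if total + size <= limit:
                datasets[i].append(name)
                totals[i] += size
                break
        else:
            # No existing dataset has room, so start a new one.
            datasets.append([name])
            totals.append(size)
    return Plan(datasets=datasets, virtualize=virtualize)
```

For example, with a 5 GB `events` table, 2 GB `orders`, 1 GB `customers`, 0.5 GB `products`, and a tiny `regions` table, this sketch keeps `events` virtual and proposes two datasets: `orders` with `customers`, and `products` with `regions`.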

With virtual query, datasets in data.world are built on top of the connections, but no data is stored in them. When a user runs a query, a short-lived connection fetches only the query results. These results are NOT persisted, either in storage or in memory. By default, queries also have a 5-minute timeout for performance and security reasons. Finally, all queries are comprehensively tracked and audited in query audit logs that customers can access and monitor.
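The virtual query flow can be sketched in Python, with the standard `sqlite3` module standing in for the customer's database. This is a hypothetical illustration, not data.world's implementation: `run_virtual_query` and `AUDIT_LOG` are invented names, the network path through the appliance is not modeled, and the timeout is enforced here with SQLite's progress handler. The sketch opens a fresh connection per query, aborts any query that runs past the timeout, always closes the connection, and records an audit entry that describes the query rather than its results.

```python
import sqlite3
import time
from dataclasses import dataclass
from datetime import datetime, timezone

QUERY_TIMEOUT_SECONDS = 5 * 60  # mirrors the 5-minute default described above


@dataclass
class AuditEntry:
    user: str
    sql: str
    started_at: str
    duration_s: float
    row_count: int
    status: str


# Hypothetical in-memory stand-in for a customer-accessible query audit log.
AUDIT_LOG: list[AuditEntry] = []


def run_virtual_query(connect, user: str, sql: str,
                      timeout_s: float = QUERY_TIMEOUT_SECONDS) -> list[tuple]:
    """Open a short-lived connection, fetch only the results, then close it."""
    started = time.monotonic()
    started_at = datetime.now(timezone.utc).isoformat()
    conn = connect()  # fresh connection for this query only
    # SQLite calls the handler every 1000 VM instructions; a nonzero return aborts the query.
    conn.set_progress_handler(lambda: int(time.monotonic() - started > timeout_s), 1000)
    rows: list[tuple] = []
    status = "ok"
    try:
        rows = conn.execute(sql).fetchall()
        return rows  # handed to the caller; nothing here caches them
    except sqlite3.OperationalError as exc:
        status = f"error: {exc}"
        raise
    finally:
        conn.close()
        # The audit entry records who ran what and how many rows returned, not the rows.
        AUDIT_LOG.append(AuditEntry(user, sql, started_at,
                                    time.monotonic() - started, len(rows), status))
```

A caller would pass a factory such as `lambda: sqlite3.connect("source.db")` so that each query gets its own connection; a query that exceeds the timeout raises `sqlite3.OperationalError` and is still logged.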

A virtual appliance (or, optionally, a hardware appliance or reverse SSH tunnel service) is deployed by the customer in the customer's environment. The virtual and hardware appliances are powered by our partner Trustgrid, a secure connection bridge technology vendor trusted by many of the largest banks and financial institutions. Using Trustgrid, a network bridge is configured on the data.world side in conjunction with the customer. Read-only system user credentials can then be stored securely in data.world as connections. All connections in the system are encrypted and initiated as outbound only.