Connection security

data.world offers several ways to connect to remote data sources, and several factors determine which is the best fit. Some connection types are simpler to set up, while others offer stronger security but require internal IT to deploy software and/or hardware. The connection types available are:

  • Direct connection (inbound)

  • SSH tunnel (inbound, preferred)

  • Bridged connection appliance (outbound)

Direct connection

When a database is reachable on the open internet at a known hostname or IP address, you can connect to it directly from data.world. You can configure the connection either with the Connection manager or through the Integrations gallery. This solution works well for online or SaaS products such as Snowflake, or when you want to test virtualization capabilities.

Note

When a database is located inside an organization's network, this is not the preferred solution. Exposing database servers to the open web poses a security risk and should be avoided. If you are considering this solution, block traffic from all hosts except those on a specified allowlist.
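The allowlist rule in the note can be sketched as follows. This is a minimal illustration using Python's standard `ipaddress` module; the CIDR ranges are placeholder documentation addresses (an assumption), not data.world's actual source addresses, which you would need to obtain for your deployment.

```python
import ipaddress

# Hypothetical allowlist: replace with the source ranges supplied for your deployment.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True only if source_ip falls inside one of the allowed networks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

In practice this rule is normally enforced at the firewall or cloud security group rather than in application code; the function only illustrates the "deny everything except the allowlist" logic.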

SSH tunnel

When connecting to a database server, data.world lets you optionally route the connection through an SSH tunnel. This solution requires some setup by the network administrator, but it is more secure than a direct connection.

This is the preferred method for connecting database servers to data.world. It is easy, flexible and secure. No additional hardware is required beyond a bastion server. Many organizations have these deployed as a normal part of their infrastructure.
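Administrators who want to verify the path through the bastion before configuring it in data.world can approximate the same kind of tunnel from a workstation with the OpenSSH client. The sketch below only builds the command line; the host names, user name, and ports are hypothetical, and it is not a description of how data.world opens its tunnel internally.

```python
def build_tunnel_command(bastion_host, bastion_user, db_host, db_port, local_port):
    """Build an OpenSSH command that forwards local_port to db_host:db_port via the bastion."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:{db_host}:{db_port}",  # local port forward through the bastion
        f"{bastion_user}@{bastion_host}",
    ]
```

The returned list can be passed to `subprocess.run`; while the command is running, a database client pointed at `localhost` and the chosen local port reaches the database through the bastion.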

For additional security, data.world provides user-specific SSH public keys, which should be added to the authorized_keys file on the bastion server (typically ~/.ssh/authorized_keys) to ensure that tunnel traffic originates from data.world.
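As an illustration of how such a key might be installed, the sketch below formats an authorized_keys line that limits the key to port forwarding toward a single database. The `restrict`, `port-forwarding`, and `permitopen` options are standard OpenSSH authorized_keys options (`restrict` requires OpenSSH 7.2 or later). Whether these restrictions suit a given data.world connection is an assumption to verify in testing, and the key, host, and port shown in the usage note are placeholders.

```python
def authorized_keys_entry(public_key: str, db_host: str, db_port: int) -> str:
    """Prefix a public key with OpenSSH options that allow forwarding to one target only."""
    # "restrict" turns off all optional features; "port-forwarding" re-enables only that one;
    # "permitopen" limits forwarded connections to the named host and port.
    options = f'restrict,port-forwarding,permitopen="{db_host}:{db_port}"'
    return f"{options} {public_key.strip()}"
```

For example, `authorized_keys_entry("ssh-ed25519 AAAA...placeholder dataworld", "db.internal.example.com", 5432)` produces one line to append to the bastion user's authorized_keys file.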

[Image: advanced connection screen]

Bridged connection appliance

Some organizations consider any incoming connection insecure. This is especially true in industries that deal with health or financial data. In these cases, consider an alternative architecture that does not require the organization to accept connections from the open web.

A bridged connection involves deploying an appliance inside the organization's network, where the target database servers reside. The appliance makes outbound connections at startup and maintains these connections over time.

data.world has partnered with Trustgrid (http://trustgrid.io) to provide this capability. While Trustgrid helps configure and maintain the connection to the data.world network, data is visible only to data.world. No data enters Trustgrid's network.

This solution requires a commitment of time and resources from the organization's IT department, which must work with data.world to deploy and configure the appliance. Ongoing maintenance of the appliance should be minimal, requiring time only in exceptional cases.

The appliance runs inside the organization's network and makes outbound connections to:

  • data.world "Data Plane" – data transfer only

  • Trustgrid "Control Plane" – configuration only

  • The target database instance(s)

Note

There are no inbound connections to the organization from the open internet with this solution.
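Because the appliance depends entirely on outbound connectivity, it can be useful to confirm that the network segment where it will run is able to open the required outbound connections. The following is a minimal sketch using Python's standard `socket` module. The endpoint host names and ports are not given in this document and must be supplied by data.world and Trustgrid during deployment; any values passed in are assumptions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints: dict) -> dict:
    """Map each endpoint label to the result of an outbound connection attempt."""
    return {label: can_connect(host, port) for label, (host, port) in endpoints.items()}
```

For example, `check_endpoints({"target database": ("db.internal.example.com", 5432)})` returns a dictionary of labels to booleans. A successful TCP connection shows only that the route is open; it does not validate credentials or the appliance configuration.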

Either a hardware or a virtual appliance may be deployed. If you meet the minimum requirements outlined below, virtual appliances are preferred, because hardware deployments require additional time and physical (rack) space.

  • Hardware: 2 vCPU, 4 GB RAM, 32 GB storage (supports 250 Mbps of throughput)

  • Hypervisor/Virtualization: VMware vSphere 5.5 (or greater), Amazon Web Services, Microsoft Azure Cloud, and Google Cloud Platform
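The minimum resource figures above can be checked against a planned virtual machine with a few lines of code. This is an illustrative sketch only; the `VmSpec` type and `shortfalls` function are hypothetical helpers, not part of any data.world or Trustgrid tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    vcpus: int
    ram_gb: int
    storage_gb: int


# Minimums taken from the requirements list above.
MINIMUM = VmSpec(vcpus=2, ram_gb=4, storage_gb=32)


def shortfalls(spec: VmSpec) -> list:
    """Return the names of resources that fall below the documented minimum."""
    return [
        name
        for name in ("vcpus", "ram_gb", "storage_gb")
        if getattr(spec, name) < getattr(MINIMUM, name)
    ]
```

An empty list means the planned machine meets all three minimums; otherwise the list names each resource that needs to be increased.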

Configuration

Trustgrid appliances are stateless network devices. They require specific vSphere configuration settings to maximize uptime:

  • Set the DRS level for a VM to either PartiallyAutomated or Disabled

  • Create an anti-affinity rule

  • Do not back up Trustgrid virtual appliances

  • Deploy Trustgrid secondary high-availability appliances on separate physical hosts

Note

High availability deployments require two instances to be deployed and running.
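The configuration rules in this section can be expressed as a simple pre-deployment review. The sketch below checks a plan described as a plain dictionary; the dictionary keys and host names are hypothetical, and the code does not call the vSphere API or apply any settings.

```python
ALLOWED_DRS_LEVELS = {"PartiallyAutomated", "Disabled"}


def review_plan(plan: dict) -> list:
    """Return a list of problems found in a hypothetical deployment plan."""
    problems = []
    if plan.get("drs_level") not in ALLOWED_DRS_LEVELS:
        problems.append("DRS level must be PartiallyAutomated or Disabled")
    if not plan.get("anti_affinity_rule"):
        problems.append("anti-affinity rule is missing")
    if plan.get("backups_enabled"):
        problems.append("backups must not be taken of the appliance")
    hosts = plan.get("appliance_hosts", [])
    if plan.get("high_availability"):
        if len(hosts) < 2:
            problems.append("high availability requires two running instances")
        elif len(set(hosts)) < len(hosts):
            problems.append("appliances must run on separate physical hosts")
    return problems
```

A plan with `drs_level` set to `"Disabled"`, an anti-affinity rule, backups turned off, and two appliances on different hosts returns an empty list.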