Advanced configuration: Setting up the data.world Bridge for data virtualization

The data.world Bridge is a customer-managed connectivity solution that enables outbound-only, short-lived, and highly secure communication between on-premises systems and the data.world platform. It is designed for organizations with strict security and compliance requirements, including those in industries such as banking, healthcare, and financial services.

The Bridge supports data virtualization and federated queries, allowing data to be securely accessed in place without requiring full data movement or replication.

data.world offers the Bridge as either a hardware appliance or a software virtual appliance, with deployment options for AWS, vSphere, Microsoft Azure, and Google Cloud Platform.

Bridged connection appliance (outbound)

For some organizations, any incoming connection from the open internet is considered insecure—particularly in sectors that handle sensitive health or financial data. In these cases, an outbound-only architecture offers a secure alternative that avoids exposing internal systems to external connections.

A bridged connection involves deploying a Bridge appliance inside the organization’s network, where the target database servers reside. The appliance initiates outbound connections at startup and maintains them securely over time.

This solution ensures that no inbound connections from the open web are ever required. The Bridge operates within the organization’s controlled environment while securely linking on-premises data sources to the data.world platform.

Implementing the Bridge requires some collaboration with the organization’s IT department, which will work with data.world to deploy and configure the appliance. Once deployed, the Bridge requires minimal ongoing maintenance, with attention only needed in exceptional cases.

The appliance makes outbound connections to:

  • The data.world Data Plane – for secure data transfer

  • The data.world Control Plane – for configuration and management

  • The target database instances within the organization’s network

    Note

    This solution ensures that no inbound connections are established from the open internet. All communication originates from within the organization’s network.

Deployment options

The data.world Bridge appliance can be deployed in one of two ways:

  • Hardware appliance

  • Virtual appliance

For optimal efficiency and ease of deployment, virtual appliances are recommended if your system meets the minimum specifications listed below. Hardware deployments may require additional setup time and physical rack space.

System requirements

Hardware Appliance:

  • Minimum specifications: 2 vCPU, 4 GB RAM, 32 GB storage

  • Throughput capacity: Supports up to 250 Mbps

Virtual Appliance:

  • Compatible platforms: VMware vSphere 5.5 (or later), Amazon Web Services (AWS), Microsoft Azure Cloud, Google Cloud Platform

Recommended configuration for all deployments:

Table 1.

Item          Requirements
CPU           4 vCPU (or equivalent)
RAM           4 GB
Disk space    30 GB
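As a rough illustration, the recommended configuration in Table 1 can be turned into a pre-deployment check. The sketch below is not part of the Bridge product; the `Specs` class and `shortfalls` function are hypothetical names introduced here, and only the three numeric values come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Specs:
    vcpu: int
    ram_gb: int
    disk_gb: int


# Values copied from Table 1 (recommended configuration for all deployments).
RECOMMENDED = Specs(vcpu=4, ram_gb=4, disk_gb=30)


def shortfalls(candidate: Specs, baseline: Specs = RECOMMENDED) -> List[str]:
    """Return one message per resource that falls below the baseline."""
    problems = []
    if candidate.vcpu < baseline.vcpu:
        problems.append(f"CPU: {candidate.vcpu} vCPU < {baseline.vcpu} vCPU")
    if candidate.ram_gb < baseline.ram_gb:
        problems.append(f"RAM: {candidate.ram_gb} GB < {baseline.ram_gb} GB")
    if candidate.disk_gb < baseline.disk_gb:
        problems.append(f"Disk: {candidate.disk_gb} GB < {baseline.disk_gb} GB")
    return problems
```

For example, `shortfalls(Specs(vcpu=2, ram_gb=4, disk_gb=32))` reports only a CPU shortfall, because a machine that meets the stated minimum specifications still has fewer vCPUs than the recommended configuration.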



Configuration guidelines

The data.world Bridge operates as a stateless network device, ensuring efficient and resilient connectivity. Following these configuration best practices will help maintain reliability and uptime.

vSphere Configuration:

  • Set the Distributed Resource Scheduler (DRS) level for VMs to Partially Automated or Disabled.

  • Create an anti-affinity rule to improve resource distribution and system reliability.

  • Avoid backing up Bridge virtual appliances, as this can interfere with their stateless design.

High Availability Deployment:

  • Deploy secondary high-availability (HA) Bridge nodes on separate physical hosts to ensure redundancy and minimize downtime.

  • A high-availability setup requires two active instances running simultaneously.
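The two high-availability rules above (two active instances, each on a separate physical host) can be expressed as a simple placement check. This is a minimal sketch with hypothetical node and host names; it does not query vSphere or any data.world API, and the placement mapping would have to come from your own inventory.

```python
from typing import Dict, List


def ha_placement_issues(placement: Dict[str, str]) -> List[str]:
    """Check a mapping of Bridge node name -> physical host name.

    Returns a list of problems; an empty list means the placement
    satisfies both high-availability rules.
    """
    issues = []
    if len(placement) != 2:
        issues.append(f"expected 2 active nodes, found {len(placement)}")
    # Fewer distinct hosts than nodes means at least two nodes share a host.
    if len(set(placement.values())) < len(placement):
        issues.append("Bridge nodes share a physical host")
    return issues
```

Calling `ha_placement_issues({"bridge-a": "esx-01", "bridge-b": "esx-01"})` flags the shared host, which is the situation an anti-affinity rule is meant to prevent.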

High-level architecture

data.world manages the Application, Gateway Node, and Cloud Management components, while customers manage the Edge Nodes and their connections to databases, BI tools, and other systems that integrate with data.world.

For optimal reliability, a high availability (HA) configuration is recommended. Customers implementing the Bridge can use up to two nodes for redundancy.

[Figure: data.world Bridge high-level architecture (dataworld_bridge.png)]

Network requirements

If your organization enforces firewall restrictions for outbound internet traffic—or if there is a firewall between Bridge nodes and internal servers—ensure the necessary firewall rules are in place to allow traffic from the data.world Bridge.

If assistance is needed to determine the required firewall rules, please contact the data.world support team.

For environments with multiple VLANs, configure switch ports to permit access to the appropriate VLAN.
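Before requesting firewall changes, it can help to confirm from a machine in the Bridge's network segment which outbound destinations are already reachable. The sketch below uses only the Python standard library. The host names and ports in `ENDPOINTS` are placeholders, not real data.world addresses; obtain the actual values from the data.world support team.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_endpoints(
    endpoints: Iterable[Tuple[str, int]],
    probe: Callable[[str, int], bool] = can_connect,
) -> Dict[str, bool]:
    """Map each "host:port" target to whether the probe reached it."""
    return {f"{host}:{port}": probe(host, port) for host, port in endpoints}


# Placeholder targets (assumptions): replace with the real Control Plane,
# Data Plane, and internal database addresses for your deployment.
ENDPOINTS = [
    ("control-plane.example.invalid", 443),
    ("data-plane.example.invalid", 443),
    ("db01.internal.example.invalid", 5432),
]
```

Running `check_endpoints(ENDPOINTS)` returns a dictionary of targets and booleans. A `False` entry only shows that a TCP connection could not be opened from that machine; it does not identify which firewall or VLAN setting is responsible.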

For Standard (One Node) setup:

  • 1 Public IP Address

  • 1 Private IP Address

  • 2 DNS Servers

For High Availability (Two Nodes) setup:

  • 2 Public IP Addresses

  • 2 Private IP Addresses

  • 1 Private IP Address for Cluster IP

  • 2 DNS Servers
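The address requirements listed above can be checked against a planned allocation before deployment. The following sketch is illustrative only: `NetworkPlan` and `validate_plan` are hypothetical names, the listed counts are treated as exact (an assumption), and all addresses in the usage example are sample values.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkPlan:
    public_ips: List[str]
    private_ips: List[str]
    dns_servers: List[str]
    cluster_ip: Optional[str] = None  # only used for High Availability


def validate_plan(plan: NetworkPlan, high_availability: bool) -> List[str]:
    """Return a list of problems; an empty list means the plan matches."""
    nodes = 2 if high_availability else 1
    errors = []
    if len(plan.public_ips) != nodes:
        errors.append(f"expected {nodes} public IP(s), got {len(plan.public_ips)}")
    if len(plan.private_ips) != nodes:
        errors.append(f"expected {nodes} private IP(s), got {len(plan.private_ips)}")
    if len(plan.dns_servers) != 2:
        errors.append(f"expected 2 DNS servers, got {len(plan.dns_servers)}")
    if high_availability and plan.cluster_ip is None:
        errors.append("High Availability needs a private cluster IP")

    # Private node addresses and the cluster IP must be private-range IPs.
    candidates = list(plan.private_ips)
    if plan.cluster_ip is not None:
        candidates.append(plan.cluster_ip)
    for addr in candidates:
        try:
            if not ipaddress.ip_address(addr).is_private:
                errors.append(f"{addr} is not a private address")
        except ValueError:
            errors.append(f"{addr} is not a valid IP address")
    return errors
```

A Standard plan such as `NetworkPlan(["203.0.113.10"], ["10.0.0.5"], ["10.0.0.2", "10.0.0.3"])` validates cleanly with `high_availability=False`, while the same plan checked with `high_availability=True` reports the missing second node addresses and the missing cluster IP.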

Installing the data.world Bridge

  • For instructions on how to install the data.world Bridge, contact your customer service representative or the data.world support team.