Integration architecture

data.world is a cloud-first SaaS solution. However, depending on your security or infrastructure operations requirements, some use cases and capabilities either can be or must be implemented in your own compute environment.

This document outlines the architecture of those customer-hosted components and how to implement them.

Servers and permissions

Multiple metadata collectors can run on the same server. If you do this, make sure the server has enough combined compute, storage, and network capacity to handle peak usage under your schedule. For example, if you plan to run 10 complex collection tasks simultaneously from the same instance, we recommend increasing the available resources 3-5x and monitoring resource usage on that instance to tune it further.
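The 3-5x guideline lends itself to a quick sizing calculation. The sketch below is hypothetical: the `Resources` type, the `scale_for_concurrency` helper, and the baseline figures are illustrative assumptions, not data.world tooling or published numbers. The baseline stands for the instance's current sizing before you add concurrent collection tasks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical description of one instance's capacity."""
    vcpus: float
    memory_gb: float
    disk_gb: float

    def scaled(self, factor: float) -> "Resources":
        # Multiply every dimension by the same factor.
        return Resources(self.vcpus * factor, self.memory_gb * factor, self.disk_gb * factor)


def scale_for_concurrency(
    baseline: Resources, low: float = 3.0, high: float = 5.0
) -> tuple[Resources, Resources]:
    """Return (lower, upper) sizing targets by applying the 3-5x guideline to a baseline."""
    if not 0 < low <= high:
        raise ValueError("multipliers must satisfy 0 < low <= high")
    return baseline.scaled(low), baseline.scaled(high)


if __name__ == "__main__":
    # Hypothetical baseline: the instance's current sizing before adding concurrent tasks.
    current = Resources(vcpus=4, memory_gb=16, disk_gb=100)
    lower, upper = scale_for_concurrency(current)
    print(f"lower target: {lower}")
    print(f"upper target: {upper}")
```

Treat the output as a starting range only. Network bandwidth is not modeled here, and the final sizing should come from the resource usage you actually observe on the instance.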

We recommend running the data.world Bridge and the technical lineage server on instances separate from the metadata collectors, to minimize resource contention and maximize control over network security. MANTA likewise advises, for the technical lineage server, that a “dedicated machine is recommended for MANTA to avoid collision for resources and limit access of MANTA to other data and applications for security reasons.”

Make sure each service has the network permissions it needs:

  • Metadata collectors must be able to connect to the source systems like databases and BI tools that you intend to collect metadata from.

  • The metadata collector you are using for lineage collection must be able to connect to the technical lineage server (MANTA) in order to pass some lineage information to data.world.

  • If you intend to automate the upload of metadata to data.world, metadata collectors must be permitted to make outbound HTTPS connections to api.data.world on port 443.

  • The technical lineage server must also be able to connect to the source systems, such as databases and BI tools, that you intend to collect metadata from.

  • The technical lineage server does not need outbound internet access unless you intend to make its UI available over the public internet. However, the technical lineage server does have its own UI, and your technical users will benefit from being able to use it directly, so make the UI port reachable by others in your enterprise. The UI optionally supports LDAP/AD integration for simplified, consolidated user administration.

  • The data.world Bridge can be a hardware appliance, but is most often deployed as a virtual appliance; data.world provides these appliances. The data.world Bridge does not need to connect to the on-premises metadata collectors or the technical lineage server. Its sole purpose is to securely broker connections for data.world-hosted metadata collectors (noted in DWCC hosted by data.world below) and for data.world data virtualization / federated query capabilities.
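Before scheduling collections, it can help to verify the network paths listed above from the host in question. The following is a minimal sketch using only Python's standard library. Apart from api.data.world on port 443, which this document specifies, every hostname and port in it (the source database and the technical lineage server UI) is a hypothetical placeholder to replace with your own values, and the `Flow`, `can_connect`, and `check_flows` names are illustrative, not part of any data.world or MANTA tool.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One required network path: a description plus the target host and TCP port."""
    description: str
    host: str
    port: int


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_flows(flows: list[Flow], timeout: float = 5.0) -> dict[str, bool]:
    """Map each flow's description to whether its TCP connection succeeded."""
    return {f.description: can_connect(f.host, f.port, timeout) for f in flows}


if __name__ == "__main__":
    flows = [
        Flow("collector -> data.world API (HTTPS)", "api.data.world", 443),
        # Hypothetical hosts and ports below; substitute your own systems.
        Flow("collector -> source database", "db.example.internal", 5432),
        Flow("collector -> technical lineage server UI", "lineage.example.internal", 8080),
    ]
    for description, ok in check_flows(flows).items():
        print(f"{'OK  ' if ok else 'FAIL'} {description}")
```

A successful result only shows that a TCP connection can be opened; it does not validate TLS, credentials, or application-level access. Running the same script from the metadata collector host and from the technical lineage server host covers the different paths each one needs.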

More details are available in the supplemental technical lineage server (MANTA) and data.world Bridge documentation articles.