Preparing to run the Marquez collector
Setting up pre-requisites for running the collector
Make sure that the machine from where you are running the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware Note: The following specs are based upon running one collector process at a time. Please adjust the hardware if you are running multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 Ghz processor |
Software | |
Docker | Click here to get Docker. |
data.world specific objects | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit , follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains |
Setting up Marquez
The collector requires lineage events to strictly follow the OpenLineage specification.
To ensure proper harvesting, ensure that dataset names and namespaces in reported lineage events adhere to the OpenLineage naming conventions.