About the Marquez collector
Warning
This collector is in public preview. It has passed our standard testing but is not yet widely adopted, so you might encounter unforeseen edge cases in your environment. data.world is committed to promptly addressing any issues with public preview collectors. If you encounter a problem, report it to your Customer Success Director, implementation team, or support team.
Use this collector to harvest metadata for Marquez objects such as datasets, jobs, and job runs. The collector also harvests lineage relationships between the data resources represented by datasets and the jobs that move data between them.
Important
The Marquez collector can be run on-premises using Docker or JAR files.
Note
The latest version of the collector is 2.272. For the release notes for this version and all previous versions, see the collector release notes.
Marquez versions supported
The collector supports Marquez version 0.50.1.
Authentication supported
The collector currently harvests from unauthenticated Marquez API instances only.
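Because only unauthenticated API instances are supported, a quick way to confirm that your Marquez instance is reachable without credentials is to issue a plain GET with no authorization header. The minimal sketch below assumes the Marquez REST endpoint `/api/v1/namespaces` and a response shaped like `{"namespaces": [{"name": ...}]}`; the helper names `build_url`, `fetch_json`, and `list_namespaces` are hypothetical and are not part of the collector.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_url(base: str, *segments: str) -> str:
    """Join a base URL and path segments, percent-encoding each segment.

    Marquez namespaces often contain ':' and '/' (for example
    'postgres://db:5432'), so every segment is fully encoded.
    """
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base.rstrip('/')}/{path}"


def fetch_json(url: str) -> dict:
    """Plain GET with no Authorization header, as for an unauthenticated API."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def list_namespaces(base: str) -> list[str]:
    """Return namespace names (assumed shape: {"namespaces": [{"name": ...}]})."""
    data = fetch_json(build_url(base, "api", "v1", "namespaces"))
    return [ns["name"] for ns in data.get("namespaces", [])]
```

For example, `list_namespaces("http://localhost:5000")` would list namespaces on a hypothetical local instance listening on port 5000. If the request fails with an HTTP 401 or 403 error, the instance requires authentication and cannot be harvested by this collector.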
What is cataloged
The collector catalogs the following information.
Object | Information cataloged |
---|---|
Dataset | Identifier, Title (name), Description, Creation time, Last update time, Namespace |
Job | Identifier, Title (name), Description, Creation time, Last update time, Namespace |
Job Run | Identifier, Last run time, Last run state (error, completed) |
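To illustrate how the cataloged fields in the table relate to Marquez API objects, the following sketch maps dataset, job, and run JSON objects to those fields. It assumes Marquez-style keys (`namespace`, `name`, `description`, `createdAt`, `updatedAt` for datasets and jobs; `id`, `state`, `endedAt`, `updatedAt` for runs). The `namespace:name` identifier format and the mapping of `FAILED` and `ABORTED` run states to "error" are hypothetical choices for illustration, not the collector's actual behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogEntry:
    identifier: str
    title: str
    description: Optional[str]
    created: Optional[str]
    updated: Optional[str]
    namespace: str


def to_catalog_entry(obj: dict) -> CatalogEntry:
    """Map a Marquez dataset or job JSON object to the cataloged fields.

    The identifier format (namespace:name) is hypothetical.
    """
    return CatalogEntry(
        identifier=f"{obj['namespace']}:{obj['name']}",
        title=obj["name"],
        description=obj.get("description"),
        created=obj.get("createdAt"),
        updated=obj.get("updatedAt"),
        namespace=obj["namespace"],
    )


# Hypothetical mapping from Marquez run states to the states in the table.
RUN_STATE_LABELS = {"COMPLETED": "completed", "FAILED": "error", "ABORTED": "error"}


def summarize_run(run: dict) -> dict:
    """Reduce a Marquez run object to identifier, last run time, and state."""
    return {
        "identifier": run["id"],
        # Prefer the end time; fall back to the last update for unfinished runs.
        "last_run_time": run.get("endedAt") or run.get("updatedAt"),
        # States outside the mapping (for example RUNNING) are labeled "other".
        "last_run_state": RUN_STATE_LABELS.get(run.get("state", ""), "other"),
    }
```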
Relationships between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Dataset | |
Job | |
Job Run | |
Lineage for Marquez
Object | Lineage available |
---|---|
Data resource | Data resources from which this data resource was derived, per a lineage event recorded in Marquez |
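In Marquez, a dataset is derived from other datasets through the jobs that read and write them, so finding a data resource's sources means walking from the dataset to the jobs that produced it, and from those jobs to the datasets they read. The sketch below assumes a lineage response shaped like Marquez's `/api/v1/lineage?nodeId=dataset:<namespace>:<name>` output: a `graph` list of nodes with `id` values such as `dataset:ns:name` or `job:ns:name`, each carrying `inEdges` entries with an `origin` key. The function names are hypothetical.

```python
from collections import deque


def index_graph(response: dict) -> dict[str, dict]:
    """Index lineage nodes by id (assumed shape: {"graph": [{"id", "inEdges"}, ...]})."""
    return {node["id"]: node for node in response.get("graph", [])}


def upstream_datasets(response: dict, dataset_id: str, transitive: bool = False) -> set[str]:
    """Return datasets that feed `dataset_id` through a job.

    A dataset's inEdges point at the jobs that wrote it, and each job's
    inEdges point at the datasets it read, so one dataset hop is two edge hops.
    With transitive=True, the walk continues through every upstream dataset.
    """
    nodes = index_graph(response)
    found: set[str] = set()
    queue = deque([dataset_id])
    while queue:
        current = queue.popleft()
        for job_edge in nodes.get(current, {}).get("inEdges", []):
            job = nodes.get(job_edge["origin"], {})
            for ds_edge in job.get("inEdges", []):
                upstream = ds_edge["origin"]
                # The visited check also guards against cycles in the graph.
                if upstream != dataset_id and upstream not in found:
                    found.add(upstream)
                    if transitive:
                        queue.append(upstream)
    return found
```

With a chain `raw -> clean job -> clean -> agg job -> report`, `upstream_datasets(graph, "dataset:ns:report")` returns only `dataset:ns:clean`, while passing `transitive=True` also returns `dataset:ns:raw`.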
Supported cross-system lineage
The currently supported data sources for cross-system lineage are:
AWS Glue
Microsoft SQL Server
Postgres
Oracle
MySQL
Teradata