About data lineage and Eureka Explorer
Introduction to Data Lineage
Data lineage traces data across its lifecycle: from its origins, through each transformation, to the systems, processes, and applications that use it. Organizations rely on data lineage for:
Ensuring data quality and accuracy
Assisting with compliance and auditing
Facilitating troubleshooting when issues arise
Supporting data governance and collaboration
Lineage in data.world
In data.world, lineage is captured through the platform's catalog collectors. Each collector can uniquely identify a data resource, even when multiple data sources reference that resource. Our Knowledge Graph Foundation builds on this by stitching together the information from each collector into a single, robust representation of data lineage.
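The stitching idea described above can be illustrated with a minimal sketch. This is not the data.world implementation or API: the record layout, the collector names, and the `stitch` function are all hypothetical, and the sketch assumes each collector already emits a stable identifier for every resource it sees.

```python
from collections import defaultdict

# Hypothetical collector output (not a data.world format). Each record names
# a resource by a stable identifier and lists the identifiers it derives from.
warehouse_records = [
    {"id": "warehouse.sales.orders", "collector": "warehouse", "derived_from": []},
    {"id": "warehouse.sales.daily_revenue", "collector": "warehouse",
     "derived_from": ["warehouse.sales.orders"]},
]
bi_records = [
    # The BI tool references the same table the warehouse collector reported.
    {"id": "warehouse.sales.daily_revenue", "collector": "bi_tool", "derived_from": []},
    {"id": "bi.dashboards.revenue", "collector": "bi_tool",
     "derived_from": ["warehouse.sales.daily_revenue"]},
]


def stitch(*collector_outputs):
    """Merge records that share an identifier into one node per resource."""
    nodes = defaultdict(lambda: {"seen_by": set(), "derived_from": set()})
    for records in collector_outputs:
        for record in records:
            node = nodes[record["id"]]
            node["seen_by"].add(record["collector"])
            node["derived_from"].update(record["derived_from"])
    return dict(nodes)


graph = stitch(warehouse_records, bi_records)
```

In this sketch the table reported by both collectors becomes a single node, so the dashboard's lineage connects back to the original orders table through it.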
Eureka Explorer: Overview, features, and benefits
Eureka™ Explorer is a visual map of your data and its relationships, powered by a knowledge graph. It combines automated collection of lineage metadata with an easy-to-use interface, so data teams can quickly browse the lineage relationships of data resources and answer root cause, impact, and compliance questions in a few clicks.
Features
Aggregated Summary and Interactive Graph: Explorer lineage provides a summary preview of data flow and a fully interactive graph of technical lineage, with table-, column-, and query-level details. Together they show how data is connected, transformed, and depended upon.
Simplified Browsing: Explore lineage relationships on a single screen that draws on the richness of the data catalog, so teams spend less time searching for answers.
Automatic Collection for Enterprise Customers: Available to all Enterprise customers. Lineage information is collected and displayed automatically when a supported collector is run. Refer to Supported sources for details on the sources with automated lineage collection.
Lineage Benefits
Building Trust in Data: Provides essential context that helps business analysts and decision-makers judge whether data is reliable. Users can examine upstream sources to verify how data was derived, assess its trustworthiness, and resolve issues such as broken dashboards or disrupted data flows.
Risk and Impact Analysis: Data producers and engineers can see how a change to data will affect downstream users, which helps them communicate the change and troubleshoot effectively.
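Both benefits come down to walking the lineage graph in one direction: upstream for root cause, downstream for impact. The sketch below illustrates that idea with a breadth-first traversal. The edge list, resource names, and the `reachable` function are hypothetical and are not part of any data.world API.

```python
from collections import deque

# Hypothetical lineage edges, each pointing from a source to what derives from it.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.daily_revenue"),
    ("raw.customers", "mart.daily_revenue"),
    ("mart.daily_revenue", "dashboard.revenue"),
]


def reachable(start, edges, direction):
    """Return every resource upstream or downstream of start (start excluded)."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    # Build an adjacency map that follows edges forward or backward.
    neighbors = {}
    for src, dst in edges:
        key, value = (src, dst) if direction == "downstream" else (dst, src)
        neighbors.setdefault(key, []).append(value)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Root cause: what feeds the broken dashboard?
root_cause_candidates = reachable("dashboard.revenue", EDGES, "upstream")
# Impact: what is affected if raw.orders changes?
impacted = reachable("raw.orders", EDGES, "downstream")
```

Tracking visited resources in `seen` keeps the walk finite even if the lineage data contains a cycle.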
Cross-System Lineage Relationships
Lineage relationships between two objects from different source systems can be shown when the collectors for both sources harvest those relationships. The exception is when the source metadata is missing information or is insufficient to identify both resources unambiguously.
Eureka Explorer in data.world offers an intuitive, visual way to track and explore data lineage. It helps organizations maintain a clear and comprehensive understanding of how data moves across their infrastructure, which supports data governance and operational efficiency.
Access needed for viewing Explorer lineage
Users who have View access to a resource can see the Explorer lineage on the Resource page and interact with it in fullscreen mode.
If the lineage graph contains resources that a user does not have permission to see, the user sees an indication that there is upstream or downstream content they cannot view.
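The access behavior above can be pictured as a filter that keeps the resources a user may view and flags whether anything was left out. The sketch below is only an illustration under that assumption. The `lineage_view` function and its return shape are hypothetical and do not describe how data.world enforces permissions.

```python
def lineage_view(upstream, downstream, viewable):
    """Filter lineage neighbors to those the user may view, flagging hidden content."""

    def split(resources):
        shown = [r for r in resources if r in viewable]
        # True when at least one resource was withheld from the user.
        return shown, len(shown) < len(resources)

    up_shown, up_hidden = split(upstream)
    down_shown, down_hidden = split(downstream)
    return {
        "upstream": up_shown,
        "hidden_upstream": up_hidden,
        "downstream": down_shown,
        "hidden_downstream": down_hidden,
    }


view = lineage_view(
    upstream=["raw.orders", "raw.customers"],
    downstream=["dashboard.revenue"],
    viewable={"raw.orders", "dashboard.revenue"},
)
```

Here the user lacks access to `raw.customers`, so the result omits it and sets `hidden_upstream`, which a page could use to show the indicator without revealing what the hidden resource is.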
Customizing the Lineage pages
You can customize the lineage pages to highlight key metadata about a resource by adding custom fields to the sidebar of the pages. For details on the out-of-the-box sidebar fields and instructions for adding custom fields, see this documentation.