
Eureka Explorer lineage for catalog resources


About Explorer lineage

Eureka™ Explorer is a visual map of your data and its relationships, powered by a knowledge graph. It is a visual representation of your data catalog. Explorer lineage combines automated metadata lineage collection with a visual tool so your data teams can quickly browse the lineage relationships of your data resources. It includes the richness of the data catalog and provides one screen where lineage can be explored. With this capability, your teams no longer need to spend hours or days trying to find answers to their root cause, impact, and compliance questions. They can see these relationships on one screen and answer their questions in a few clicks.

Explorer lineage delivers both a general preview (aggregated summary of how data flows from its source to where it is consumed) and a fully interactive graph of your technical lineage (table, column, and query-level lineage).

Note

Explorer lineage is available to all Enterprise customers. When a supported collector is run, the lineage information is automatically collected and displayed in data.world. To view the list of data sources for which automated lineage collection is available, please see Supported sources.

Some questions that lineage information answers for you

  1. What data and analytics do I have? Where does it live? Can I trust it? What are my bottlenecks and hotspots?

  2. How does it connect? How was it created or derived?

  3. When stuff breaks, why? What will break or be affected when I change something? Who should I talk to?

  4. How does sensitive data flow? Is it being handled properly?

    sample_lineage.png
How does lineage help you?
  • Build trust in the data you are seeing: Business analysts and decision makers rely on a dashboard, a report, or a particular table, and need to know whether they can trust the data source in order to make accurate decisions. That means knowing the context and being able to look upstream to see where the data comes from and how it was derived. When something breaks, for example, when your dashboard stops working, you can look upstream, see that the information is coming from Snowflake, and reach out to the tech owner to let them know that the dashboard is broken because the data source has issues. Looking upstream also lets you visualize the sources and their relationships and see metadata about the different parts of the lineage graph, for example, who the tech owner is, what the policy is, and what the status is (Approved, Deprecated).

  • Risk and impact analysis: Data producers and data engineers need to know how a change they want to make will impact the users of the data. Lineage also helps them troubleshoot: when they accidentally break something, they can see who was affected. For example, when looking at the lineage of a data source, they can see that there are people downstream in Tableau building dashboards on the data source they are about to change, and that those people need to be informed about the change. This helps data producers and data engineers be better stewards for the organization.
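The upstream and downstream questions above amount to graph traversals. The following is a minimal sketch of that idea, not how data.world computes lineage: the edge list, the resource names, and the `reachable` function are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each pair means "data flows from source to target".
# The resource names are made up for this sketch.
EDGES = [
    ("sqlserver.orders", "snowflake.RETAIL_ORDER_STAGE"),
    ("snowflake.RETAIL_ORDER_STAGE", "snowflake.RETAIL_ORDER"),
    ("snowflake.RETAIL_ORDER", "tableau.Orders"),
    ("tableau.Orders", "tableau.Sales Dashboard"),
]


def reachable(start, edges, direction="downstream"):
    """Return every resource reachable from `start` in the given direction."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")

    # Build an adjacency map that points the way we want to walk.
    neighbors = {}
    for source, target in edges:
        if direction == "downstream":
            neighbors.setdefault(source, []).append(target)
        else:
            neighbors.setdefault(target, []).append(source)

    # Breadth-first walk; `seen` also protects against cycles.
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Impact analysis: what could be affected if RETAIL_ORDER changes?
impacted = reachable("snowflake.RETAIL_ORDER", EDGES, "downstream")

# Trust / root cause: where does RETAIL_ORDER come from?
origins = reachable("snowflake.RETAIL_ORDER", EDGES, "upstream")
```

Walking downstream answers "who will I impact?", while walking upstream answers "where did this come from?". The Explorer lineage graph lets you do both visually.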

When can you expect cross-system lineage relationships?
  • You can expect a lineage relationship between two objects from two source systems if collectors exist for both sources and those collectors harvest the lineage relationship. The exceptions are when 1) the source metadata lacks the relationship or 2) the source metadata lacks enough information to positively identify both resources.
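The rule above can be read as a simple conditional. The sketch below restates it in code under stated assumptions: `HarvestedRelationship`, its fields, and `expect_cross_system_lineage` are hypothetical names invented for this illustration and are not part of any data.world API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class HarvestedRelationship:
    """Hypothetical record of a lineage relationship found in source metadata."""
    source_id: Optional[str]  # None if the metadata cannot identify the resource
    target_id: Optional[str]


def expect_cross_system_lineage(
    source_system: str,
    target_system: str,
    collectors_run: Set[str],
    relationship: Optional[HarvestedRelationship],
) -> bool:
    # Both source systems need a collector that has been run.
    if not {source_system, target_system} <= collectors_run:
        return False
    # Exception 1: the source metadata lacks the relationship.
    if relationship is None:
        return False
    # Exception 2: the metadata cannot positively identify both resources.
    return relationship.source_id is not None and relationship.target_id is not None


collectors = {"snowflake", "tableau"}
ok = expect_cross_system_lineage(
    "snowflake", "tableau", collectors,
    HarvestedRelationship("snowflake.RETAIL_ORDER", "tableau.Orders"),
)
unidentified = expect_cross_system_lineage(
    "snowflake", "tableau", collectors,
    HarvestedRelationship("snowflake.RETAIL_ORDER", None),
)
```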

Customizing the Lineage pages

The system allows you to enhance the lineage pages to highlight key metadata about the resource. You can do this by adding custom fields to the sidebar of the pages. To learn everything about the out-of-the-box fields that are available in the sidebar and instructions on adding custom fields, please see this documentation.

sample_sidebar.png

Access needed for viewing Explorer lineage

  • Users who have View access to the resource will be able to see the Explorer lineage on the Resource page and interact with it in fullscreen mode.

  • If the lineage graph contains resources that the user does not have permission to see, the user sees an indication that there is upstream or downstream content they are unable to view.

    lineage_limited_view01.png
    lineage_limited_view02.png
    lineage_limited_view03.png
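As a way to reason about the limited view described above, the sketch below filters a toy lineage graph by the resources a user can view and notes where hidden content would be indicated. It is only an illustration under assumptions: the edge list, the resource names, and `limited_view` are hypothetical and do not describe the product's actual implementation.

```python
# Hypothetical lineage edges (source, target); the names are made up.
EDGES = [
    ("raw_orders", "orders_stage"),
    ("orders_stage", "orders"),
    ("orders", "sales_dashboard"),
]


def limited_view(edges, viewable):
    """Split a lineage graph into what a user can see and where content is hidden.

    Returns (visible_edges, hidden_upstream, hidden_downstream), where the last
    two list the visible resources that have a hidden neighbor on that side.
    """
    visible_edges = [(s, t) for s, t in edges if s in viewable and t in viewable]
    hidden_upstream = sorted({t for s, t in edges if t in viewable and s not in viewable})
    hidden_downstream = sorted({s for s, t in edges if s in viewable and t not in viewable})
    return visible_edges, hidden_upstream, hidden_downstream


# A user with View access to only the two middle tables.
visible, up, down = limited_view(EDGES, {"orders_stage", "orders"})
```

In this example the user sees the edge between `orders_stage` and `orders`, plus a marker that something exists upstream of `orders_stage` and downstream of `orders`.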

Viewing Explorer lineage

Important

To view the list of data sources for which automated lineage collection is available, please see Supported sources.

To view the Explorer lineage:

  1. Locate a resource for which automated lineage metadata is available. On the resource page, scroll down to the Explorer lineage section.

  2. The Explorer lineage section shows a preview of the resource you are currently viewing, the upstream resources it came from, and the downstream resources that are derived from it. This gives a quick overview of how the resource fits into the larger landscape.

  3. Click the View in fullscreen button to see the full details and to interact with the lineage information.

    view_explorer_lineage.png

Interacting with the Explorer lineage in fullscreen mode

To interact with the Explorer lineage:

  1. On the resource page, in the Explorer lineage section, click the View in fullscreen button to open the full view of the lineage.

  2. The right side shows the Lineage summary, which summarizes all the resources that are part of the specific lineage. Click through to the various nodes in the summary view or use the Filter resources option to find a resource by name.

    lineage_view02.gif
  3. Focused lineage view puts the main resource that you have selected into the center and shows how it fits into the overall landscape.

    For example, we can see that the RETAIL_ORDER database table was derived using dbt from the RETAIL_ORDER_STAGE table, which was brought into the Snowflake database using Fivetran from a SQL Server database. This provides a quick, at-a-glance view of how this information passes through the overall set of environments.

    lineage_view01.gif
  4. Expand the nodes and see more information such as the columns that are a part of a table and see how information moves at that field or column level. This kind of traceability gives you the ability to see how this information moves from a technical level for root cause analysis, impact analysis, or sensitivity analysis.

    For example, if a particular column has a warning on it, it shows to the user that there might be some problems with it and the user should use caution when using dashboards or other analysis that leverages this field.

    lineage_view03.gif
  5. Note that columns that are part of the table but not included in the lineage are not connected to the other components of the lineage and appear under Show more resources. You only see this option for the resource whose lineage you are viewing.

    lineage_show_more.png
  6. To focus on the section of the lineage that you are looking at, click the Re-center view button. To zoom back to the full lineage view, select outside the section and click Re-center view again.

    lineage_view04.gif
  7. You can use the lineage diagram to learn about other related resources.

    For example, if we expand the related embedded data source Orders and choose Days to ship Scheduled, we can learn more about this calculated field. On the right side, you can view the details about the field along with the formula that is used to calculate it. Click View lineage of this resource to see the detailed lineage of the specific field.

    lineage_view05.gif
  8. Use the following shortcuts to interact with the Explorer lineage. Access the list of shortcuts by clicking the ? button on the Explorer lineage page.

    Table 1. Shortcuts for Explorer lineage

    | Action                       | Keyboard shortcut                                                     |
    |------------------------------|-----------------------------------------------------------------------|
    | Multi-select                 | Click and drag while holding down the SHIFT key on the keyboard.      |
    | Expand selection             | Select nodes that have > in the top left and press E on the keyboard. |
    | Collapse selection           | Select nodes that have V in the top left and press C on the keyboard. |
    | Recenter                     | Spacebar                                                              |
    | Pan around                   | Arrow keys                                                            |
    | Jump through node selections | Tab                                                                   |



Downloading Explorer lineage

The system provides an option to export and download Explorer lineage information as an Excel file. Before downloading, users can preview the data and filter it. The export summary provides an aggregate of what is about to be exported and updates as the data is filtered.

To download the Explorer lineage information:

  1. On the resource page, in the Explorer lineage section, click the View in fullscreen button to open the full view of the lineage.

  2. On the Explorer lineage page, click the Download as CSV button.

    lineage_download_button.png
  3. The Download lineage of this resource window opens. It shows a summary of everything that is included by default for the selected resource. Use the following filters on the page to narrow down the data you want to download. The summary information on the page updates as you filter items.

    1. Lineage: Select whether you want to download only upstream information, only downstream information, or both.

    2. Resource types: Select from the resource types available for the lineage. For example, you will see options like Database columns and Tableau Dashboards. You can select multiple items from the drop-down list.

    3. Source: Select the source of information. This is the list of source technologies from which the resources were collected. For example, you will see options like Snowflake, dbt, and Fivetran. You can select multiple items from the drop-down list.

    4. Status: Select from statuses such as Approved, Pending, and Warning. If you want to download a list of resources that have no status, select Resources without a status. You can select multiple items from the drop-down list.

    5. Steward: Select from the available stewards, or select Resources without a steward to download a list of resources that have no steward. You can select multiple items from the drop-down list.

      lineage_download.png
  4. Click the Preview button.

  5. The next screen shows a preview of the information that will be downloaded. Along with the fields selected for download, it includes two additional fields: Distance from focused, which is how far the related resource is from the focused resource (a distance of 0 means it is the focused resource), and Parent, which is the name of the parent resource that the listed resource is part of (if applicable).

    lineage_download_preview.png
  6. Click the Back to filters button to adjust the filters or click the Download button to download a copy of the information.
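Once downloaded, the export can be post-processed with ordinary tooling. The sketch below is a minimal illustration of filtering exported rows by the Distance from focused field, assuming the export has been saved or converted to CSV text. The sample data, the `Name` column, the exact header spellings, and the `related_within` helper are all assumptions made for this example; check the headers in your own export before relying on them.

```python
import csv
import io

# Made-up sample standing in for an exported file. Only "Distance from focused"
# and "Parent" are described in the steps above; "Name" is a hypothetical column.
SAMPLE_EXPORT = """Name,Distance from focused,Parent
RETAIL_ORDER,0,
RETAIL_ORDER_STAGE,1,
ORDER_ID,1,RETAIL_ORDER_STAGE
Orders,2,
"""


def related_within(csv_text, max_distance):
    """Return rows for related resources at most `max_distance` away.

    The focused resource itself (distance 0) is excluded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        distance = int(row["Distance from focused"])
        if 0 < distance <= max_distance:
            rows.append(row)
    return rows


# Resources directly adjacent to the focused resource.
nearest = [row["Name"] for row in related_within(SAMPLE_EXPORT, 1)]
```

For a real file, the same function can be fed the file contents, for example `related_within(open(path, newline="").read(), 2)`, where `path` is wherever you saved the export.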