About the dbt Core collector

The dbt collector processes artifacts from your dbt Core project to harvest dbt assets and lineage relationships from dbt transformations.

Note

The latest version of the Collector is 2.159. To view the release notes for this version and all previous versions, see the collector release notes.

How does the dbt Core collector work?

The collector harvests metadata from dbt-generated files.

The dbt Core collector also identifies how dbt moves data between tables (that is, lineage). To do this, the collector needs to parse the SQL of views. If the target database information is not specified, lineage relationships between columns that are defined through views cannot be harvested. The connection information is read from dbt’s profiles.yml file, or it can be supplied in the data.world YAML file or the CLI command.

Note, however, that the dbt Core collector does not harvest everything that the target database collector would harvest. For example, the Snowflake collector can harvest profiling, tags, and policies, which the dbt Core collector does not harvest. To build a comprehensive data catalog, run both the dbt Core collector and the target database collector.

What is cataloged

The information cataloged by the collector includes metadata for the following dbt Core resources:

Table 1.

Object

Information cataloged

Analysis

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Materialized, Resource type

Model

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Materialized, Resource type

Project

Name, Project version

Snapshot

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Materialized, Resource type

Seed

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Materialized, Resource type

Source

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Source name, Resource type

Test

Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL, Enabled, Materialized, Resource type
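To make the table above more concrete, the following is a minimal Python sketch of how the listed fields could be read from a dbt-generated manifest. It is an illustration only, not the collector's implementation. The key names (`raw_sql`, `compiled_sql`, `root_path`, `config.enabled`, `config.materialized`) are assumptions based on the manifest layout of dbt 1.0 and 1.1, and the `FIELD_KEYS` mapping, the helper functions, and the sample project are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical mapping from the labels in the table above to the keys
# assumed to hold them in a dbt 1.0/1.1 manifest node.
FIELD_KEYS = {
    "Name": "name",
    "Description": "description",
    "Path": "path",
    "Root path": "root_path",
    "Package name": "package_name",
    "Unique ID": "unique_id",
    "Alias": "alias",
    "Meta": "meta",
    "Raw SQL": "raw_sql",
    "Compiled SQL": "compiled_sql",
    "Resource type": "resource_type",
}


def summarize_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Return the cataloged fields for one manifest node (missing keys become None)."""
    summary = {label: node.get(key) for label, key in FIELD_KEYS.items()}
    # Assumption: Enabled and Materialized live under the node's "config" block.
    config = node.get("config") or {}
    summary["Enabled"] = config.get("enabled")
    summary["Materialized"] = config.get("materialized")
    return summary


def summarize_manifest(manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize every entry under the manifest's "nodes" section."""
    return [summarize_node(node) for node in manifest.get("nodes", {}).values()]


# Hypothetical, heavily trimmed manifest used only for illustration.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.orders": {
            "name": "orders",
            "unique_id": "model.shop.orders",
            "resource_type": "model",
            "package_name": "shop",
            "path": "orders.sql",
            "raw_sql": "select * from {{ ref('raw_orders') }}",
            "config": {"enabled": True, "materialized": "view"},
        }
    }
}

if __name__ == "__main__":
    for row in summarize_manifest(SAMPLE_MANIFEST):
        print(row["Unique ID"], row["Resource type"], row["Materialized"])
```

With a real project, the dictionary would typically come from `json.load` on the manifest file that dbt writes to its target directory.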



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

Resource page

Relationship

Model

  • Project containing dbt model

  • Tests testing the integrity of model

  • dbt resources (test, seed, model, snapshot, source) that are upstream of model

  • dbt resources (test, seed, model, snapshot, source) that are downstream of model

Project

  • dbt resources (test, seed, model, snapshot, source) contained within project

Snapshot

  • Project containing dbt snapshot

  • dbt resources (test, seed, model, source) that are upstream of snapshot

  • dbt resources (test, seed, model, source) that are downstream of snapshot

Seed

  • Project containing dbt seed

  • dbt resources (test, seed, model, snapshot, source) that are upstream of seed

  • dbt resources (test, seed, model, snapshot, source) that are downstream of seed

Source

  • Project containing dbt source

  • dbt resources (test, seed, model, snapshot) that are downstream of source

  • database table that is the source of data for source

Test

  • Project containing dbt test

  • dbt model that has its integrity tested by this test



Lineage for dbt Core

Table 3.

Object

Lineage available

dbt model materialized as view

Referenced database tables and columns in dbt model materialized as view.

dbt resource

dbt resources that are upstream and downstream (for example, seeds that are upstream of models, and tests that are downstream of models) of dbt resource.
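The upstream and downstream relationships in the table above can be sketched in Python as follows. This is a hedged illustration rather than the collector's actual logic: it assumes that each node in a dbt manifest lists its parents under `depends_on.nodes`, and the function name `build_lineage` and the sample resource IDs are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def build_lineage(
    manifest: Dict[str, Any],
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (upstream, downstream) maps keyed by resource unique ID.

    Assumption: every node records its direct parents in depends_on.nodes.
    """
    upstream: Dict[str, List[str]] = {}
    downstream: Dict[str, List[str]] = defaultdict(list)
    for unique_id, node in manifest.get("nodes", {}).items():
        parents = (node.get("depends_on") or {}).get("nodes", [])
        upstream[unique_id] = list(parents)
        # Invert the edge: each parent gains this node as a downstream resource.
        for parent in parents:
            downstream[parent].append(unique_id)
    return upstream, dict(downstream)


# Hypothetical project: a seed feeds a model, and a test checks the model.
SAMPLE_MANIFEST = {
    "nodes": {
        "seed.shop.raw_orders": {"depends_on": {"nodes": []}},
        "model.shop.orders": {"depends_on": {"nodes": ["seed.shop.raw_orders"]}},
        "test.shop.not_null_orders_id": {
            "depends_on": {"nodes": ["model.shop.orders"]}
        },
    }
}

if __name__ == "__main__":
    up, down = build_lineage(SAMPLE_MANIFEST)
    print("upstream of model:", up["model.shop.orders"])
    print("downstream of model:", down["model.shop.orders"])
```

In this sample the seed is upstream of the model and the test is downstream of it, which mirrors the example given in the table.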



The collector also harvests column-level lineage when the dbt project targets one of the following databases:

  • PostgreSQL

  • Redshift

  • Snowflake

Important

For Eureka Explorer, these harvested lineage relationships display from the page of the upstream or downstream resource from dbt. For example, you can open Eureka Explorer from a downstream Snowflake table resource page to see which upstream Snowflake table was transformed by a view associated with a dbt model. The dbt resource also appears in Eureka Explorer.

Supported versions of dbt Core

The collector supports the following dbt Core versions:

  • dbt 1.0.5

  • dbt 1.1.0

Authentication supported

The collector supports the following authentication methods to the target databases:

  • Username and password authentication

When authenticating to Snowflake, the collector also supports:

  • Username and key pair authentication