About the dbt Core collector

The dbt Core collector processes artifacts from your dbt Core project to harvest dbt assets and the lineage relationships created by dbt transformations.

Important

The dbt Core collector can be run on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.292. For release notes covering this version and all previous versions, see the collector release notes.

How does the dbt Core collector work?

The dbt Core collector extracts metadata from the artifacts generated by the dbt docs generate command, specifically the manifest.json and catalog.json files. These files are typically created or updated in the target subdirectory of your dbt project directory. To keep the metadata in sync with the current state of your dbt project, run dbt docs generate immediately after dbt run and/or dbt snapshot.
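As a minimal sketch of that order, the Python function below runs dbt run (and optionally dbt snapshot), then dbt docs generate, and then confirms that manifest.json and catalog.json exist. The function name refresh_artifacts and its runner parameter are hypothetical; the sketch assumes the dbt CLI is on PATH and that the project uses the default target directory rather than a custom target-path.

```python
import subprocess
from pathlib import Path
from typing import Callable


def refresh_artifacts(
    project_dir: str,
    include_snapshot: bool = False,
    runner: Callable[..., object] = subprocess.run,
) -> list[Path]:
    """Run dbt in the recommended order, then confirm the artifacts exist."""
    steps = [["dbt", "run"]]
    if include_snapshot:
        steps.append(["dbt", "snapshot"])
    # dbt docs generate goes last so the artifacts reflect the latest run.
    steps.append(["dbt", "docs", "generate"])
    for step in steps:
        # check=True makes subprocess.run raise if a dbt command fails.
        runner([*step, "--project-dir", str(project_dir)], check=True)
    # Assumption: the default "target" directory (no custom target-path).
    target = Path(project_dir) / "target"
    artifacts = [target / "manifest.json", target / "catalog.json"]
    missing = [p.name for p in artifacts if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing dbt artifacts: {', '.join(missing)}")
    return artifacts
```

The runner parameter exists only so the command sequence can be exercised without a real dbt installation; in normal use the default subprocess.run is used.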

Some important things to note:

Database lineage metadata: As a data transformation tool, dbt primarily generates lineage metadata for database objects such as tables and views. The dbt Core collector needs accurate database connection information to identify these associated objects. By default, this information is read from the profiles.yml file used to configure dbt, which is typically located in the .dbt subdirectory of the current user's home directory. If the file is in a different location, specify it with the --profile-file option. By default, the first profile listed in profiles.yml serves as the environment definition for the scanned artifacts; to use a different profile, specify it by name with the --profile option.
Missing database information: Some database details, such as passwords, may not be included in profiles.yml. Because the dbt Core collector connects to the target database to gather catalog and schema information, any missing details must be supplied through command options. These options can also override values already set in profiles.yml.
Artifacts scanning and output: The dbt Core collector scans manifest.json to produce catalog resources for the models, snapshots, seeds, and tests it discovers. For dbt models materialized as database views, it writes lineage metadata linking each view's columns to the columns of the source tables referenced in the view's SQL definition (its SELECT statement).
Detailed metadata limitations: The dbt Core collector captures lineage information, not detailed metadata about database objects. To catalog detailed metadata, run the relevant database collectors for the target databases.
Utilizing run_results.json: The dbt Core collector also reads one or more run_results.json files if they are present in the target artifacts directory. These files provide metadata about the processes that produce lineage, such as the materialization of models as database views, including timestamps and status information.
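The profile selection and override behavior described above can be sketched in Python as follows, assuming profiles.yml has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load, which keeps the file's key order). The function resolve_connection and its overrides dictionary are hypothetical stand-ins for the collector's --profile option and connection options; this is an illustration, not the collector's code.

```python
from typing import Any, Optional


def resolve_connection(
    profiles: dict[str, Any],
    profile: Optional[str] = None,
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Select a profile from parsed profiles.yml content and apply overrides."""
    # Older profiles.yml files may contain a top-level "config" block,
    # which is not a profile, so it is skipped here.
    names = [name for name in profiles if name != "config"]
    if not names:
        raise ValueError("profiles.yml defines no profiles")
    # Default to the first profile listed; an explicit name wins (like --profile).
    name = profile if profile is not None else names[0]
    if name not in names:
        raise KeyError(f"profile {name!r} not found in profiles.yml")
    entry = profiles[name]
    # Assumption: use the output that the profile's "target" key points to.
    connection = dict(entry["outputs"][entry["target"]])
    # Supplied values fill gaps (such as a password) or replace existing ones.
    for key, value in (overrides or {}).items():
        if value is not None:
            connection[key] = value
    return connection
```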

What is cataloged

The information cataloged by the collector includes metadata for the following dbt Core resources:

Important

Starting with dbt 1.3, the raw_sql and compiled_sql properties in dbt artifacts were renamed to raw_code and compiled_code. Depending on the version of dbt you use, you see either Raw SQL/Compiled SQL or Raw Code/Compiled Code fields.
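When reading artifacts produced by different dbt versions, a small helper can accept either field name. This is an illustrative Python sketch; get_code is a hypothetical name.

```python
from typing import Any, Optional


def get_code(node: dict[str, Any], kind: str = "raw") -> Optional[str]:
    """Return a manifest node's raw or compiled code for any dbt version.

    dbt 1.3 and later write raw_code/compiled_code; earlier versions
    write raw_sql/compiled_sql.
    """
    if kind not in ("raw", "compiled"):
        raise ValueError("kind must be 'raw' or 'compiled'")
    new_key, old_key = f"{kind}_code", f"{kind}_sql"
    # Prefer the newer key and fall back to the pre-1.3 key.
    return node.get(new_key, node.get(old_key))
```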

Table 1.

Object	Information cataloged
Analysis	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type
Model	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type, Model columns
Model column	Column name
Project	Name, Project version
Snapshot	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type
Seed	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type
Source	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Source name, Resource type, Columns
Test	Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type
Test result	Time the test was executed, Status, Count of failures (if any), Message emitted by the test (if any)
Semantic model	Name, Description, Path, Package name, Unique ID, Enabled, Resource type, Semantic model components, Primary entity
Entity	Title, SQL expression, Entity type
Dimension	Title, Dimension type
Measure	Title, Description, Has measure aggregation
Metric	Title, Description, Path, Package name, Unique ID, Metric type
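To see where many of the fields in Table 1 come from, the sketch below groups the resources in a parsed manifest.json by resource_type and copies a subset of those fields. It is an illustration under assumptions (catalog_entries is a hypothetical name, and field availability varies by dbt version), not the collector's own mapping.

```python
from collections import defaultdict
from typing import Any


def catalog_entries(manifest: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Group manifest resources by resource_type with a subset of Table 1 fields."""
    # Models, seeds, snapshots, tests and analyses live under "nodes";
    # sources live under a separate "sources" key.
    resources = list(manifest.get("nodes", {}).values())
    resources += list(manifest.get("sources", {}).values())
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for node in resources:
        config = node.get("config", {})
        grouped[node.get("resource_type", "unknown")].append({
            "name": node.get("name"),
            "description": node.get("description", ""),
            "path": node.get("path"),
            # root_path is not written by every dbt version, hence .get().
            "root_path": node.get("root_path"),
            "package_name": node.get("package_name"),
            "unique_id": node.get("unique_id"),
            "alias": node.get("alias"),
            "meta": node.get("meta", {}),
            "enabled": config.get("enabled", True),
            "materialized": config.get("materialized"),
            "source_name": node.get("source_name"),  # sources only
            "columns": sorted(node.get("columns", {})),
        })
    return dict(grouped)
```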

Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 2.

Resource page	Relationship
Model	Project containing the dbt model; tests that check the integrity of the model; dbt resources (test, seed, model, snapshot, source) upstream of the model; dbt resources (test, seed, model, snapshot, source) downstream of the model
Semantic Model	Project containing the semantic model; dbt model related to the semantic model; dbt semantic model components (dimensions, entities, measures); metric that the semantic model provides context for
Model column	The database column in the materialized table or view
Project	dbt resources (test, seed, model, snapshot, source) contained within the project
Snapshot	Project containing the snapshot; dbt resources (test, seed, model, source) upstream of the snapshot; dbt resources (test, seed, model, source) downstream of the snapshot
Seed	Project containing the seed; dbt resources (test, seed, model, snapshot, source) upstream of the seed; dbt resources (test, seed, model, snapshot, source) downstream of the seed
Source	Project containing the source; dbt resources (test, seed, model, snapshot) downstream of the source; database schema that the source represents
Test	Project containing the test; dbt model whose integrity the test checks
Test result	The dbt test that was executed to produce the result
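The upstream and downstream relationships in Table 2 can be derived from manifest.json: each node lists its parents in depends_on.nodes (dbt also writes precomputed parent_map and child_map entries). The hypothetical lineage_maps function below inverts depends_on to get both directions; it is a sketch, not the collector's implementation.

```python
from collections import defaultdict
from typing import Any


def lineage_maps(
    manifest: dict[str, Any],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Build upstream and downstream maps from each node's depends_on.nodes."""
    upstream: dict[str, list[str]] = {}
    downstream: dict[str, list[str]] = defaultdict(list)
    for uid, node in manifest.get("nodes", {}).items():
        parents = node.get("depends_on", {}).get("nodes", [])
        upstream[uid] = list(parents)
        # Invert the edge: this node is downstream of each of its parents.
        for parent in parents:
            downstream[parent].append(uid)
    return upstream, dict(downstream)
```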

Lineage for dbt Core

Table 3.

Object	Lineage available
dbt model materialized as a view	Database tables and columns referenced in the SQL of a dbt model materialized as a view.
dbt resource	dbt resources upstream and downstream of the dbt resource (for example, seeds upstream of models and tests downstream of models).
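Column-level lineage for views requires parsing each view's SQL, which this sketch does not attempt. It only illustrates a first step under stated assumptions: finding models whose config.materialized is view and the database object each one maps to (database.schema.alias). view_models is a hypothetical name; the manifest's relation_name field carries a quoted form of the same name.

```python
from typing import Any


def view_models(manifest: dict[str, Any]) -> dict[str, str]:
    """Map each view-materialized model's unique_id to database.schema.name."""
    views: dict[str, str] = {}
    for uid, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        if node.get("config", {}).get("materialized") != "view":
            continue
        # The object name in the database is the alias, falling back to name.
        name = node.get("alias") or node.get("name")
        parts = [node.get("database"), node.get("schema"), name]
        views[uid] = ".".join(p for p in parts if p)
    return views
```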

Supported cross-system lineage

The currently supported data sources for cross-system lineage are:

Important

While other data sources are not formally supported, running the corresponding collectors for those sources may still let you view cross-system lineage between them and dbt Core.

BigQuery
PostgreSQL
Redshift
Snowflake
Azure Synapse (supported only through the dbt-synapse adapter)
Microsoft SQL Server (supported only through the dbt-sqlserver adapter)
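To check whether a dbt profile targets one of these sources, you can look at the type key of the profile output. The Python sketch below maps adapter type strings to the sources listed above; LINEAGE_SOURCES and lineage_source are hypothetical names, and only the adapters named in this list are included.

```python
from typing import Any, Optional

# Adapter "type" values used in profiles.yml outputs, mapped to the data
# sources supported for cross-system lineage.
LINEAGE_SOURCES = {
    "bigquery": "BigQuery",               # dbt-bigquery
    "postgres": "PostgreSQL",             # dbt-postgres
    "redshift": "Redshift",               # dbt-redshift
    "snowflake": "Snowflake",             # dbt-snowflake
    "synapse": "Azure Synapse",           # dbt-synapse
    "sqlserver": "Microsoft SQL Server",  # dbt-sqlserver
}


def lineage_source(output: dict[str, Any]) -> Optional[str]:
    """Return the supported data source for a profile output, else None."""
    return LINEAGE_SOURCES.get(str(output.get("type", "")).lower())
```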
Important

In Eureka Explorer, these harvested lineage relationships are displayed on the resource pages of the upstream and downstream resources connected through dbt. For example, from the resource page of a downstream Snowflake table, you can open Eureka Explorer to see which upstream Snowflake table was transformed through a view associated with a dbt model. The dbt resource also appears in Eureka Explorer.

Authentication supported

The collector supports the following authentication methods for Snowflake, BigQuery, PostgreSQL, Redshift, Azure Synapse, and Microsoft SQL Server databases:

Username and password authentication

When authenticating to Snowflake, the collector also supports:

Username and key pair authentication.
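In a dbt-snowflake profile, key pair authentication is configured with private_key_path (and optionally private_key_passphrase) instead of password. The sketch below classifies a Snowflake profile output by the credentials it carries; snowflake_auth_method is a hypothetical helper, and a missing password may still be supplied through the collector's command options, as described earlier.

```python
from typing import Any


def snowflake_auth_method(output: dict[str, Any]) -> str:
    """Classify a dbt-snowflake profile output by the credentials it holds."""
    # "snowflake" is the default authenticator; other values (for example
    # externalbrowser or oauth) select methods not listed above.
    authenticator = str(output.get("authenticator", "snowflake")).lower()
    if authenticator != "snowflake":
        return "unsupported"
    if output.get("private_key_path"):
        return "key pair"
    if output.get("password"):
        return "username and password"
    # The password may be omitted here and passed to the collector instead.
    return "missing credentials"
```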

Supported versions of dbt Core

The collector supports the following dbt Core versions:

dbt 1.0.0
dbt 1.0.5
dbt 1.1.1
dbt 1.3.0
dbt 1.5.0
dbt 1.6.0
dbt 1.7.0
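dbt records the version that produced an artifact in its metadata.dbt_version field. The sketch below compares that value with the list above; check_dbt_version is a hypothetical name, and the same_minor_listed flag is only a heuristic, since this page lists exact releases.

```python
from typing import Any

SUPPORTED_DBT_VERSIONS = ["1.0.0", "1.0.5", "1.1.1", "1.3.0", "1.5.0", "1.6.0", "1.7.0"]


def check_dbt_version(manifest: dict[str, Any]) -> dict[str, Any]:
    """Compare the dbt version recorded in manifest.json with the list above."""
    version = manifest.get("metadata", {}).get("dbt_version", "")
    minor = ".".join(version.split(".")[:2])  # for example "1.7" from "1.7.3"
    return {
        "dbt_version": version,
        "listed": version in SUPPORTED_DBT_VERSIONS,
        # Heuristic only: shares major.minor with a listed release.
        "same_minor_listed": (
            any(v.startswith(minor + ".") for v in SUPPORTED_DBT_VERSIONS)
            if minor else False
        ),
    }
```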
