About the dbt Core collector
The dbt collector processes artifacts from your dbt Core project to harvest dbt assets and lineage relationships from dbt transformations.
Important
The dbt Core collector can be run on-premises using Docker or JAR files.
Note
The latest version of the Collector is 2.235. Release notes are available for this version and all previous versions.
How does the dbt Core collector work?
The collector harvests metadata from dbt-generated files.
The dbt Core collector also identifies how dbt moves data between tables (i.e., lineage). To accomplish this, the dbt collector needs to parse view SQL. If the target database information is not specified, no lineage relationships between columns specified through views can be harvested. The connection information is passed in via dbt’s profiles.yml file, or it can be supplied through the data.world YAML file or CLI command.
Important
If the dbt profiles.yml file is not provided, no lineage relationships between columns and views will be available and no database resources will be harvested.
Note, however, that the dbt Core collector does not harvest everything that the target database collector would harvest. For example, the Snowflake collector can harvest profiling, tags, and policies, which the dbt Core collector does not harvest. It is recommended to run both the dbt Core collector and the target database collector to build a comprehensive data catalog.
What is cataloged
The information cataloged by the collector includes metadata for the following dbt Core resources:
Important
Starting with dbt 1.3, the raw_sql and compiled_sql properties in dbt artifacts were renamed to raw_code and compiled_code. Depending on the version of dbt that you are using, you may see fields for either Raw SQL/Compiled SQL or Raw Code/Compiled Code.
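The rename described above can be illustrated with a minimal sketch that reads the raw and compiled code from a node regardless of which property names the artifact uses. The helper name `get_code_fields` and the sample node dictionaries are hypothetical and only stand in for entries from a dbt artifact. This is not the collector's own implementation.

```python
def get_code_fields(node: dict) -> dict:
    """Return raw and compiled code for a node, whichever naming it uses.

    dbt 1.3 and later use raw_code / compiled_code; earlier versions
    use raw_sql / compiled_sql. The newer names are checked first.
    """
    raw = node.get("raw_code", node.get("raw_sql"))
    compiled = node.get("compiled_code", node.get("compiled_sql"))
    return {"raw": raw, "compiled": compiled}


# Hypothetical node as written by a dbt version before 1.3.
old_node = {
    "raw_sql": "select * from {{ ref('raw_orders') }}",
    "compiled_sql": "select * from analytics.raw_orders",
}

# The same hypothetical node as written by dbt 1.3 or later.
new_node = {
    "raw_code": "select * from {{ ref('raw_orders') }}",
    "compiled_code": "select * from analytics.raw_orders",
}

old_fields = get_code_fields(old_node)
new_fields = get_code_fields(new_node)
```

Both calls return the same dictionary, so downstream code does not need to know which dbt version produced the artifact.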
Object | Information cataloged |
---|---|
Analysis | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Model | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type, Model Columns |
Model column | Column name |
Project | Name, Project version |
Snapshot | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Seed | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Source | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Source name, Resource type, Columns |
Test | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Test result | Time the test was executed, Status, Count of failures (if any), Message emitted by the test (if any) |
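As a rough illustration of the test result fields listed in the table, the sketch below pulls the execution time, status, failure count, and message out of a structure shaped like dbt's run_results.json artifact. The layout (`results`, `timing`, `status`, `failures`, `message`) is assumed from the dbt artifact format, the sample values are invented, and the function name `summarize_test_results` is hypothetical. It is not how the collector itself is implemented.

```python
def summarize_test_results(run_results: dict) -> list:
    """Collect the test result fields from a run_results-style dictionary."""
    summaries = []
    for result in run_results.get("results", []):
        unique_id = result.get("unique_id", "")
        # Assumption: test resources have unique IDs starting with "test.".
        if not unique_id.startswith("test."):
            continue
        # Assumption: the "execute" timing entry records when the test ran.
        execute = next(
            (t for t in result.get("timing", []) if t.get("name") == "execute"),
            {},
        )
        summaries.append(
            {
                "unique_id": unique_id,
                "executed_at": execute.get("started_at"),
                "status": result.get("status"),
                "failures": result.get("failures"),
                "message": result.get("message"),
            }
        )
    return summaries


# Invented sample data in the assumed run_results.json shape.
sample_run_results = {
    "results": [
        {
            "unique_id": "model.shop.orders",
            "status": "success",
            "timing": [],
        },
        {
            "unique_id": "test.shop.not_null_orders_id",
            "status": "fail",
            "failures": 3,
            "message": "Got 3 results, configured to fail if != 0",
            "timing": [
                {"name": "compile", "started_at": "2024-01-01T00:00:00Z"},
                {"name": "execute", "started_at": "2024-01-01T00:00:01Z"},
            ],
        },
    ]
}

test_summaries = summarize_test_results(sample_run_results)
```

The model entry is skipped, so `test_summaries` holds a single record for the failing test.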
Relationships between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Model |
|
Model column |
|
Project |
|
Snapshot |
|
Seed |
|
Source |
|
Test |
|
Test result |
|
Lineage for dbt Core
Object | Lineage available |
---|---|
dbt model materialized as view | Database tables and columns referenced in the dbt model materialized as a view. |
dbt resource | dbt resources that are upstream and downstream of the dbt resource (for example, seeds that are upstream of models, and tests that are downstream of models). |
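To make the upstream and downstream relationships concrete, here is a minimal sketch that derives both directions from each resource's declared dependencies. It assumes nodes shaped like the entries in dbt's manifest.json, where `depends_on.nodes` lists the upstream unique IDs. The function name `build_lineage` and the sample project are hypothetical, and the collector's actual lineage logic may differ.

```python
from collections import defaultdict


def build_lineage(nodes: dict) -> tuple:
    """Return (upstream, downstream) maps keyed by resource unique ID."""
    # Upstream: the resources each node declares as dependencies.
    upstream = {
        uid: list(node.get("depends_on", {}).get("nodes", []))
        for uid, node in nodes.items()
    }
    # Downstream: invert the upstream map.
    downstream = defaultdict(list)
    for uid, parents in upstream.items():
        for parent in parents:
            downstream[parent].append(uid)
    return upstream, dict(downstream)


# Invented sample: a seed feeds a model, and a test checks the model.
sample_nodes = {
    "seed.shop.raw_orders": {"depends_on": {"nodes": []}},
    "model.shop.orders": {"depends_on": {"nodes": ["seed.shop.raw_orders"]}},
    "test.shop.not_null_orders_id": {
        "depends_on": {"nodes": ["model.shop.orders"]}
    },
}

upstream, downstream = build_lineage(sample_nodes)
```

In this sample, the seed is upstream of the model and the test is downstream of it, matching the example given in the table.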
Supported cross-system lineage
The currently supported data sources for cross-system lineage are:
Important
While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between dbt Core and these sources.
BigQuery
PostgreSQL
Redshift
Snowflake
Azure Synapse (the only supported dbt/Synapse adapter is dbt-synapse)
Microsoft SQL Server (the only supported adapter is dbt-sqlserver)
Important
In Eureka Explorer, these harvested lineage relationships are displayed from the page of the resource that is upstream or downstream of the dbt resource. For example, you can open Eureka Explorer from a downstream Snowflake table resource page to see which upstream Snowflake table was transformed by a view associated with a dbt model. The dbt resource also appears in Eureka Explorer.
Supported versions of dbt Core
The collector supports the following dbt Core versions:
dbt 1.0.0
dbt 1.0.5
dbt 1.1.1
dbt 1.3.0
dbt 1.5.0
dbt 1.6.0
dbt 1.7.0
Authentication supported
The collector supports the following authentication methods to the target databases:
Username and password authentication
When authenticating to Snowflake, the collector also supports:
Username and key pair authentication