About the dbt Cloud collector
The dbt Cloud collector connects to your dbt Cloud project and harvests dbt assets, as well as column-level lineage relationships from the database views associated with those assets.
Important
The dbt Cloud collector can be run in the cloud or on-premises using Docker or JAR files.
Note
The latest version of the Collector is 2.239. To view the release notes for this version and all previous versions, please go here.
How does the dbt Cloud collector work?
dbt Cloud is an extract, transform, and load (ETL) platform that allows users to specify jobs that move and transform data between relational database tables and views in a target database. It makes rich metadata available about the processes that perform this movement and transformation of data.
The collector connects to dbt Cloud and harvests metadata about dbt resources such as models and sources from the metadata artifacts produced by dbt Cloud runs.
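For a concrete sense of what these artifacts contain, the sketch below reads a dbt manifest artifact and lists the models and sources it describes. This is an illustration of the artifact structure only, not the collector's implementation, and the local file path is an assumption.

```python
import json

# Illustrative: load a dbt manifest artifact (e.g. manifest.json retrieved
# from a dbt Cloud run) and list the models and sources it describes.
with open("manifest.json") as f:  # assumed local copy of the artifact
    manifest = json.load(f)

# Models, seeds, snapshots, tests, and analyses live under "nodes";
# sources live under a separate "sources" map. Both are keyed by unique ID.
models = {
    unique_id: node
    for unique_id, node in manifest["nodes"].items()
    if node["resource_type"] == "model"
}

for unique_id, model in models.items():
    print("model:", unique_id, "-", model.get("description", ""))

for unique_id, source in manifest["sources"].items():
    print("source:", unique_id, "-", source.get("source_name", ""))
```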
To fully represent lineage relationships between tables in any of the supported target databases, the collector parses the definitions of the database views created by dbt Cloud, connecting to the target database to obtain database metadata where necessary.
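As an illustration of this kind of view parsing (not the collector's actual parser), the sketch below uses the open-source sqlglot library to extract the upstream tables referenced by a view's SELECT definition.

```python
# Illustrative only: find the upstream tables a view definition depends on.
from sqlglot import exp, parse_one

view_select = """
SELECT c.id, c.name, o.last_order_date
FROM raw.customers AS c
LEFT JOIN raw.orders AS o ON o.customer_id = c.id
"""

parsed = parse_one(view_select, read="snowflake")

# Every table expression in the SELECT body is an upstream dependency of the view.
upstream = sorted({f"{t.db}.{t.name}" for t in parsed.find_all(exp.Table)})
print(upstream)  # ['raw.customers', 'raw.orders']
```

Column-level lineage additionally requires resolving which source columns feed each output column, which is why the collector may need to query the target database for column metadata.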
How does the collector know which dbt Cloud run to harvest from?
To make sure that the correct metadata is being harvested, the dbt Cloud collector needs to know the Project, Job, and Environment combination.
In dbt Cloud, a job is typically executed on a schedule. The dbt Cloud collector uses the following logic to determine the job run from which it will obtain the metadata artifacts to process (a sketch of this logic follows the list):
The collector examines only dbt Cloud runs associated with the dbt Cloud project specified by the user.
Any unsuccessful runs and any runs that did not produce metadata artifacts are discarded.
If the user specifies a job (by identifier or name) or an environment (by identifier or name), then only runs from that job and/or environment are examined.
If multiple successful runs are found, then the collector harvests the metadata from the most recent run.
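The sketch below summarizes this selection logic in Python. The run objects and field names (project_id, status, has_artifacts, job, environment, finished_at) are illustrative assumptions, not the collector's actual data model.

```python
from typing import Optional


def select_run(runs, project_id, job=None, environment=None) -> Optional[dict]:
    """Pick the dbt Cloud run whose artifacts will be harvested.
    Field names are illustrative, not the collector's actual data model."""
    candidates = [
        r for r in runs
        # 1. Only runs for the user-specified dbt Cloud project are examined.
        if r["project_id"] == project_id
        # 2. Unsuccessful runs and runs without metadata artifacts are discarded.
        and r["status"] == "success" and r["has_artifacts"]
        # 3. Optional filters by job and/or environment (identifier or name).
        and (job is None or job in (r["job"]["id"], r["job"]["name"]))
        and (environment is None
             or environment in (r["environment"]["id"], r["environment"]["name"]))
    ]
    # 4. If several successful runs remain, the most recent one wins.
    return max(candidates, key=lambda r: r["finished_at"], default=None)
```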
Running the target database collector in addition to the dbt Cloud collector
The dbt Cloud collector captures lineage relationships but does not harvest any information about the database tables, views, and columns involved in those relationships. To harvest information such as table/view and column names, descriptions, data types, and table/view-column associations, you need to run the appropriate database collector (for example, the Snowflake collector) on the target database.
What is cataloged
The information cataloged by the collector includes metadata for the following dbt resources:
Note
Starting with dbt 1.3, the raw_sql and compiled_sql properties in dbt artifacts are renamed to raw_code and compiled_code. Depending on the version of dbt you are using, you may see fields for either Raw SQL/Compiled SQL or Raw Code/Compiled Code (see the sketch after the table).
Object | Information cataloged |
---|---|
Analysis | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Model | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Model column | Column name |
Project | Name, Project version |
Snapshot | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Seed | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Source | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Source name, Resource type |
Test | Name, Description, Path, Root path, Package name, Unique ID, Alias, Meta, Raw SQL/Raw Code, Compiled SQL/Compiled Code, Enabled, Materialized, Resource type |
Test result | Time the test was executed, Status, Count of failures (if any), Message emitted by the test (if any) |
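If you consume the collector output or raw dbt artifacts directly, a small fallback handles both naming conventions. This is a hedged sketch that assumes node dictionaries follow the dbt artifact layout.

```python
def raw_code_of(node: dict) -> str:
    """Return a node's raw SQL whichever field name the dbt version used:
    dbt 1.3+ artifacts use raw_code, older artifacts use raw_sql."""
    return node.get("raw_code") or node.get("raw_sql") or ""


def compiled_code_of(node: dict) -> str:
    # Same fallback for the compiled form of the SQL.
    return node.get("compiled_code") or node.get("compiled_sql") or ""
```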
Note: The collector also harvests the following information about dbt Cloud resources. This information is available in the collector output but is not presented in the UI. You can use this information for querying and Eureka Automations.
The run that produced the artifacts
The job that configured the run
The environment for the job (that specifies the target database)
The dbt Cloud project in which the job is defined
The dbt Cloud account
Relationship between objects
By default, the data.world catalog includes catalog pages for the resource types below, and each catalog page has relationships to other related resource types. Note that the catalog presentation and relationships are fully configurable, so the table below lists the default configuration.
Resource page | Relationship |
---|---|
Model | |
Model column | |
Project | dbt resources (Test, Seed, Model, Snapshot, Source) contained within project |
Snapshot | |
Seed | |
Source | |
Test | |
Test result | |
Lineage for dbt
Object | Lineage available |
---|---|
dbt model materialized as view | |
dbt resource | |
Supported cross-system lineage
Important
In Eureka Explorer, these harvested lineage relationships are displayed from the page of the resource that is upstream or downstream of the dbt resource. For example, you can open Eureka Explorer from a downstream Snowflake table resource page to see which upstream Snowflake table was transformed by the view associated with a dbt model. The dbt resource also appears in Eureka Explorer.
The currently supported data sources for cross-system lineage are:
Snowflake
Important
While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between dbt Cloud and these sources.
Supported versions of dbt
The collector supports the following dbt versions:
dbt 1.0.0
dbt 1.0.5
dbt 1.1.1
dbt 1.3.0
dbt 1.4.0
dbt 1.5.0
dbt 1.6.0
dbt 1.7.0
Authentication supported
The collector supports authenticating to dbt Cloud using an API key.
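For reference, the dbt Cloud Administrative API accepts this key as a Token in the Authorization header. The sketch below shows that authentication pattern against the v2 API; the account ID, key value, and query parameters are placeholders, and this is not the collector's own code.

```python
import requests

DBT_CLOUD_API = "https://cloud.getdbt.com/api/v2"   # adjust for single-tenant hosts
ACCOUNT_ID = 12345                                   # placeholder account ID
API_KEY = "<your API key or service token>"          # placeholder

# dbt Cloud's Administrative API expects the key in a Token authorization header.
resp = requests.get(
    f"{DBT_CLOUD_API}/accounts/{ACCOUNT_ID}/runs/",
    headers={"Authorization": f"Token {API_KEY}", "Accept": "application/json"},
    params={"order_by": "-finished_at", "limit": 10},
)
resp.raise_for_status()
for run in resp.json()["data"]:
    print(run["id"], run["status"])
```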
When the collector authenticates to Snowflake as the target database, it supports either:
Username and key pair authentication
Username and password authentication
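As an illustration of the key pair option (not how the collector itself connects), the sketch below uses the Snowflake Python connector with a DER-encoded private key. The account, user, warehouse, and key path are placeholders.

```python
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the PEM private key and convert it to the DER bytes the connector expects.
with open("/path/to/rsa_key.p8", "rb") as key_file:   # placeholder key path
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# Key pair authentication: the private key proves identity, no password is sent.
conn = snowflake.connector.connect(
    account="myorg-myaccount",    # placeholder account identifier
    user="COLLECTOR_USER",        # placeholder user
    private_key=private_key_der,
    warehouse="COLLECTOR_WH",     # placeholder warehouse
)
```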
Warning
dbt is deprecating user tokens on September 18. If you are using legacy user tokens, you will need to generate either a service token or an account-scoped access token to use with the collector.
In your collector configuration, update the dbt Cloud API key (--dbt-cloud-api-key) parameter with the new token.