
Preparing to run the dbt Core collector

Setting up prerequisites for running the collector

Make sure the machine from which you run the collector meets the following hardware and software requirements.

Table 1.

Hardware

  • RAM: 8 GB

  • CPU: 2 GHz processor

Software

  • Docker: Docker must be installed on the machine.

  • Java Runtime Environment: OpenJDK 17 is supported.

data.world specific objects

  • Dataset: You must have a ddw-catalogs (or other) dataset set up to hold your catalog files when you are done running the collector.



Preparing dbt Core for collectors

Harvesting metadata from dbt Core artifacts requires that the artifact files be in a filesystem directory for which the user running the collector has at least read access. To harvest intra-database (column-level) lineage for dbt models materialized as views, the collector must be given a credential for dbt's target database with SELECT privileges on those views and on the tables they reference. This database credential can be supplied via CLI options or read from the profiles.yml file.
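As a sketch, a profiles.yml entry supplying such a credential might look like the following. The profile name, host, database names, and credentials below are placeholders, not values required by the collector:

```yaml
# Hypothetical profile: all names, hosts, and credentials are placeholders.
my_dbt_profile:
  target: prod
  outputs:
    prod:
      type: postgres
      host: db.example.com
      port: 5432
      dbname: analytics
      schema: public
      # For column-level lineage, this user needs SELECT on the views
      # materialized by dbt and on the tables those views reference.
      user: collector_user
      password: "{{ env_var('DBT_PASSWORD') }}"
```

Keeping the password in an environment variable via dbt's env_var() function avoids committing a plaintext credential to the file.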

Generating dbt Core metadata artifacts to pass to the collector

  • profiles.yml - Located in the ~/.dbt directory by default. For more information, see the dbt connection profiles documentation.

  • dbt_project.yml - Found at the top level of the dbt project.

  • catalog.json, manifest.json, and run_results.json - These files can be generated by running the command dbt docs generate. For more information, see the dbt documentation on artifacts and the dbt docs generate command.

Important

The files catalog.json, manifest.json, and profiles.yml must be in the same directory on the host machine, for example, /artifact_directory.
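The staging step can be sketched as follows. All paths here are placeholders, and a temporary sandbox with empty files stands in for a real dbt project so the commands can be run as-is; in practice you would copy from your project's target/ directory and your real ~/.dbt/profiles.yml:

```shell
# Sandbox setup (illustration only): fake a dbt project layout with
# empty files where `dbt docs generate` would have written artifacts.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/target" "$sandbox/home/.dbt" "$sandbox/artifact_directory"
touch "$sandbox/project/target/catalog.json"
touch "$sandbox/project/target/manifest.json"
touch "$sandbox/project/target/run_results.json"
touch "$sandbox/home/.dbt/profiles.yml"

# The actual staging: put the artifacts and profiles.yml side by side
# in one directory so the collector can read them from a single place.
cp "$sandbox/project/target/catalog.json" "$sandbox/artifact_directory/"
cp "$sandbox/project/target/manifest.json" "$sandbox/artifact_directory/"
cp "$sandbox/project/target/run_results.json" "$sandbox/artifact_directory/"
cp "$sandbox/home/.dbt/profiles.yml" "$sandbox/artifact_directory/"

ls "$sandbox/artifact_directory"
```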