Preparing to run the dbt Core collector
Setting up prerequisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
| Item | Requirement |
|---|---|
| **Hardware** | |
| RAM | 8 GB |
| CPU | 2 GHz processor |
| **Software** | |
| Docker | Click here to get Docker. |
| Java Runtime Environment | OpenJDK 17 is supported and available here. |
| **data.world specific objects** | |
| Dataset | You must have a ddw-catalogs (or other) dataset set up to hold your catalog files when you are done running the collector. |
Preparing dbt Core for collectors
Harvesting metadata from dbt Core artifacts requires that the artifact files be in a filesystem directory for which the user running the collector has at least read access. To harvest intra-database (column-level) lineage for dbt models materialized as views, the collector must also be given a credential for dbt's target database that has SELECT privileges on those views and on the tables they reference. This database credential can be supplied via CLI options or read from the profiles.yml file.
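For illustration, a profiles.yml entry might look like the following sketch. This assumes a hypothetical Postgres target; the project name, host, database, and user are placeholders, not data.world or dbt defaults. The key point is that the configured user must hold the SELECT privileges described above.

```yaml
# Hypothetical profiles.yml entry -- all names and values are illustrative.
my_project:
  target: prod
  outputs:
    prod:
      type: postgres
      host: db.example.com
      port: 5432
      user: catalog_reader    # must have SELECT on the views and the tables they reference
      password: "{{ env_var('DBT_DB_PASSWORD') }}"
      dbname: analytics
      schema: public
```

Supplying the password through an environment variable, as dbt's `env_var` function allows, avoids storing the credential in plain text in the file.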
Generating dbt Core metadata artifacts to pass to the collector
- profiles.yml - Located in the ~/.dbt directory by default. For more information, see the dbt connection profiles documentation.
- dbt_project.yml - Found at the top level of the dbt project.
- catalog.json, manifest.json, and run_results.json - These files can be generated by running the command dbt docs generate. More information is available here and here.
Important
The files catalog.json, manifest.json, and profiles.yml must be in the same directory on the host machine. For example, /artifact_directory.
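As a quick sanity check before running the collector, a short script like this sketch (not part of the collector itself; the directory path and function name are illustrative) can confirm that the required files are together in one directory:

```python
from pathlib import Path

# Files the collector expects to find together in a single directory.
REQUIRED_FILES = ("catalog.json", "manifest.json", "profiles.yml")

def missing_artifacts(artifact_dir):
    """Return the names of required artifact files missing from artifact_dir."""
    directory = Path(artifact_dir)
    return [name for name in REQUIRED_FILES
            if not (directory / name).is_file()]

# Example usage: point it at your artifact directory before running
# the collector and fail fast if anything is missing.
#   missing = missing_artifacts("/artifact_directory")
#   if missing:
#       raise SystemExit(f"Missing artifact files: {missing}")
```

Running this against the artifact directory before launching the collector catches the most common setup mistake, files scattered across different directories, without having to wait for the collector run to fail.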