Preparing to run the dbt cloud collector

Setting up pre-requisites for running the collector

Make sure that the machine on which you run the collector meets the following hardware and software requirements.

Table 1.

| Category | Item | Requirement |
|---|---|---|
| Hardware (for on-premise runs only) | RAM | 8 GB |
| Hardware (for on-premise runs only) | CPU | 2 GHz processor |
| Software (for on-premise runs only) | Docker | Click here to get Docker. |
| data.world specific objects (for both cloud and on-premise runs) | Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
| Network connection | Allowlist IPs and domains | Follow these instructions to configure your network. |

Note: The hardware specs are based on running one collector process at a time. Adjust the hardware if you run multiple collectors at the same time.
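For on-premise runs, a quick pre-flight check can confirm the basics in Table 1 before you start. The following is a minimal sketch, not part of the collector: the function names `total_ram_gb` and `preflight` are hypothetical, and it only checks that a `docker` executable is on the PATH and reports physical RAM where the operating system exposes it through `os.sysconf`.

```python
import os
import shutil
from typing import Any, Dict, Optional

MIN_RAM_GB = 8  # RAM requirement from Table 1


def total_ram_gb() -> Optional[float]:
    """Return physical RAM in GiB, or None if the platform cannot report it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are missing elsewhere.
        return None
    return page_size * page_count / 1024**3


def preflight() -> Dict[str, Any]:
    """Collect the facts needed to compare this machine against Table 1."""
    return {
        "docker_on_path": shutil.which("docker") is not None,
        "ram_gb": total_ram_gb(),
        "min_ram_gb": MIN_RAM_GB,
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```

The sketch reports values rather than passing or failing the machine, because the operating system usually reports slightly less than the nominal installed RAM.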



Preparing dbt cloud for collectors

Obtaining account ID, project ID, job ID, and job run ID

This section describes how to find the account ID, project ID, job ID, and job run ID in dbt Cloud. You will use these values when configuring the collector to harvest metadata.

The dbt cloud collector assumes that your dbt Cloud instance has an environment and job set up with at least one successful run.

To obtain this information:

  1. Under the Deploy menu at the top navigation, go to Jobs.

  2. Copy the Account ID and the Project ID from the URL:

    https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs

  3. Under the Deploy menu, go to Environment, and select the environment that you want to run the collector against.

  4. You can identify the Environment Name from the title on this page, or the Environment ID from the URL:

    https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/environments/<environmentID>

  5. From the Jobs section, select the Job associated with your Environment.

  6. You can identify the Job name from the title on this page, or the Job ID from the following URL:

    https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs/<jobID>

  7. If you want the collector to harvest from a specific dbt Cloud job run, and not the most recent successful job run, then from the Job page, click on a specific job run. Copy the Job Run ID from the URL:

    https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/runs/<jobRunID>
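If you copy these URLs often, the IDs can be extracted programmatically instead of by eye. The following is a minimal sketch that assumes the URL shape `/deploy/<accountID>/projects/<projectID>/<jobs|environments|runs>/<id>` and that all IDs are numeric. The function name `parse_dbt_cloud_url` and the returned key names are hypothetical and are not part of the collector or of dbt Cloud.

```python
import re
from typing import Dict

# Assumed URL shape, based on the patterns in the steps above:
#   .../deploy/<accountID>/projects/<projectID>[/<kind>[/<id>]]
_DEPLOY_URL = re.compile(
    r"/deploy/(?P<account_id>\d+)"
    r"/projects/(?P<project_id>\d+)"
    r"(?:/(?P<kind>jobs|environments|runs)(?:/(?P<resource_id>\d+))?)?"
)

# Maps the URL path segment to the name of the ID it carries.
_KIND_TO_KEY = {
    "jobs": "job_id",
    "environments": "environment_id",
    "runs": "job_run_id",
}


def parse_dbt_cloud_url(url: str) -> Dict[str, str]:
    """Return the IDs embedded in a dbt Cloud deploy URL as strings."""
    match = _DEPLOY_URL.search(url)
    if match is None:
        raise ValueError(f"Not a recognized dbt Cloud deploy URL: {url}")
    ids = {
        "account_id": match["account_id"],
        "project_id": match["project_id"],
    }
    kind, resource_id = match["kind"], match["resource_id"]
    if kind and resource_id:
        ids[_KIND_TO_KEY[kind]] = resource_id
    return ids


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(parse_dbt_cloud_url(
        "https://cloud.getdbt.com/deploy/123/projects/456/runs/789"
    ))
```

A URL that ends at `/jobs` with no trailing ID yields only the account and project IDs, which matches step 2 above.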

Obtaining dbt Cloud API token

  1. From the top right menu dropdown, select Profile Settings.

  2. Navigate to the API section. Click copy to the right of the API Key.

  3. Create either a service token or an account-scoped access token to use with the collector.

    You will supply this token as the dbt Cloud API key (--dbt-cloud-api-key) when setting up the collector to authenticate to dbt Cloud.
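Before configuring the collector, you may want to confirm that the token is accepted by dbt Cloud. The following is a minimal sketch that builds an authenticated request against the dbt Cloud Administrative API v2 account endpoint. It assumes the default `cloud.getdbt.com` host and the `Authorization: Token <token>` header scheme. The function name `build_account_request` is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

# Assumption: the default multi-tenant host. Other deployments may use a
# different access URL.
DBT_CLOUD_HOST = "https://cloud.getdbt.com"


def build_account_request(account_id: str, token: str) -> urllib.request.Request:
    """Build a GET request for the account resource, authenticated by token."""
    url = f"{DBT_CLOUD_HOST}/api/v2/accounts/{account_id}/"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_account_request("123", "example-token")
    print(request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. A successful response suggests that the token and account ID match, while an HTTP error such as 401 points to a problem with the token.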

Updating job execution settings

Important

You must enable the Generate docs on run setting for the collector to successfully harvest the dbt resources.

  1. Under the Deploy menu at the top navigation, go to Jobs.

  2. From the Environment dropdown, select the environment that you want to run the collector against.

  3. Select the Job associated with your Environment and click Settings.

  4. Under Execution Settings, ensure that Generate docs on run is selected.
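The same setting can also be checked from a job's API representation rather than the UI. The following is a minimal sketch that assumes the dbt Cloud Administrative API v2 job endpoint returns a JSON body in which the job sits under `data` and carries a boolean `generate_docs` field. Verify that shape against your own API response. The function name `docs_generation_enabled` is hypothetical.

```python
from typing import Any, Mapping


def docs_generation_enabled(job_response: Mapping[str, Any]) -> bool:
    """Return True if the parsed job response reports docs generation on.

    Assumes the response shape {"data": {"generate_docs": <bool>, ...}}.
    A missing field is treated as disabled.
    """
    job = job_response.get("data") or {}
    return bool(job.get("generate_docs", False))


if __name__ == "__main__":
    # Hand-written sample payload for illustration, not real API output.
    sample = {"data": {"id": 42, "generate_docs": True}}
    print(docs_generation_enabled(sample))
```

Treating a missing field as disabled is a deliberately conservative choice, so an unexpected response shape prompts a manual check in the job settings.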