Preparing to run the dbt Cloud collector
Setting up prerequisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) | Note: These specs assume one collector process running at a time. Adjust the hardware if you run multiple collectors at the same time. |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Docker must be installed. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow the Catalog Toolkit instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains | |
Preparing dbt Cloud for collectors
Obtaining account ID, project ID, job ID, and job run ID
This section explains how to find the account ID, project ID, job ID, and job run ID. You will use this information when you configure the collector to harvest metadata.
The dbt Cloud collector assumes that your dbt Cloud instance has an environment and a job set up with at least one successful run.
To obtain this information:
Under the Deploy menu at the top navigation, go to Jobs.
Copy the Account ID and the Project ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs
Under the Deploy menu, go to Environment, and select the environment that you want to run the collector against.
You can identify the Environment Name from the title on this page, or the Environment ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/environments/<environmentID>
From the Jobs section, select the Job associated with your Environment.
You can identify the Job name from the title on this page, or the Job ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs/<jobID>
To harvest from a specific dbt Cloud job run instead of the most recent successful job run, click that job run on the Job page. Copy the Job Run ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/runs/<jobRunID>
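The IDs can also be pulled out of a copied URL programmatically. The following is a minimal sketch, assuming the URLs follow the shapes listed in this section and that all IDs are numeric. The function name `parse_dbt_cloud_url` and the returned key names are hypothetical and are not part of the collector.

```python
import re

# Mirrors the URL shapes listed in this section.
# Assumptions: IDs are numeric; the slash after "projects" is treated as optional.
_DEPLOY_URL = re.compile(
    r"/deploy/(?P<account_id>\d+)/projects/?(?P<project_id>\d+)"
    r"(?:/(?P<kind>jobs|environments|runs)(?:/(?P<item_id>\d+))?)?"
)

# Maps the URL path segment to the (hypothetical) key used in the result.
_KIND_TO_KEY = {
    "jobs": "job_id",
    "environments": "environment_id",
    "runs": "job_run_id",
}


def parse_dbt_cloud_url(url: str) -> dict:
    """Return the IDs found in a dbt Cloud deploy URL as strings."""
    match = _DEPLOY_URL.search(url)
    if match is None:
        raise ValueError(f"Not a recognized dbt Cloud deploy URL: {url}")

    ids = {
        "account_id": match["account_id"],
        "project_id": match["project_id"],
    }
    # Add the job, environment, or job run ID only when the URL carries one.
    if match["kind"] and match["item_id"]:
        ids[_KIND_TO_KEY[match["kind"]]] = match["item_id"]
    return ids
```

For example, passing a Job page URL yields the account ID, project ID, and job ID, while a Jobs list URL yields only the account ID and project ID.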
Obtaining dbt Cloud API token
The steps for obtaining a dbt Cloud API token differ depending on whether you are on an Enterprise plan or Team plan.
If you are on the Enterprise Plan:
Note
For Enterprise accounts, we recommend using a Service Account Token.
Follow the dbt Cloud docs to create a service account token.
When creating the token, attach the following roles:
Account Viewer: Assigned at the account level.
Job Viewer: Assigned at the project level for each project to collect from.
Once the token is created, copy its value.
You will pass this token as the dbt Cloud API key (--dbt-cloud-api-key) when you set up the collector to authenticate to dbt Cloud.
If you are on the Team Plan:
Note
The Team plan does not support service account tokens. You must create a personal access token using a dedicated user account.
Create a dedicated user account for the collector.
Assign it the Admin role.
Log in as that user and go to Profile → Access Tokens.
Generate a Personal Access Token and copy the key.
You will pass this token as the dbt Cloud API key (--dbt-cloud-api-key) when you set up the collector to authenticate to dbt Cloud.
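Before configuring the collector, it can help to confirm that the token is accepted. The sketch below is an illustration, not part of the collector. It assumes the dbt Cloud API v2 account endpoint (`/api/v2/accounts/<accountID>/`) and a `Bearer` authorization header, which should be checked against the dbt Cloud API documentation for your instance. The function names and the default host are assumptions.

```python
import urllib.error
import urllib.request


def build_account_request(
    account_id: str, token: str, host: str = "cloud.getdbt.com"
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the account resource.

    Assumption: the dbt Cloud API v2 account endpoint and Bearer auth scheme.
    """
    url = f"https://{host}/api/v2/accounts/{account_id}/"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def token_is_accepted(account_id: str, token: str, timeout: float = 10.0) -> bool:
    """Send the request; return True on HTTP 200, False on an HTTP error status."""
    request = build_account_request(account_id, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # Raised for error statuses such as 401 (bad token) or 403 (missing role).
        return False
```

Calling `token_is_accepted` with your account ID and the copied token makes one network request. Avoid hard-coding the token in scripts; read it from a secret store or environment variable instead.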
Updating job execution settings
Important
You must enable the Generate docs on run setting for the collector to harvest dbt resources successfully.
Under the Deploy menu at the top navigation, go to Jobs.
From the Environment dropdown, select the environment that you want to run the collector against.
Select the Job associated with your Environment and click Settings.
Under Execution Settings, ensure that Generate docs on run is selected.
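The same setting can be checked from a job definition retrieved through the dbt Cloud API. This is a minimal sketch that assumes the API v2 job endpoint (`/api/v2/accounts/<accountID>/jobs/<jobID>/`) returns a JSON body whose `data` object carries a boolean `generate_docs` field. The helper name `job_generates_docs` is hypothetical, and the field name should be confirmed against the dbt Cloud API documentation.

```python
def job_generates_docs(job_response: dict) -> bool:
    """Return True when the job payload reports that docs are generated on run.

    Assumption: the payload has the shape {"data": {"generate_docs": <bool>, ...}}.
    A missing "data" object or a missing field is treated as "not enabled".
    """
    job = job_response.get("data") or {}
    return bool(job.get("generate_docs"))
```

A result of False indicates that the Generate docs on run option should be selected in the job's Execution Settings before running the collector.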