Preparing to run the dbt Cloud collector
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) Note: The following specs assume one collector process running at a time. Increase the hardware accordingly if you run multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Docker must be installed on the machine. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files after the collector finishes running. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Preparing dbt Cloud for collectors
Obtaining account ID, project ID, job ID, and job run ID
This section explains how to find the account ID, project ID, job ID, and job run ID. You will use this information in the collector command or YAML file.
The dbt Cloud collector assumes that your dbt Cloud instance has an environment and a job set up with at least one successful run.
To obtain this information:
Under the Deploy menu at the top navigation, go to Jobs.
Copy the Account ID and the Project ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs
Under the Deploy menu, go to Environment, and select the environment that you want to run the collector against.
You can identify the Environment Name from the title on this page, or the Environment ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/environments/<environmentID>
From the Jobs section, select the Job associated with your Environment.
You can identify the Job Name from the title on this page, or the Job ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/jobs/<jobID>
If you want the collector to harvest from a specific dbt Cloud job run instead of the most recent successful job run, click that job run on the Job page. Copy the Job Run ID from the URL:
https://cloud.getdbt.com/deploy/<accountID>/projects/<projectID>/runs/<jobRunID>
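If you prefer not to copy the IDs by hand, the URL patterns above can be parsed with a short script. The following is a minimal sketch, not part of the collector: the function name `parse_dbt_cloud_url` is hypothetical, and the pattern assumes that the IDs are numeric and that the path follows the `/deploy/<accountID>/projects/<projectID>/...` layout described in the steps above.

```python
import re
from urllib.parse import urlparse

# Assumed path layout: /deploy/<accountID>/projects/<projectID>[/<resource>[/<id>]]
_DEPLOY_PATH = re.compile(
    r"^/deploy/(?P<account_id>\d+)/projects/(?P<project_id>\d+)"
    r"(?:/(?P<resource>environments|jobs|runs)(?:/(?P<resource_id>\d+))?)?/?$"
)

# Maps the URL segment to the name of the ID it carries.
_KEY_BY_RESOURCE = {
    "environments": "environment_id",
    "jobs": "job_id",
    "runs": "job_run_id",
}


def parse_dbt_cloud_url(url: str) -> dict:
    """Extract the IDs embedded in a dbt Cloud Deploy URL (hypothetical helper)."""
    match = _DEPLOY_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"Not a recognized dbt Cloud Deploy URL: {url}")

    ids = {
        "account_id": match["account_id"],
        "project_id": match["project_id"],
    }
    # The trailing ID is only present on environment, job, and job run pages.
    if match["resource_id"] is not None:
        ids[_KEY_BY_RESOURCE[match["resource"]]] = match["resource_id"]
    return ids


if __name__ == "__main__":
    # Example values only; substitute the URL from your own browser.
    print(parse_dbt_cloud_url("https://cloud.getdbt.com/deploy/123/projects/456/jobs/789"))
```

The IDs are returned as strings so that they can be passed to the collector command or YAML file unchanged.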
Obtaining dbt Cloud API token
From the top right menu dropdown, select Profile Settings.
Navigate to the API section. Click copy to the right of the API Key.
Create either a service token or an account-scoped access token to use with the collector.
Use this API key as the value of the dbt Cloud API key parameter (--dbt-cloud-api-key) when you set up the collector to authenticate to dbt Cloud.
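Before running the collector, it can be useful to confirm that the key is accepted by dbt Cloud. The sketch below only builds the request and does not send it. It rests on assumptions that are not stated in this document: that your account is served from `cloud.getdbt.com`, that the dbt Cloud Administrative API v2 exposes a job at `/api/v2/accounts/<accountID>/jobs/<jobID>/`, and that the key is sent in an `Authorization: Token ...` header. The function name `build_job_request` is hypothetical. Check the dbt Cloud API documentation for your deployment before relying on it.

```python
import urllib.request


def build_job_request(account_id: str, job_id: str, api_key: str,
                      host: str = "cloud.getdbt.com") -> urllib.request.Request:
    """Build (but do not send) a GET request for one dbt Cloud job.

    Assumption: the endpoint path and the "Token" authorization scheme
    follow the dbt Cloud Administrative API v2.
    """
    url = f"https://{host}/api/v2/accounts/{account_id}/jobs/{job_id}/"
    headers = {
        "Authorization": f"Token {api_key}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

To send the request, pass the result to `urllib.request.urlopen`. A 200 response suggests that the key and IDs are valid, while a 401 response suggests that the key was rejected. Read the key from an environment variable or a secrets manager rather than hard-coding it.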
Updating job execution settings
Important
You must enable the Generate docs on run setting for the collector to successfully harvest the dbt resources.
Under the Deploy menu at the top navigation, go to Jobs.
From the Environment dropdown, select the environment that you want to run the collector against.
Select the Job associated with your Environment and click Settings.
Under Execution Settings, ensure that Generate docs on run is selected.
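The setting can also be checked programmatically from a job definition that you have already fetched from the dbt Cloud API. This is a minimal sketch under two assumptions that are not confirmed by this document: that the job response wraps the job in a `data` object, and that the Generate docs on run setting appears there as a boolean `generate_docs` field. The function name `job_generates_docs` and the sample payload are hypothetical.

```python
def job_generates_docs(job_response: dict) -> bool:
    """Return True if the job payload reports that docs are generated on run.

    Assumption: the payload has the shape {"data": {"generate_docs": <bool>, ...}}.
    A missing field is treated as disabled.
    """
    job = job_response.get("data") or {}
    return bool(job.get("generate_docs", False))


if __name__ == "__main__":
    # Hypothetical, trimmed-down payload for illustration only.
    sample = {"data": {"id": 789, "name": "Nightly build", "generate_docs": False}}
    if not job_generates_docs(sample):
        print("Enable 'Generate docs on run' under Execution Settings for this job.")
```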