
Power BI Gov and the Collector



The latest version of the Collector is 2.129. To view the release notes for this version and all previous versions, please go here.

The Collector harvests metadata from your source system. Please read over the Collector FAQ to familiarize yourself with the Collector.


Version of data source

Setting up access for cataloging Power BI resources

Authentication types supported

There are two ways to authenticate to Power BI:

  • Service principal

  • User and password

This section walks you through the process for both authentication types. All these tasks are performed in the Azure Portal.

STEP 1: Registering your application

To register a new application:

  1. Select Azure Active Directory.

  2. Click the App Registrations option in the left sidebar.

  3. Click New Registration and enter the following information:

    1. Application Name: DataDotWorldPowerBIApplication

    2. Supported account types: Accounts in this organizational directory only

  4. Click Register to complete the registration.
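If you prefer scripting over the portal, the same registration can be sketched with the Azure CLI. This is a hedged illustration, not part of the official instructions; it echoes the command for review rather than executing it, and assumes the Azure CLI (`az`) is installed and logged in.

```shell
APP_NAME="DataDotWorldPowerBIApplication"
# "AzureADMyOrg" is the CLI value for "Accounts in this organizational directory only".
REGISTER_CMD="az ad app create --display-name $APP_NAME --sign-in-audience AzureADMyOrg"
# Echoed for review; run the printed command in a shell where the Azure CLI is available.
echo "$REGISTER_CMD"
```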

STEP 2: Creating a client secret and getting the client ID

To create a Client Secret:

  1. On the application page, select Certificates & secrets.

  2. Click New client secret and enter a description.

  3. Set the expiration to Never.

  4. Click Add, and copy the secret value.

To get the Client ID from the Azure portal:

  1. Click on the Overview tab in the left sidebar of the application home page.

  2. Copy the Client ID from the Essentials section.
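The secret and client ID steps above can also be sketched with the Azure CLI. Again this is a hypothetical alternative to the portal flow (the application ID below is a placeholder); the commands are echoed for review rather than run.

```shell
APP_ID="00000000-0000-0000-0000-000000000000"  # placeholder: your application's ID
# Create a client secret (the value is shown only once; copy it immediately).
SECRET_CMD="az ad app credential reset --id $APP_ID"
# Read the client (application) ID back from the registration.
SHOW_CMD="az ad app show --id $APP_ID --query appId --output tsv"
echo "$SECRET_CMD"
echo "$SHOW_CMD"
```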

STEP 3: Setting up metadata scanning

Enable access to the detailed data source information (such as tables and columns) that Power BI provides through its read-only admin APIs. For details, please see this documentation.

STEP 4: Setting up REST API for service principals


Perform this task only if you are using the service principal for authentication.

Ensure that service principals are enabled to use the Power BI APIs. For detailed instructions, please see this documentation.

STEP 5: Setting up permissions for user and password authentication


Perform this task only if you are using user and password for authentication.

To add permissions:

  1. Click API permissions, and select Add a permission.

  2. Search for the Power BI Service, and click Delegated permissions. Select the following permissions:

    • App.Read.All

    • Capacity.Read.All

    • Dashboard.Read.All

    • Dataflow.Read.All

    • Dataset.Read.All

    • Gateway.Read.All

    • Report.Read.All

    • StorageAccount.Read.All

    • Tenant.Read.All

    • Workspace.Read.All

  3. Click the Grant admin consent button, located next to the Add a permission button. This allows the collector to run as a daemon without prompting a user for consent on every crawler run.


Only administrators of the tenant can grant admin consent.

What is cataloged

Power BI lineage

The Power BI collector identifies the datasets that reports and dashboard tiles source their data from.

Ways to run the Collector

There are a few different ways to run the Collector, any of which can be combined with an automation strategy to keep your catalog up to date:

  • Create a configuration file (config.yml) - This option stores all the information needed to catalog your data sources. It is especially valuable if you have multiple data sources to catalog, as you don't need to run multiple scripts or CLI commands separately.

  • Run the collector through a CLI - Repeat runs of the collector require you to re-enter the command for each run.


This section walks you through the process of running the collector using the CLI.

CLI instructions


Do not forget to replace x.y in datadotworld/dwcc:x.y with the version of the Collector you want to use (e.g., datadotworld/dwcc:2.113).
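One way to avoid forgetting the tag substitution is to parameterize it in a small wrapper script. This is a hypothetical sketch, not the official command: the invocation flags shown in the echoed string are assumptions about the general shape of a dwcc run, and the command is printed for review rather than executed.

```shell
DWCC_VERSION="2.129"                    # substitute the Collector version you want
IMAGE="datadotworld/dwcc:$DWCC_VERSION"
# Hypothetical invocation shape; the real flags depend on your collector configuration:
RUN_CMD="docker run -it --rm -v \$(pwd):/dwcc-output $IMAGE --help"
echo "$RUN_CMD"
```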

Basic parameters

Each collector has parameters that are required, parameters that are recommended, and parameters that are completely optional. Required parameters must be present for the command to run. Recommended parameters are either:

  • parameters that exist in pairs, and one or the other must be present for the command to run (e.g., --agent and --base)

  • parameters that we recommend to improve your experience running the command in some way

Together, the required and recommended parameters make up the Basic parameters for each collector. The Basic parameters for this collector are:

Docker and the Collector

Detailed information about the Docker portion of the command can be found here. When you run the command, Docker will attempt to find the image locally; if it doesn't find it, it will download it from Docker Hub automatically.
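The local-then-Docker Hub resolution described above can be made explicit if you want to pre-fetch the image. A minimal sketch (the command is echoed for review rather than run here):

```shell
IMAGE="datadotworld/dwcc:2.129"
# Docker checks the local image cache first, then falls back to pulling from Docker Hub:
PULL_CMD="docker image inspect $IMAGE >/dev/null 2>&1 || docker pull $IMAGE"
echo "$PULL_CMD"
```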


Collector runtime and troubleshooting

The catalog collector may run in several seconds to many minutes depending on the size and complexity of the system being crawled. If the catalog collector runs without issues, you should see no output on the terminal, but a new file matching *.dwec.ttl should be in the directory you specified for the output. If there was an issue connecting or running the catalog collector, there will be either a stack trace or a *.log file. Either of those can be sent to support for investigation if the errors are not clear. A list of common issues and problems encountered when running the collectors is available here.
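The success check described above (a *.dwec.ttl file in the output directory, otherwise a stack trace or *.log file) can be scripted. A small sketch, assuming a POSIX shell; the function name is our own:

```shell
check_collector_output() {
  # Succeeds when a *.dwec.ttl catalog file exists in the given output directory.
  dir="$1"
  if ls "$dir"/*.dwec.ttl >/dev/null 2>&1; then
    echo "catalog file found in $dir"
  else
    echo "no catalog file in $dir; look for a stack trace or a *.log file to send to support"
    return 1
  fi
}
```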

Upload the .ttl file generated from running the Collector

When the Collector runs successfully, it creates a .ttl file in the directory you specified as the dwcc-output directory. The automatically generated file name is databaseName.catalogName.dwec.ttl. You can rename the file or keep the default name, and then upload it to your ddw-catalogs dataset (or wherever you store your catalogs).


If there is already a .ttl catalog file with the same name in your ddw-catalogs dataset, adding the new file will overwrite the existing one.
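If you want to script the upload instead of using the web UI, one option is the data.world file upload API. The following is a hedged sketch: the token placeholder and account name are assumptions for illustration, and the command is echoed for review rather than executed.

```shell
DW_TOKEN="<your-api-token>"              # assumption: a data.world API token
ACCOUNT="my-org"                         # assumption: the account that owns ddw-catalogs
FILE="databaseName.catalogName.dwec.ttl"
UPLOAD_CMD="curl -X PUT -H 'Authorization: Bearer $DW_TOKEN' --data-binary @$FILE https://api.data.world/v0/uploads/$ACCOUNT/ddw-catalogs/files/$FILE"
echo "$UPLOAD_CMD"
```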

Automating updates to your metadata catalog

Keep your metadata catalog up to date using cron, your Docker container, or your automation tool of choice to run the catalog collector on a regular basis. Considerations for how often to schedule include:

  • Frequency of changes to the schema

  • Business criticality of up-to-date data

For organizations with schemas that change often and where surfacing the latest data is business critical, a daily run may be appropriate. For those with schemas that change less often and are less critical, weekly or even monthly runs may make sense. Consult your representative for tailored recommendations on how best to optimize your catalog collector processes.
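For the cron option mentioned above, a daily schedule might look like the following. This is a sketch only: the wrapper script path and log location are assumptions, and the crontab line is echoed so you can review it before installing it.

```shell
# Assumption: /opt/dwcc/run-collector.sh wraps your docker run command for the Collector.
# Fields: minute hour day-of-month month day-of-week, so this runs daily at 02:00.
CRON_LINE='0 2 * * * /opt/dwcc/run-collector.sh >> /var/log/dwcc.log 2>&1'
# Install with: (crontab -l; echo "$CRON_LINE") | crontab -
echo "$CRON_LINE"
```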