Preparing to run the Microsoft Fabric collector
Setting up prerequisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premises runs only) | Note: These specs assume that one collector process runs at a time. If you run multiple collectors at the same time, adjust the hardware accordingly. |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premises runs only) | |
Docker | Install Docker. |
data.world-specific objects (for both cloud and on-premises runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using the Catalog Toolkit, follow the instructions for preparing datasets for collectors. |
Network connection | Allowlist IPs and domains |
Setting up access for the Microsoft Fabric collector
Important things to note
A Fabric administrator must enable the required settings in the Fabric Admin Portal.
If a workspace uses dataflows, the service principal must be added to that workspace with at least Contributor access.
The collector can only harvest metadata for Fabric workspaces to which the service principal has been given access.
The collector uses a JDBC connection with service principal authentication to catalog database resources in Warehouses and Lakehouses. It also uses the following APIs to catalog detailed metadata about Fabric resources: Fabric APIs, OneLake APIs, and Power BI APIs.
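As a rough illustration of the JDBC side, the sketch below assembles a connection URL for a Warehouse or Lakehouse SQL endpoint. It assumes the Microsoft JDBC Driver for SQL Server, whose `authentication=ActiveDirectoryServicePrincipal` mode takes the client ID as `user` and the client secret as `password`. The function names and the example endpoint host are hypothetical, and the collector may build its connection differently.

```python
# Hypothetical sketch: assemble a JDBC URL for a Fabric SQL endpoint using
# service principal authentication. Assumes the Microsoft JDBC Driver for
# SQL Server (mssql-jdbc); this is not the collector's own code.


def build_fabric_jdbc_url(sql_endpoint: str, database: str) -> str:
    """Return a JDBC URL that requests service principal authentication.

    Credentials are kept out of the URL; pass them as connection properties.
    """
    if not sql_endpoint or not database:
        raise ValueError("sql_endpoint and database are required")
    properties = [
        f"database={database}",
        "encrypt=true",
        "authentication=ActiveDirectoryServicePrincipal",
    ]
    return f"jdbc:sqlserver://{sql_endpoint}:1433;" + ";".join(properties)


def build_connection_properties(client_id: str, client_secret: str) -> dict:
    """In ActiveDirectoryServicePrincipal mode, user is the client ID and password is the secret."""
    return {"user": client_id, "password": client_secret}


if __name__ == "__main__":
    # Placeholder endpoint and database names, for illustration only.
    print(build_fabric_jdbc_url("your-endpoint.datawarehouse.fabric.microsoft.com", "SalesWarehouse"))
```

Keeping the secret out of the URL means the URL can be logged safely while the credentials travel only as driver properties.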
STEP 1: Registering your application
To register a new application:
Go to the Azure portal. Under Azure services, click App registrations.
Click New registration and enter the following information:
Application Name: For example, DataDotWorldFabricApplication.
Supported account types: Accounts in this organizational directory only.
Click Register to complete the registration.
STEP 2: Creating a client secret and getting the client ID and tenant ID
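To create a client secret:
In the Azure portal, open the application you registered in Step 1.
On the application page, select Certificates & secrets.
Click New client secret and add a description.
Select the desired expiration date.
Click Add, and copy the secret value immediately. The value is displayed only once, right after the secret is created.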
To get the client ID and tenant ID from the Azure portal:
Click the Overview tab in the left sidebar of the app registration.
Copy the Application (client) ID from the Essentials section.
Copy the Directory (tenant) ID from the Essentials section.
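The tenant ID, client ID, and client secret gathered in Steps 1 and 2 are what a service principal uses to obtain an access token through the OAuth 2.0 client credentials flow. A minimal sketch, assuming the Microsoft identity platform v2.0 token endpoint and the standard `.default` scopes for the Fabric and Power BI APIs. `build_token_request` is a hypothetical helper that only builds the request; it does not send it.

```python
import re
from urllib.parse import urlencode

# Client credentials requests must use the resource's "/.default" scope.
FABRIC_SCOPE = "https://api.fabric.microsoft.com/.default"
POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"

# The Directory (tenant) ID and Application (client) ID are GUIDs.
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$")


def build_token_request(tenant_id, client_id, client_secret, scope=FABRIC_SCOPE):
    """Return (url, form_body) for a client credentials token request (hypothetical helper)."""
    for name, value in (("tenant_id", tenant_id), ("client_id", client_id)):
        if not GUID_RE.match(value):
            raise ValueError(f"{name} does not look like a GUID: {value!r}")
    if not client_secret:
        raise ValueError("client_secret is empty")
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body
```

In practice, read the secret from an environment variable or a secret store rather than hard-coding it, and POST the form body with `Content-Type: application/x-www-form-urlencoded`.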
STEP 3: Setting up authentication
To set up service principal authentication:
Sign in to Fabric using a Fabric Admin account.
Open Settings and go to the Admin portal.
Under Developer settings, find Service principals can use Fabric APIs. Enable the setting and choose whether it applies to The entire organization or Specific security groups. If you choose specific security groups, select a security group that includes the service principal. Click Apply to save the changes. See the Microsoft documentation for more details.
Add the service principal to each workspace that the collector should harvest:
Open the workspace and click Manage access.
Search for the service principal, or for a security group that it belongs to. If the workspace uses dataflows, select at least Contributor; otherwise, select Viewer.
Click Add.
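The access rule in this step can be expressed as a pre-flight check. The sketch below is hypothetical: it assumes you keep your own record of each workspace, whether it uses dataflows, and the role assigned to the service principal. It relies on the standard Fabric workspace roles, ordered from least to most privileged.

```python
# Fabric workspace roles, ordered from least to most privileged.
WORKSPACE_ROLES = ("Viewer", "Contributor", "Member", "Admin")


def required_role(workspace_uses_dataflows: bool) -> str:
    """Minimum role per the steps above: Contributor with dataflows, else Viewer."""
    return "Contributor" if workspace_uses_dataflows else "Viewer"


def has_sufficient_access(assigned_role: str, workspace_uses_dataflows: bool) -> bool:
    """Return True if assigned_role meets the minimum role for the workspace."""
    if assigned_role not in WORKSPACE_ROLES:
        raise ValueError(f"Unknown workspace role: {assigned_role!r}")
    needed = required_role(workspace_uses_dataflows)
    return WORKSPACE_ROLES.index(assigned_role) >= WORKSPACE_ROLES.index(needed)
```

Comparing positions in an ordered tuple keeps the rule in one place, so a higher role such as Member automatically satisfies a Contributor requirement.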
STEP 4: Setting up metadata scanning
Set up metadata scanning so that the collector can access detailed data source information, such as tables and columns, through the Fabric read-only admin APIs. A Fabric administrator must set up metadata scanning before it can run on an organization's Fabric workspaces. The collector uses the Fabric Scanner APIs to establish lineage to source tables and columns. Be sure to familiarize yourself with the limitations of the scanner APIs.
To set up metadata scanning:
Follow the Fabric documentation to enable service principal authentication for Fabric read-only APIs.
Next, follow the Fabric documentation to enable the following enhanced tenant settings for metadata scanning:
Enhance admin APIs responses with detailed metadata
Enhance admin APIs responses with DAX and mashup expressions
Important
When the collector runs under a service principal, no API permissions are required. In particular, the registered app must not have any admin-consent-required permissions set in the Azure portal. For more information, see Enable service principal authentication for read-only admin APIs.
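To make the scanner flow concrete, the sketch below builds requests for the Power BI admin WorkspaceInfo APIs: `getInfo` starts a scan, `scanStatus/{scanId}` reports progress, and `scanResult/{scanId}` returns the metadata. The `datasetSchema` and `datasetExpressions` parameters return data only once the two enhanced settings above are enabled. It assumes the documented limit of 100 workspace IDs per `getInfo` call; the helper names are hypothetical, and the collector's actual request parameters may differ.

```python
import json
from urllib.parse import urlencode

ADMIN_BASE = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"
MAX_WORKSPACES_PER_SCAN = 100  # assumed per-request limit for getInfo


def build_scan_requests(workspace_ids):
    """Split workspace IDs into (url, json_body) getInfo requests of at most 100 IDs."""
    query = urlencode({
        "lineage": "True",
        "datasourceDetails": "True",
        "datasetSchema": "True",       # needs the "detailed metadata" setting
        "datasetExpressions": "True",  # needs the "DAX and mashup expressions" setting
    })
    url = f"{ADMIN_BASE}/getInfo?{query}"
    batches = []
    for start in range(0, len(workspace_ids), MAX_WORKSPACES_PER_SCAN):
        batch = workspace_ids[start:start + MAX_WORKSPACES_PER_SCAN]
        batches.append((url, json.dumps({"workspaces": batch})))
    return batches


def scan_status_url(scan_id):
    """URL to poll until the scan reports that it has succeeded."""
    return f"{ADMIN_BASE}/scanStatus/{scan_id}"


def scan_result_url(scan_id):
    """URL that returns the scanned metadata once the scan is complete."""
    return f"{ADMIN_BASE}/scanResult/{scan_id}"
```

Batching up front keeps each `getInfo` call within the per-request limit; each scan ID returned by `getInfo` is then polled and read independently.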
Configuring Fabric for Report Image Harvesting
To harvest preview images of reports in Fabric, complete the following tasks:
In the Admin portal, enable the Export reports as image files setting.
Ensure that the reports to be exported are located in a workspace with Premium, Embedded, or Fabric capacity. For details, see the Fabric documentation.
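Because image export depends on workspace capacity, it can help to check which workspaces qualify before a run. A hypothetical sketch, assuming workspace records shaped like the Power BI REST API Groups response, where `isOnDedicatedCapacity` is true for workspaces backed by Premium, Embedded, or Fabric capacity. The sample data is illustrative only.

```python
def partition_by_capacity(workspaces):
    """Split workspace names into (eligible, skipped) for report image export."""
    eligible, skipped = [], []
    for ws in workspaces:
        label = ws.get("name", ws.get("id", "<unknown>"))
        # Only an explicit True counts; a missing flag is treated as shared capacity.
        if ws.get("isOnDedicatedCapacity") is True:
            eligible.append(label)
        else:
            skipped.append(label)
    return eligible, skipped


if __name__ == "__main__":
    sample = [
        {"name": "Finance", "isOnDedicatedCapacity": True},
        {"name": "Sandbox", "isOnDedicatedCapacity": False},
    ]
    eligible, skipped = partition_by_capacity(sample)
    print("Report images can be harvested from:", eligible)
    print("Not on Premium, Embedded, or Fabric capacity:", skipped)
```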
If you use non-Fabric database sources, see the Power BI collector documentation for instructions on setting up a data source YAML file for lineage mapping. You can pass this file to the Microsoft Fabric collector with the --datasource-mapping-file option.