Preparing to run the Azure Data Factory collector
Note
The latest version of the Collector is 2.247. To view the release notes for this version and all previous versions, please go here.
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) Note: The following specs are based upon running one collector process at a time. Please adjust the hardware if you are running multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Click here to get Docker. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains |
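For on-premise runs, the hardware and Docker rows above can be checked with a short preflight script. The following is only a sketch, not part of the collector: the function names are hypothetical, the RAM lookup relies on `os.sysconf` and returns `None` on platforms that do not expose it (for example Windows), and a machine sold as "8 GB" may report slightly less usable memory than the exact threshold used here.

```python
import os
import shutil

# 8 GB, taken from the RAM row of the requirements table.
MIN_RAM_BYTES = 8 * 1024**3


def total_ram_bytes():
    """Return physical RAM in bytes, or None if the platform does not expose it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def meets_ram_requirement(ram_bytes, minimum=MIN_RAM_BYTES):
    """True when the RAM size is known and at least the minimum."""
    return ram_bytes is not None and ram_bytes >= minimum


def docker_available():
    """True when a `docker` executable is found on PATH."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    print("RAM requirement met:", meets_ram_requirement(total_ram_bytes()))
    print("Docker on PATH:", docker_available())
```

Treat a negative RAM result as a prompt to check the machine manually rather than as a hard failure.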
Setting up access for cataloging Azure Data Factory resources
Authentication types supported
The Azure Data Factory collector authenticates using an Azure service principal.
STEP 1: Registering your application
To register a new application:
Go to the Azure Portal.
Select Azure Active Directory.
Click the App Registrations option in the left sidebar.
Click New Registration and enter the following information:
Application Name: DataDotWorldADFApplication.
Supported account types: Accounts in this organizational directory only.
Click Register to complete the registration.
STEP 2: Creating Client secret and getting the Client ID
To create a Client Secret:
On the new application page you created, select Certificates and Secrets.
Under the Client secrets tab, click the New client secret button.
Add a Description.
Set the expiration for the client secret.
Click Add, then copy the secret value right away. Azure does not display the value again after you leave the page.
To get the Client ID from the Azure portal:
Click on the Overview tab in the left sidebar of the application home page.
Copy the Application (Client) ID from the Essentials section.
STEP 3: Obtaining Subscription ID and Tenant ID
On the page of the application you created in Step 1, copy and save the Directory (tenant) ID. You will use this for the --tenant-id parameter.
Navigate to a storage account that you would like to harvest from. From the Overview page, copy the Subscription ID. You will use this for the --subscription-id parameter.
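Both IDs are GUIDs, so a copy-and-paste slip is easy to catch before running the collector. The sketch below validates the two values and arranges them as arguments for the `--tenant-id` and `--subscription-id` parameters named above. The helper names are hypothetical, and the sample GUIDs in the usage note are placeholders, not real identifiers.

```python
import uuid


def is_guid(value):
    """True if value parses as a GUID, the format Azure uses for these IDs."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def build_id_args(tenant_id, subscription_id):
    """Validate both IDs and return them as a list of command-line arguments."""
    for name, value in (("tenant ID", tenant_id),
                        ("subscription ID", subscription_id)):
        if not is_guid(value):
            raise ValueError(f"{name} does not look like a GUID: {value!r}")
    return ["--tenant-id", tenant_id, "--subscription-id", subscription_id]
```

For example, `build_id_args("00000000-0000-0000-0000-000000000000", "11111111-1111-1111-1111-111111111111")` returns the four-element argument list, while a truncated value raises `ValueError`.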
STEP 4: Grant Service Principal access to each Data Factory
The service principal does not need an explicit permission on each data factory to access its metadata. If the data factories you want to catalog were created within a single subscription, it is sufficient to add the service principal to that subscription with the Reader role. To assign the role:
Go to the Subscriptions page in the Azure portal.
Select the appropriate subscription and go to the Access Control (IAM) tab.
From the Add menu, select Add role assignment.
Under Job Function Roles, search for Reader and select it.
Go to the Members tab, click Select members, then search for the application DataDotWorldADFApplication and select it.
Click Review + assign.
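One way to confirm that the Reader assignment took effect is to request a token as the service principal and list the data factories in the subscription. The sketch below only builds the two HTTP requests and does not send them. It is not how the collector itself works. It assumes the Microsoft identity platform client credentials flow and the Azure Resource Manager "Factories - List" endpoint with API version `2018-06-01`; check both against current Microsoft documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode

ARM = "https://management.azure.com"


def token_request(tenant_id, client_id, client_secret):
    """Return (url, form_body) for a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{ARM}/.default",
    })
    return url, body


def list_factories_url(subscription_id, api_version="2018-06-01"):
    """Return the ARM URL that lists data factories in a subscription."""
    return (f"{ARM}/subscriptions/{subscription_id}"
            f"/providers/Microsoft.DataFactory/factories"
            f"?api-version={api_version}")
```

POST the form body to the token URL, then send a GET to the list URL with the returned access token in an `Authorization: Bearer` header. A response that includes the expected factories suggests the Reader role is in place. An authorization error suggests the assignment is missing or has not yet propagated.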