About the Azure Data Factory Collector

Important

The collector can be run in the cloud or on-premises using Docker or JAR files.

Azure Data Factory (ADF) lets users collect, transform, and move data. Use this collector to harvest metadata from ADF, including details on pipelines, datasets, dataflows, linked services, triggers, integration runtimes, and global parameters. The collector also gathers lineage between ADF datasets, and between ADF and external sources such as Snowflake.

Note

The latest version of the collector is 2.290. For the release notes for this version and all previous versions, see the collector release notes.

What is cataloged

The collector catalogs the following information from Azure Data Factory.

Table 1.

Object	Information collected
Factory	ID, Name, ETag, Location, Create Time, Provisioning State, Version, Public Network Access, Factory Tags, Repository Configuration (Account Name, Collaboration Branch, Repository Name, Disable Publish, Root Folder, Host Name, Client ID, Project Name, Last Commit ID, Tenant ID, Repo Configuration Type)
Pipeline	ID, Name, Description, ETag, Concurrency, Folder, Parameters, Metric Policy Duration, Variables
Pipeline Activity	Name, Description, Type, Inactivity Status, State, User Properties, Activity Policy (Retry, Timeout, Retry Interval In Seconds, Secure Input, Secure Output)
Linked Service	ID, Name, Description, Type, ETag, Connection String, Domain, Parameters. Note: Harvesting the connection string is not supported for SFTP linked services.
Dataset	ID, Name, ETag, Type, Database, Schema, Table, Folder, Container, File Name, Parameters
Dataflow	ID, Name, ETag, Type, Description, Folder
Trigger	ID, Name, ETag, Type, State, Description, Frequency, Interval, Start Time, End Time
Integration Runtime	ID, ETag, Name, Type, Description, State, Compute Properties (Node Size, Number of Nodes, Max Parallel Executions Per Node, Core Count, Compute Type, Cleanup, Number of External Nodes, Number of Pipeline Nodes), SSIS Properties (Catalog Server Endpoint, Catalog Admin Username, Catalog Pricing Tier, License Type, Dual Standby Pair Name, Edition)
Global Parameter	ID, Name, Value, Type
ADF Table	ID, Name
ADF Column	ID, Name, Type, Precision, Scale
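To show where several of the pipeline and activity fields in Table 1 come from, here is a minimal Python sketch that reads them out of the JSON form of an ADF pipeline (the same structure shown in the ADF Studio code view). It is an illustration under assumptions, not the collector's implementation: the `summarize_pipeline` function, the summary classes, and the sample pipeline `IngestOrders` are hypothetical, and only a subset of the Table 1 fields is extracted.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActivitySummary:
    name: str
    type: str
    description: Optional[str]
    state: Optional[str]      # "Active" or "Inactive"; absent means active
    retry: Optional[int]      # from the activity policy
    timeout: Optional[str]    # from the activity policy, e.g. "0.12:00:00"


@dataclass
class PipelineSummary:
    name: str
    description: Optional[str]
    concurrency: Optional[int]
    folder: Optional[str]
    parameters: list[str]
    variables: list[str]
    activities: list[ActivitySummary] = field(default_factory=list)


def summarize_pipeline(definition: dict[str, Any]) -> PipelineSummary:
    """Pull a subset of the Table 1 pipeline and activity fields out of ADF pipeline JSON."""
    props = definition.get("properties", {})
    activities = []
    for act in props.get("activities", []):
        policy = act.get("policy", {})
        activities.append(ActivitySummary(
            name=act["name"],
            type=act["type"],
            description=act.get("description"),
            state=act.get("state"),
            retry=policy.get("retry"),
            timeout=policy.get("timeout"),
        ))
    return PipelineSummary(
        name=definition["name"],
        description=props.get("description"),
        concurrency=props.get("concurrency"),
        folder=props.get("folder", {}).get("name"),
        # Parameters and variables are keyed by name; keep just the names.
        parameters=sorted(props.get("parameters", {})),
        variables=sorted(props.get("variables", {})),
        activities=activities,
    )


# Hypothetical sample pipeline definition.
sample = json.loads("""
{
  "name": "IngestOrders",
  "properties": {
    "description": "Nightly order load",
    "concurrency": 1,
    "folder": {"name": "ingest"},
    "parameters": {"runDate": {"type": "String"}},
    "variables": {"rowCount": {"type": "String"}},
    "activities": [
      {"name": "CopyOrders", "type": "Copy",
       "policy": {"timeout": "0.12:00:00", "retry": 2}}
    ]
  }
}
""")
summary = summarize_pipeline(sample)
print(summary.name, summary.folder, [a.name for a in summary.activities])
# IngestOrders ingest ['CopyOrders']
```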

Relationships between objects

By default, the data.world catalog includes catalog pages for the resource types below. Each catalog page has relationships to other related resource types. Because the catalog presentation and relationships are fully configurable, the table lists the default configuration.

Table 2.

Resource page	Relationship
Factory	Contains Global Parameter, Contains Pipeline, Contains Dataset, Contains Dataflow, Contains Trigger, Contains Integration Runtime
Pipeline	Has Tag (also known as Annotation), Contains Activity
Activity	Belongs to Pipeline, Contains Activity, Depends on Activity, Uses Linked Service, Uses Integration Runtime, Uses Dataset
Linked Service	Uses Integration Runtime, Has Tag (also known as Annotation), Connects to Database
Dataset	Uses Linked Service, Has Tabular Datasource, Has Tag (also known as Annotation)
Dataflow	Uses Dataflow, Imports Data From Linked Service, Exports Data From Linked Service, Imports Data From Dataset, Exports Data From Dataset, Has Tag (also known as Annotation)
Integration Runtime	Uses Integration Runtime, Uses Linked Service
Trigger	Triggers Pipeline, Has Tag (also known as Annotation)
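Several of the relationships in Table 2 correspond directly to references inside ADF JSON definitions: a pipeline lists its activities, each activity's `dependsOn` array names the upstream activities it waits for, and some activity types reference a linked service through `linkedServiceName`. The minimal Python sketch below derives "Contains Activity", "Depends on Activity", and "Uses Linked Service" triples from one pipeline definition. It is a hypothetical illustration of the idea, not how data.world builds its catalog relationships; the `activity_relationships` function and the sample pipeline are assumptions.

```python
from typing import Any


def activity_relationships(pipeline: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (subject, relationship, object) triples for one ADF pipeline definition."""
    triples = []
    pipeline_name = pipeline["name"]
    for act in pipeline.get("properties", {}).get("activities", []):
        triples.append((pipeline_name, "Contains Activity", act["name"]))
        # dependsOn names upstream activities plus conditions such as "Succeeded".
        for dep in act.get("dependsOn", []):
            triples.append((act["name"], "Depends on Activity", dep["activity"]))
        # Some activity types (for example Databricks Notebook) reference a linked service directly.
        linked_service = act.get("linkedServiceName")
        if linked_service:
            triples.append((act["name"], "Uses Linked Service", linked_service["referenceName"]))
    return triples


# Hypothetical sample pipeline with two activities.
pipeline = {
    "name": "IngestOrders",
    "properties": {
        "activities": [
            {"name": "CopyOrders", "type": "Copy"},
            {"name": "RunNotebook", "type": "DatabricksNotebook",
             "linkedServiceName": {"referenceName": "DatabricksLS",
                                   "type": "LinkedServiceReference"},
             "dependsOn": [{"activity": "CopyOrders",
                            "dependencyConditions": ["Succeeded"]}]},
        ]
    },
}
for triple in activity_relationships(pipeline):
    print(triple)
```

Container activities such as ForEach and Until nest their own activities under `typeProperties`, so producing the Activity "Contains Activity" relationship would need a recursive walk; the sketch leaves that out for brevity.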

Lineage for Azure Data Factory

The following lineage information is collected by the Azure Data Factory collector.

Table 3.

Object	Lineage available
Dataset	The collector identifies the source or sink of the dataset when the source or sink is Snowflake, Databricks, PostgreSQL, MySQL, Oracle, Teradata, DB2, or SQL Server, and when a Copy Activity run copies data between two datasets.
ADF table	The collector identifies the corresponding table in the source or sink system that the data is read from or written to.
ADF column	The collector identifies the corresponding column in the source or sink system that the data is read from or written to.
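Dataset-to-dataset lineage from a Copy activity follows from the activity's `inputs` and `outputs` dataset references, and column-level edges follow from explicit column mappings in a `TabularTranslator`. The minimal Python sketch below derives both from a Copy activity definition. Note the assumption: it reads the static activity definition, whereas the collector is described above as using Copy Activity runs, so this is an illustration of the idea rather than the collector's method. The `copy_lineage` function and the sample activity are hypothetical.

```python
from typing import Any


def copy_lineage(activity: dict[str, Any]) -> dict[str, list[tuple[str, str]]]:
    """Dataset- and column-level (source, sink) edges implied by one ADF Copy activity definition."""
    if activity.get("type") != "Copy":
        return {"datasets": [], "columns": []}
    sources = [ref["referenceName"] for ref in activity.get("inputs", [])]
    sinks = [ref["referenceName"] for ref in activity.get("outputs", [])]
    dataset_edges = [(src, dst) for src in sources for dst in sinks]

    # Explicit column mappings live in the TabularTranslator, if one is defined.
    # Mappings that use "path" (hierarchical data) instead of "name" are skipped.
    translator = activity.get("typeProperties", {}).get("translator", {})
    column_edges = [
        (m["source"]["name"], m["sink"]["name"])
        for m in translator.get("mappings", [])
        if "name" in m.get("source", {}) and "name" in m.get("sink", {})
    ]
    return {"datasets": dataset_edges, "columns": column_edges}


# Hypothetical Copy activity from a CSV dataset into a Snowflake dataset.
copy_activity = {
    "name": "CopyOrders",
    "type": "Copy",
    "inputs": [{"referenceName": "OrdersCsv", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "OrdersSnowflake", "type": "DatasetReference"}],
    "typeProperties": {
        "translator": {
            "type": "TabularTranslator",
            "mappings": [
                {"source": {"name": "id"}, "sink": {"name": "ORDER_ID"}},
                {"source": {"name": "total"}, "sink": {"name": "ORDER_TOTAL"}},
            ],
        }
    },
}
print(copy_lineage(copy_activity))
# {'datasets': [('OrdersCsv', 'OrdersSnowflake')], 'columns': [('id', 'ORDER_ID'), ('total', 'ORDER_TOTAL')]}
```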

Supported cross-system lineage

The currently supported data sources for cross-system lineage are:

Snowflake
Databricks
Important
While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between Azure Data Factory and these sources.
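Cross-system lineage depends on matching an ADF dataset to the same table as cataloged from the other system. The minimal Python sketch below shows one way such a match key could be formed for a Snowflake-backed dataset. It assumes Snowflake V2-style definitions, in which the database is set on the linked service and the schema and table are set on the dataset; it is not necessarily how the collector performs the match. The `snowflake_table_key` function and the sample resources are hypothetical.

```python
from typing import Any, Optional


def snowflake_table_key(dataset: dict[str, Any],
                        linked_service: dict[str, Any]) -> Optional[str]:
    """Build an uppercase DATABASE.SCHEMA.TABLE key for a Snowflake-backed ADF dataset."""
    ds_props = dataset.get("properties", {})
    if ds_props.get("type") not in ("SnowflakeTable", "SnowflakeV2Table"):
        return None
    # The dataset must reference the linked service we were given.
    if ds_props.get("linkedServiceName", {}).get("referenceName") != linked_service.get("name"):
        return None
    database = linked_service.get("properties", {}).get("typeProperties", {}).get("database")
    type_props = ds_props.get("typeProperties", {})
    parts = (database, type_props.get("schema"), type_props.get("table"))
    # Parameterized values arrive as expression objects, not strings; skip those.
    if not all(isinstance(p, str) and p for p in parts):
        return None
    # Snowflake stores unquoted identifiers in uppercase.
    return ".".join(p.upper() for p in parts)


# Hypothetical linked service and dataset definitions.
linked_service = {
    "name": "SnowflakeLS",
    "properties": {"type": "SnowflakeV2",
                   "typeProperties": {"database": "analytics"}},
}
dataset = {
    "name": "OrdersSnowflake",
    "properties": {
        "type": "SnowflakeV2Table",
        "linkedServiceName": {"referenceName": "SnowflakeLS",
                              "type": "LinkedServiceReference"},
        "typeProperties": {"schema": "sales", "table": "orders"},
    },
}
print(snowflake_table_key(dataset, linked_service))  # ANALYTICS.SALES.ORDERS
```

Matching on an uppercased name is a simplification: quoted, case-sensitive Snowflake identifiers would need to keep their original case.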

Authentication supported

Authenticate to Azure Data Factory using a service principal.
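A service principal authenticates with a tenant ID, a client (application) ID, and a credential such as a client secret, using the OAuth 2.0 client-credentials flow against Microsoft Entra ID. The minimal Python sketch below only prepares that token request for the Azure Resource Manager scope, through which the ADF management APIs are served; it does not send it. It illustrates the flow under assumptions and is not the collector's own code; the `build_token_request` function and the placeholder IDs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Azure Resource Manager scope; ADF management APIs are served through ARM.
ARM_SCOPE = "https://management.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Prepare (but do not send) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": ARM_SCOPE,
    }).encode("ascii")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


# Placeholder values; never hard-code a real secret.
req = build_token_request("my-tenant-id", "my-client-id", "my-secret")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body whose `access_token` is then passed as a bearer token on management API calls.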
