About the Microsoft Fabric collector

Warning

This collector is in public preview. It has passed our standard testing, but it is not yet widely adopted. You might encounter unforeseen edge cases in your environment. data.world is committed to promptly addressing any issues with public preview collectors. If you face any problems, please report them through your Customer Success Director, implementation team, or support team for assistance.

Use this collector to harvest metadata from Microsoft Fabric workspaces and their child resources, including items commonly found in Power BI.

Important

The Microsoft Fabric collector can be run on-premises using Docker or JAR files.

Note

The latest version of the collector is 2.270. For details on this version and all previous versions, see the release notes.

What is cataloged

The collector catalogs the following information.

Table 1.

Object

Information cataloged

Workspaces

ID, Name, Description

Warehouses

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By, Collation Type, Connection String, JDBC URL

Lakehouses

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By, OneLake Tables Path, OneLake Files Path, Connection String, JDBC URL

Fabric Data Pipelines

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By

Eventhouses

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By

Dataflows

ID, Name, Description, Last Modified Date

Mirrored Databases

ID, Name, Description

Notebooks

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By

Spark Job Definitions

ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By

SQL Analytics Endpoints

ID, Name, Description, Last Modified Date, Created By, Modified By, Provisioning Status, Connection String

Lakehouse Folders

ID, Name, Description, Created Date, Last Modified Date, ABFS File Path

Lakehouse Files

ID, Name, Description, Created Date, Last Modified Date, ABFS File Path

Schemas

Name

Extended metadata: Created Date, Modified Date

Fabric Database Tables

Name, Description, Primary Key, ABFS File Path

Extended metadata: Created Date, Modified Date

Database Columns

Name, Description, JDBC Type, Column Type, Is Nullable, Default Value, Key Type (Primary, Foreign), Column Size, Column Index, Decimal Digits

Views

Name, Description, SQL Definition

Reports

ID, Name, Description, Type, Preview Image (not supported for paginated report types), Created Date, Last Modified Date, Created By, Modified By, External URL, Embed URL

Report Pages

Name

Dashboards

ID, Name, External URL, Embed URL

Dashboard Tiles

Name, Embed URL

Semantic Models

ID, Title, Description, Created Date, Created By, External URL

Data Sources

Name, Type, Connection Details

Fabric Logical Tables

Name, Description, Is Hidden, Is Entered Data, Expression

Fabric Calculated Tables

Name, Description, Is Hidden, Is Entered Data, Expression

Fabric Logical Columns

Name, Description, Data Type, Column Type, Is Hidden, Expression

Measures

Name, Description, Is Hidden, Expression



Profiling and sampling specific information

The Microsoft Fabric collector supports the same profiling and sampling parameters as the SQL Server collector. These parameters apply to Warehouses and Lakehouses. If you include them when running the collector, the following additional information is harvested for columns and tables.

Table 2.

Object

Information cataloged

Column

  • Average Length (sample)

  • Average Value (sample)

  • Data Distribution

  • Distinct Values

  • Estimated Distinct Values

  • Estimated Non-null Values

  • Maximum Length (sample)

  • Maximum Value (sample), sorted numerically or alphabetically (z-a)

  • Minimum Length (sample)

  • Minimum Value (sample), sorted numerically or alphabetically (a-z)

  • Non-null Values (sample)

  • Sample String Values (first 5 items in a column)

Table

  • Row Count

  • Sample Count (Target sample size)
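As an illustration of what the sample-based column statistics in Table 2 mean, here is a minimal Python sketch that computes several of them over an in-memory sample. It is not the collector's implementation: the function name `profile_column`, the dictionary keys, and the use of `None` to stand for NULL are assumptions made for this example, and only string values are handled.

```python
from statistics import mean

def profile_column(sample):
    """Compute sample-based statistics for one column of string values.

    `sample` is a list of values in which None stands for NULL.
    (Hypothetical helper for illustration only.)
    """
    non_null = [v for v in sample if v is not None]
    lengths = [len(v) for v in non_null]
    return {
        "non_null_values": len(non_null),
        "distinct_values": len(set(non_null)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": mean(lengths) if lengths else None,
        # For strings, min/max follow alphabetical order (a-z / z-a).
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
        # First five non-null items, mirroring "Sample String Values".
        "sample_strings": non_null[:5],
    }

stats = profile_column(["pear", None, "apple", "fig", "apple"])
print(stats)
```

In this sample, four of the five values are non-null and three are distinct, so the non-null count and distinct count differ from the sample size.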



Relationships between objects

By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.

Table 3.

Resource page

Relationship

Workspace

  • Warehouses

  • Lakehouses

  • Fabric Data Pipelines

  • Notebooks

  • Dataflows

  • Semantic Models

  • SQL Analytics Endpoints

  • Eventhouses

  • Spark Job Definitions

  • Reports

  • Dashboards

Warehouses

  • Workspace

  • Schemas

Lakehouses

  • Workspace

  • Schemas

  • SQL Analytics Endpoint

  • Lakehouse Folders

Fabric Data Pipelines

  • Workspace

Eventhouses

  • Workspace

Dataflows

  • Workspace

  • Logical Tables

  • Data Sources

Notebooks

  • Workspace

Spark Job Definitions

  • Workspace

SQL Analytics Endpoints

  • Workspace

Lakehouse Folders

  • Lakehouse

  • Child Folders

  • Lakehouse Files

Lakehouse Files

  • Lakehouse Folder

Schemas

  • Lakehouse/Warehouse/Database

  • Tables

  • Views

Fabric Database Tables

  • Schema

  • Columns

Database Columns

  • Table or View

Views

  • Schema

  • Columns

Reports

  • Workspace

  • Dashboard Tile

  • Report Pages (not applicable for paginated report types)

  • Report

Report Pages

  • Report

Dashboards

  • Workspace

  • Dashboard Tiles

Dashboard Tiles

  • Dashboard

Semantic Models

  • Workspace

  • Logical Tables

  • Data Sources

Data Sources

  • Semantic Models

  • Dataflows



Lineage and dependencies for Microsoft Fabric

Lineage

The following lineage information is collected by the Microsoft Fabric collector.

Table 4.

Object

Lineage available

Database View

The collector identifies the associated columns in upstream views or tables that:

  • Supply the data

  • Sort the rows via ORDER BY

  • Filter the rows via WHERE/HAVING

  • Aggregate the rows via GROUP BY

Note: For views, the collector first tries to parse the view SQL to harvest lineage metadata. If the collector's SQL parser cannot parse the view SQL, the collector catalogs some lineage relationships using the sys.dm_sql_referenced_entities system function, when available. For each row that the function returns, if is_selected or is_select_all is true, the collector catalogs a relationship between the referencing entity and the database column.

Semantic Model

Dataflows and Semantic Models this Semantic Model uses data from.

Logical Table

Associated tables that the table sources its data from

Note: The collector uses expressions returned by the Metadata Scan APIs to parse the lineage to the source columns/tables.

Calculated Table

Logical tables and columns from which the calculated table calculates its values.

Logical Column

Associated Logical and Database columns that the column sources its data from or calculates its values from.

Measure

Associated Logical Columns that the measure sources its data from.

Dashboard Tile

Associated Semantic Model

Report

Associated Semantic Model, and the Report from which this Report was published.

Dataflow

Other Dataflows this Dataflow uses data from.
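The fallback rule in the Database View note (keep a row only when is_selected or is_select_all is true) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the collector's code: the function name `fallback_view_lineage` is hypothetical, the rows are plain dictionaries, and every column name other than is_selected and is_select_all is borrowed from the output of SQL Server's sys.dm_sql_referenced_entities.

```python
def fallback_view_lineage(view_name, referenced_rows):
    """Return (view, referenced object) pairs for rows flagged as selected.

    Hypothetical helper: each row is a dict with the flags is_selected and
    is_select_all plus assumed name columns (schema, entity, minor name).
    """
    edges = []
    for row in referenced_rows:
        if row.get("is_selected") or row.get("is_select_all"):
            # Join whichever name parts are present. With SELECT *, the
            # column-level name may be missing, so the edge then points at
            # the table; how the collector expands that case is not stated.
            target = ".".join(
                part
                for part in (
                    row.get("referenced_schema_name"),
                    row.get("referenced_entity_name"),
                    row.get("referenced_minor_name"),
                )
                if part
            )
            edges.append((view_name, target))
    return edges

rows = [
    {"referenced_schema_name": "dbo", "referenced_entity_name": "orders",
     "referenced_minor_name": "total", "is_selected": True, "is_select_all": False},
    {"referenced_schema_name": "dbo", "referenced_entity_name": "orders",
     "referenced_minor_name": "id", "is_selected": False, "is_select_all": False},
    {"referenced_schema_name": "dbo", "referenced_entity_name": "customers",
     "referenced_minor_name": None, "is_selected": False, "is_select_all": True},
]
edges = fallback_view_lineage("dbo.v_sales", rows)
print(edges)
```

The second row is dropped because neither flag is set, which is why the note says only some lineage relationships are cataloged on this path.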



Supported cross-system lineage

Cross-system lineage is currently supported for the following data sources:

  • Fabric Lakehouse (currently limited to SQL endpoint connections)

  • Fabric Warehouse (currently limited to SQL endpoint connections)

  • Oracle

  • Denodo

  • Snowflake

  • SQL Server

  • PostgreSQL

  • Redshift

  • Databricks

  • CSV documents

    Important

    While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between Microsoft Fabric and these sources.

Supported Power Query (M) Functions and Expressions for Lineage Metadata

Note

Any table operations or transformations not listed in the following table as supported or unsupported are ignored.

This section captures supported transformations, source expressions, calculated columns, and measure expressions when harvesting lineage metadata.
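To make the three outcomes in the note above concrete (supported, unsupported, and ignored), here is a small Python sketch that classifies the Table.* functions found in a Power Query (M) expression against the lists in Table 5. The function name `classify_table_functions`, the regular expression, and the sample query are assumptions for illustration; the collector's own parser is not described in this document.

```python
import re

# Function names copied from the "Supported table functions" and
# "Unsupported table operations" rows of Table 5.
SUPPORTED = {
    "Table.AddColumn", "Table.AddIndexColumn", "Table.RenameColumns",
    "Table.NestedJoin", "Table.ExpandTableColumn", "Table.SplitColumn",
    "Table.DuplicateColumn", "Table.CombineColumns",
}
UNSUPPORTED = {
    "Table.Pivot", "Table.PromoteHeaders", "Table.DemoteHeaders",
    "Table.PrefixColumns", "Table.TransformColumnNames", "Table.Unpivot",
    "Table.UnpivotOtherColumns", "Table.AddFuzzyClusterColumn",
    "Table.AddJoinColumn", "Table.AggregateTableColumn", "Table.Combine",
    "Table.CombineColumnsToRecord", "Table.ExpandRecordColumn",
    "Table.Join", "Table.Transpose",
}

def classify_table_functions(m_expression):
    """Map each Table.* function name in the expression to its status."""
    statuses = {}
    for name in re.findall(r"\bTable\.[A-Za-z]+", m_expression):
        if name in SUPPORTED:
            statuses[name] = "supported"
        elif name in UNSUPPORTED:
            statuses[name] = "unsupported"
        else:
            # Not listed in either row, so it is ignored.
            statuses[name] = "ignored"
    return statuses

example = """
let
    Source = Sql.Database("srv", "db"),
    Renamed = Table.RenameColumns(Source, {{"a", "b"}}),
    Pivoted = Table.Pivot(Renamed, List.Distinct(Renamed[b]), "b", "c"),
    Sorted = Table.Sort(Pivoted, {"b"})
in
    Sorted
"""
statuses = classify_table_functions(example)
print(statuses)
```

A check like this can help you spot queries that use an unsupported operation before you contact support about them.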

Table 5.

Category

Supported/Unsupported objects

Supported Parameterized Expressions

The collector parses source expressions that use parameters in place of any of the following elements: the full Source value, server/host value, warehouse value, database name, schema name, or table name. It also parses SQL expressions that incorporate parameters.

Supported data functions

Csv.Document, Excel.Workbook, File.Contents, Folder.Contents, Folder.Files, Json.Document, Odbc.DataSource, Odbc.InferOptions, Odbc.Query, Xml.Document, Web.Contents, Web.Headers, Web.BrowserContents, AmazonRedshift.Database, Sql.Database, Sql.Databases, Snowflake.Databases, PostgreSQL.Database, Databricks.Catalogs, Oracle.Database, Denodo.Contents, Databricks.Query

Supported table functions

Table.AddColumn, Table.AddIndexColumn, Table.RenameColumns, Table.NestedJoin, Table.ExpandTableColumn, Table.SplitColumn, Table.DuplicateColumn, Table.CombineColumns

Unsupported table operations

Note: Contact data.world support if you have any expressions that use the following unsupported table operations.

Table.Pivot, Table.PromoteHeaders, Table.DemoteHeaders, Table.PrefixColumns, Table.TransformColumnNames, Table.Unpivot, Table.UnpivotOtherColumns, Table.AddFuzzyClusterColumn, Table.AddJoinColumn, Table.AggregateTableColumn, Table.Combine, Table.CombineColumnsToRecord, Table.ExpandRecordColumn, Table.Join, Table.Transpose

Supported dataflow functions

  • PowerPlatform.Dataflows

  • PowerBI.Dataflows

Supported value functions

  • Value.NativeQuery

Supported calculated columns

Lineage is supported for calculated column expressions that reference columns with or without a table reference. Column and table names may contain alphanumeric characters, spaces, hyphens, and underscores.

Supported measures

Lineage is supported for measure expressions that reference columns or tables whose names contain alphanumeric characters, spaces, hyphens, underscores, or surrounding quotes.
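The naming cases listed for calculated columns and measures (references with or without a table name, and names containing spaces, hyphens, underscores, or surrounding quotes) can be illustrated with a short Python sketch that pulls column references out of a DAX-style expression. The pattern `REFERENCE`, the function `extract_references`, and the sample expression are assumptions for this example. The sketch does not handle escaped quotes inside table names and does not reflect how the collector actually parses expressions.

```python
import re

# Optional table name (quoted with single quotes, or a bare identifier),
# followed by a column name in square brackets.
REFERENCE = re.compile(r"(?:'([^']+)'|([A-Za-z0-9_]+))?\[([^\]]+)\]")

def extract_references(expression):
    """Return (table, column) pairs; table is None when unqualified."""
    references = []
    for match in REFERENCE.finditer(expression):
        quoted_table, bare_table, column = match.groups()
        references.append((quoted_table or bare_table, column))
    return references

expression = "SUM('Sales-2024'[Net Amount]) / SUM(Orders[order_count]) + [Unit Price]"
references = extract_references(expression)
print(references)
```

The three references in the sample cover a quoted table name with a hyphen, a bare table name with an underscored column, and a column with no table reference.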



Dependencies

  • Dependencies between Microsoft Fabric resources are cataloged from the Fabric metadata scan APIs, and these relationships can be seen in the Lineage Explorer.

Authentication supported

  • The collector supports the Service principal authentication method.
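For orientation, service principal authentication typically uses the OAuth 2.0 client credentials flow against the Microsoft identity platform. The Python sketch below only builds such a token request and does not send it. The helper name `build_token_request`, the placeholder IDs and secret, and the choice of scope are assumptions for illustration; this document does not state which scope or endpoint the collector itself uses.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_token_request(tenant_id, client_id, client_secret, scope):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# Placeholder values; the scope shown is an assumption, not a documented
# collector setting.
req = build_token_request(
    "my-tenant-id",
    "my-client-id",
    "my-secret",
    "https://api.fabric.microsoft.com/.default",
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it. With valid credentials, the JSON response contains an `access_token` field.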