About the Microsoft Fabric collector
Warning
This collector is in public preview. It has passed our standard testing, but it is not yet widely adopted. You might encounter unforeseen edge cases in your environment. data.world is committed to promptly addressing any issues with public preview collectors. If you face any problems, please report them through your Customer Success Director, implementation team, or support team for assistance.
Use this collector to harvest metadata from Microsoft Fabric workspaces and their child resources, including items commonly found in Power BI.
Important
The Microsoft Fabric collector can be run on-premises using Docker or JAR files.
Note
The latest version of the collector is 2.270. To view the release notes for this version and all previous versions, see the collector release notes.
What is cataloged
The collector catalogs the following information.
Object | Information cataloged |
---|---|
Workspaces | ID, Name, Description |
Warehouses | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By, Collation Type, Connection String, JDBC URL |
Lakehouses | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By, OneLake Tables Path, OneLake Files Path, Connection String, JDBC URL |
Fabric Data Pipelines | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By |
Eventhouses | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By |
Dataflows | ID, Name, Description, Last Modified Date |
Mirrored Databases | ID, Name, Description |
Notebooks | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By |
Spark Job Definitions | ID, Name, Description, Created Date, Last Modified Date, Created By, Modified By |
SQL Analytics Endpoints | ID, Name, Description, Last Modified Date, Created By, Modified By, Provisioning Status, Connection String |
Lakehouse Folders | ID, Name, Description, Created Date, Last Modified Date, ABFS File Path |
Lakehouse Files | ID, Name, Description, Created Date, Last Modified Date, ABFS File Path |
Schemas | Name. Extended metadata: Created Date, Modified Date |
Fabric Database Tables | Name, Description, Primary Key, ABFS File Path. Extended metadata: Created Date, Modified Date |
Database Columns | Name, Description, JDBC Type, Column Type, Is Nullable, Default Value, Key Type (Primary, Foreign), Column Size, Column Index, Decimal Digits |
Views | Name, Description, SQL Definition |
Reports | ID, Name, Description, Type, Preview Image (not supported for paginated report types), Created Date, Last Modified Date, Created By, Modified By, External URL, Embed URL |
Report Pages | Name |
Dashboards | ID, Name, External URL, Embed URL |
Dashboard Tiles | Name, Embed URL |
Semantic Models | ID, Title, Description, Created Date, Created By, External URL |
Data Sources | Name, Type, Connection Details |
Fabric Logical Tables | Name, Description, Is Hidden, Is Entered Data, Expression |
Fabric Calculated Tables | Name, Description, Is Hidden, Is Entered Data, Expression |
Fabric Logical Columns | Name, Description, Data Type, Column Type, Is Hidden, Expression |
Measures | Name, Description, Is Hidden, Expression |
Profiling and sampling specific information
The Microsoft Fabric collector supports the same profiling and sampling parameters as the SQL Server collector. These parameters apply to Warehouses and Lakehouses. If you include them when running the collector, the following additional information is harvested for Columns and Tables.
Object | Information cataloged |
---|---|
Column |
|
Table |
|
Relationships between objects
By default, the harvested metadata includes catalog pages for the following resource types. Each catalog page has a relationship to the other related resource types. If the metadata presentation for this data source has been customized with the help of the data.world Solutions team, you may see other resource pages and relationships.
Resource page | Relationship |
---|---|
Workspace |
|
Warehouses |
|
Lakehouses |
|
Fabric Data Pipelines |
|
Eventhouses |
|
Dataflows |
|
Notebooks |
|
Spark Job Definitions |
|
SQL Analytics Endpoints |
|
Lakehouse Folders |
|
Lakehouse Files |
|
Schemas |
|
Fabric Database Tables |
|
Database Columns |
|
Views |
|
Reports |
|
Report Pages |
|
Dashboards |
|
Dashboard Tiles |
|
Semantic Models |
|
Data Sources |
|
Lineage and dependencies for Microsoft Fabric
Lineage
The following lineage information is collected by the Microsoft Fabric collector.
Object | Lineage available |
---|---|
Database View | The collector identifies the associated column in an upstream view or table. Note: For views, the collector first tries to parse the view SQL to harvest lineage metadata. If the collector's SQL parser cannot parse the view SQL, the collector catalogs some lineage relationships using the dm_sql_referencing_entities system function, when available. For each row in the referenced entities, if is_selected or is_select_all is true, the collector catalogs a relationship between the referencing entity and the database column. |
Semantic Model | Dataflows and Semantic Models this Semantic Model uses data from. |
Logical Table | Associated tables that the table sources its data from. Note: The collector parses the expressions returned by the Metadata Scan APIs to derive lineage to the source columns and tables. |
Calculated Table | Logical tables and columns from which the calculated table calculates its values. |
Logical Column | Associated Logical and Database columns that the column sources its data from or calculates its values from. |
Measure | Associated Logical Columns that the measure sources its data from. |
Dashboard Tile | Associated Semantic Model |
Report | Associated Semantic Model; Report from which this Report was published. |
Dataflow | Other Dataflows this Dataflow uses data from. |
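The fallback rule described in the Database View note can be pictured as a simple filter over dependency rows. The following is a minimal Python sketch, not the collector's actual code: the `DependencyRow` shape, its `referencing_entity` and `referenced_column` fields, and the function name are hypothetical. Only the `is_selected` and `is_select_all` flags come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DependencyRow:
    """Hypothetical shape of one dependency row returned for a view."""
    referencing_entity: str  # the view being analyzed (assumed field name)
    referenced_column: str   # the upstream column (assumed field name)
    is_selected: bool        # flag named in the note above
    is_select_all: bool      # flag named in the note above


def fallback_view_lineage(rows: Iterable[DependencyRow]) -> List[Tuple[str, str]]:
    """Keep a (view, column) edge only when the view reads the column."""
    edges = []
    for row in rows:
        if row.is_selected or row.is_select_all:
            edges.append((row.referencing_entity, row.referenced_column))
    return edges


# Illustrative rows; the object names are made up for this example.
rows = [
    DependencyRow("dbo.v_sales", "dbo.orders.amount", True, False),
    DependencyRow("dbo.v_sales", "dbo.orders.note", False, False),
    DependencyRow("dbo.v_sales", "dbo.customers.name", False, True),
]
edges = fallback_view_lineage(rows)
```

In this sketch the `dbo.orders.note` row produces no edge because neither flag is set, which mirrors why the fallback path catalogs only some lineage relationships.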
Supported cross-system lineage
Cross-system lineage is currently supported for the following data sources:
Fabric Lakehouse (currently limited to SQL endpoint connections)
Fabric Warehouse (currently limited to SQL endpoint connections)
Oracle
Denodo
Snowflake
SQL Server
PostgreSQL
Redshift
Databricks
CSV documents
Important
While other data sources are not formally supported, running the collector for those sources may still enable you to view cross-system lineage between Microsoft Fabric and these sources.
Supported Power Query (M) Functions and Expressions for Lineage Metadata
Note
Any table operations or transformations not listed in the following table as supported or unsupported are ignored.
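In practice, the note above sorts every `Table.*` operation in an expression into one of three buckets: supported, unsupported, or ignored. The following is a rough Python sketch of that triage, with the two lists copied from the table in this section. The function name and the regex-based scan are assumptions for illustration; the collector's real parser is not described here, and a plain regex would also match names inside string literals.

```python
import re
from typing import Dict, List

# Operation names copied from the table in this section.
SUPPORTED = {
    "Table.AddColumn", "Table.AddIndexColumn", "Table.RenameColumns",
    "Table.NestedJoin", "Table.ExpandTableColumn", "Table.SplitColumn",
    "Table.DuplicateColumn", "Table.CombineColumns",
}
UNSUPPORTED = {
    "Table.Pivot", "Table.PromoteHeaders", "Table.DemoteHeaders",
    "Table.PrefixColumns", "Table.TransformColumnNames", "Table.Unpivot",
    "Table.UnpivotOtherColumns", "Table.AddFuzzyClusterColumn",
    "Table.AddJoinColumn", "Table.AggregateTableColumn", "Table.Combine",
    "Table.CombineColumnsToRecord", "Table.ExpandRecordColumn",
    "Table.Join", "Table.Transpose",
}

# Naive scan for Table.<Name> tokens (an assumption, not a real M parser).
TABLE_CALL = re.compile(r"\bTable\.[A-Za-z]+")


def classify_table_operations(m_expression: str) -> Dict[str, List[str]]:
    """Group each distinct Table.* call by how lineage harvesting treats it."""
    buckets: Dict[str, List[str]] = {"supported": [], "unsupported": [], "ignored": []}
    for name in TABLE_CALL.findall(m_expression):
        if name in SUPPORTED:
            key = "supported"
        elif name in UNSUPPORTED:
            key = "unsupported"
        else:
            key = "ignored"  # not listed either way, per the note above
        if name not in buckets[key]:
            buckets[key].append(name)
    return buckets


# Illustrative query; server, database, and column names are made up.
query = '''
let
    Source = Sql.Database("server", "db"),
    Renamed = Table.RenameColumns(Source, {{"a", "b"}}),
    Filtered = Table.SelectRows(Renamed, each [b] > 0),
    Pivoted = Table.Pivot(Filtered, {"x"}, "k", "v")
in
    Pivoted
'''
result = classify_table_operations(query)
```

A check like this could help you spot expressions worth raising with data.world support before a collector run, since unsupported operations are the ones the table asks you to report.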
This section captures supported transformations, source expressions, calculated columns, and measure expressions when harvesting lineage metadata.
Category | Supported/Unsupported objects |
---|---|
Supported parameterized expressions | The collector parses source expressions that use parameters in place of the following elements: the full Source value, server/host value, warehouse value, database name, schema name, table name, and SQL expressions that incorporate parameters |
Supported source expressions | Csv.Document, Excel.Workbook, File.Contents, Folder.Contents, Folder.Files, Json.Document, Odbc.DataSource, Odbc.InferOptions, Odbc.Query, Xml.Document, Web.Contents, Web.Headers, Web.BrowserContents, AmazonRedshift.Database, Sql.Database, Sql.Databases, Snowflake.Databases, PostgreSQL.Database, Databricks.Catalogs, Oracle.Database, Denodo.Contents, Databricks.Query |
Supported table operations | Table.AddColumn, Table.AddIndexColumn, Table.RenameColumns, Table.NestedJoin, Table.ExpandTableColumn, Table.SplitColumn, Table.DuplicateColumn, Table.CombineColumns |
Unsupported table operations | Note: Contact data.world support if you have any expressions that use the following unsupported table operations. Table.Pivot, Table.PromoteHeaders, Table.DemoteHeaders, Table.PrefixColumns, Table.TransformColumnNames, Table.Unpivot, Table.UnpivotOtherColumns, Table.AddFuzzyClusterColumn, Table.AddJoinColumn, Table.AggregateTableColumn, Table.Combine, Table.CombineColumnsToRecord, Table.ExpandRecordColumn, Table.Join, Table.Transpose |
Supported dataflow functions |
|
| |
Supported calculated columns | Lineage is harvested from calculated column expressions that reference columns with or without table references. Column and table names containing alphanumeric characters, spaces, hyphens, and underscores are supported. |
Supported measures | Lineage is harvested from measure expressions that reference columns or tables whose names contain alphanumeric characters, spaces, hyphens, underscores, or surrounding quotes. |
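To make the calculated column and measure rows more concrete, here is a minimal Python sketch that pulls column references out of an expression. It is an illustration only, under the assumption that references look like `'Table Name'[Column]`, `Table[Column]`, or a bare `[Column]`. The regex and the function name are hypothetical and are not the collector's parser.

```python
import re
from typing import List, Optional, Tuple

# Optional table part (quoted, or unquoted alphanumerics/underscores),
# followed by a bracketed column name. Assumed syntax for illustration.
COLUMN_REF = re.compile(r"(?:'([^']+)'|([A-Za-z0-9_]+))?\[([^\]]+)\]")


def extract_column_references(expression: str) -> List[Tuple[Optional[str], str]]:
    """Return (table, column) pairs; table is None when no table is given."""
    references = []
    for match in COLUMN_REF.finditer(expression):
        table = match.group(1) or match.group(2)  # quoted or unquoted table name
        references.append((table, match.group(3)))
    return references


# Illustrative expression; the table and column names are made up.
expression = (
    "SUM('Sales Orders'[Net-Amount]) / COUNT(Customers[customer_id])"
    " + [Discount Rate]"
)
references = extract_column_references(expression)
```

Note that a bare `[Name]` in a real expression can also refer to a measure, so a production parser would need the model's metadata to tell the two apart.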
Dependencies
Dependencies between Microsoft Fabric resources are cataloged from the Fabric metadata scan APIs, and these relationships can be seen in the Lineage Explorer.
Authentication supported
The collector supports the Service principal authentication method.
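Service principal authentication against Microsoft services generally relies on the OAuth 2.0 client credentials flow of the Microsoft identity platform. The Python sketch below only builds such a token request so you can see which values a service principal needs: a tenant ID, a client ID, and a client secret. The function name, the placeholder values, and the default scope are assumptions; this page does not state which scope or parameters the collector itself uses.

```python
import urllib.parse
import urllib.request


def build_token_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    scope: str = "https://api.fabric.microsoft.com/.default",  # assumed scope
) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; substitute the details of your own app registration.
request = build_token_request("contoso-tenant", "app-id", "s3cret")
```

Sending the request with `urllib.request.urlopen(request)` and reading `access_token` from the JSON response is a quick way to confirm that the service principal's credentials are valid before configuring the collector.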