Troubleshooting Power BI Service collector issues

Collector runtime and troubleshooting

The catalog collector can take anywhere from a few seconds to many minutes to run, depending on the size and complexity of the system being crawled.

  • If the catalog collector runs without issues, you should see no output on the terminal, but a new file matching *.dwec.ttl should appear in the directory you specified for the output.

  • If there was an issue connecting to the source or running the catalog collector, there will be either a stack trace or a *.log file. Either can be sent to support for investigation if the errors are not clear.
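The check described above can be scripted. The following is a minimal sketch, not part of the collector: the function name summarize_run is hypothetical, and it assumes that any *.log file is written to the same directory as the *.dwec.ttl output, which may not hold for your setup.

```python
from pathlib import Path


def summarize_run(output_dir):
    """List catalog output and log files found directly in output_dir.

    Assumption: log files sit next to the *.dwec.ttl output. Adjust the
    directory if your collector writes logs elsewhere.
    """
    directory = Path(output_dir)
    catalog_files = sorted(p.name for p in directory.glob("*.dwec.ttl"))
    log_files = sorted(p.name for p in directory.glob("*.log"))
    return {
        "catalog_files": catalog_files,
        "log_files": log_files,
        # True when at least one catalog file was produced.
        "has_catalog_output": bool(catalog_files),
    }
```

Calling summarize_run with the output directory you passed to the collector returns the file names to inspect. An empty catalog_files list suggests the run did not complete, and any entries in log_files are worth reviewing or sending to support.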

A list of common issues and problems encountered when running the collectors is available here.

Issue 1: Error authenticating to Microsoft Graph API

  • Errors observed: The following errors are reported:

    • API error encountered getting Datasources. HTTP error code 403 Forbidden.

    • Error invoking Power BI API to authenticate. HTTP status: 401.

  • Cause: Power BI rejected the request. A 401 status means the credentials were not accepted, and a 403 status means the authenticated identity lacks the required permissions.

  • Solution: Review the authentication method used to connect to Power BI and ensure you have completed all the setup instructions.

Issue 2: Unable to see lineage from Snowflake to Power BI

  • Cause: The collector logs the warning "Unable to parse expression for table {Table name} because it has no expression, so also unable to determine the source. The source table cannot be cataloged." Power BI did not return an expression for the table, so the collector cannot identify the source table for lineage.

  • Solution: Enable access to the detailed data source information (such as tables and columns) that Power BI provides through the read-only admin APIs. For details, see this documentation.

Issue 3: Stack overflow error observed while running the collector

  • Cause: The collector parser hit a stack size limit due to a complex SQL statement or DAX expression.

  • Solution: Add the -e DWCC_JVM_OPTIONS="-Xss2m" parameter to the docker run command to increase the thread stack size to 2 MB. For example: docker run -it --rm -e DWCC_JVM_OPTIONS="-Xss2m" ...

    If you are using a jar file to run the collector, the equivalent java command is: java -Xss2m -jar ...

Issue 4: Warning related to ODBC connections observed while harvesting lineage information

  • Warnings observed: The following two warning messages appear in the log files:

    • WARN: Category: Lineage; No datasource value map provided, unable to collect the source table Test_Table (5) because Power BI doesn't provide source info for ODBC connections. Add a datasource value map for DSN: "SQL Server" and run again with the --datasource-mapping-file option.

    • WARN: Category: Lineage; Unable to determine data source information for table Test_Table (5) in Data Source dsn=SQL Server, tables and columns in this source will not be cataloged.

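  • Cause: Power BI does not provide source information for ODBC connections, so a data source mapping file is required for ODBC data sources.

  • Solution: Create the data source mapping file and pass it to the collector with the --datasource-mapping-file option.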

Issue 5: Collector run takes a long time to complete

  • Problem: Collector runs take more than 6 hours.

  • Cause: Large expressions increase the processing time of the Power BI parser.

  • Solution: As a workaround, set the Maximum Power BI Expression Length (--max-parseable-expression-length) parameter when running the collector, for example to 10,000. Expressions longer than this character count are not parsed for lineage metadata, so lineage that depends on them will be missing from the harvested output.
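To estimate how much lineage a given limit would drop, the length cutoff described above can be illustrated as follows. This is a sketch for reasoning about the setting, not the collector's implementation: split_by_length is a hypothetical helper, and it assumes an expression exactly at the limit is still parsed.

```python
def split_by_length(expressions, max_length=10_000):
    """Partition expressions into (parsed, skipped) by character count.

    Assumption: only expressions strictly longer than max_length are
    skipped, matching "exceeding this character count" in the text.
    """
    parsed = [e for e in expressions if len(e) <= max_length]
    skipped = [e for e in expressions if len(e) > max_length]
    return parsed, skipped
```

If you have the DAX or mashup expressions of a dataset available as strings, passing them to split_by_length with a candidate limit shows how many would be excluded from lineage parsing before you commit to that value.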

Issue 6: Errors observed while harvesting report pages

  • Warnings observed: The following warning messages appear in the log files:

    • WARN: Category: Other; API error encountered getting Pages at https://api.powerbi.com/v1.0/myorg/groups/<ID>/reports/<ID>/pages; this object will not be included in catalog. HTTP error code 404 Not Found.

    • WARN: Category: Permissions; API error encountered getting Pages at https://api.powerbi.com/v1.0/myorg/groups/<ID>/pages; this object will not be included in catalog. HTTP error code 401 Unauthorized.

  • Cause: To retrieve report pages, a service principal or user must have access to both the workspace that contains the report and the workspace that contains the dataset used by the report. This requirement applies only to report pages and stems from limitations in the Power BI REST APIs and Admin APIs.

  • Solution: For details about setting up the required access for harvesting report pages, see Setting up access for cataloging Power BI resources.

    If acquiring page names for reports is not required and adding the service principal or user to each workspace is not possible, these messages can be ignored.

Issue 7: Unable to get lineage for Power BI tables to source tables

  • Warnings observed: The following warning message appears in the log files:

    • WARN: Category: Lineage; Unable to parse expression for table <table name> because it has no expression, so also unable to determine the source. The source table cannot be cataloged.

  • Cause: This warning occurs when the "Enhance admin APIs responses with DAX and mashup expressions" setting is not enabled, or when the service principal or user is not a member of the security group for which the setting is enabled.

  • Solution: For details about configuring settings for metadata scanning, see Setting up access for cataloging Power BI resources.
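The warnings quoted in the issues above share a "WARN: Category: <name>;" prefix, so a log file can be triaged by counting warnings per category. The following is a minimal sketch under that assumption. The function name is hypothetical, and real log lines may carry extra fields (such as timestamps) before the prefix, which is why the pattern searches anywhere in the line.

```python
import re
from collections import Counter

# Matches the "WARN: Category: <name>;" prefix seen in the quoted messages.
WARN_PATTERN = re.compile(r"WARN:\s*Category:\s*([^;]+);")


def count_warning_categories(lines):
    """Count warning lines per category (for example Lineage, Permissions)."""
    counts = Counter()
    for line in lines:
        match = WARN_PATTERN.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts
```

To use it, open the collector's log file and pass the file object to count_warning_categories. A high Lineage count points toward Issues 4 and 7, while Permissions warnings point toward Issue 6.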