
Troubleshooting Power BI Service collector issues

Collector runtime and troubleshooting

The catalog collector may take anywhere from several seconds to many minutes to run, depending on the size and complexity of the system being crawled.

  • If the catalog collector runs without issues, you should see no output on the terminal, but a new file matching *.dwec.ttl should be in the directory you specified for the output.

  • If there was an issue connecting or running the catalog collector, there will be either a stack trace or a *.log file. Either can be sent to support for investigation if the errors are not clear.
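
As a quick post-run check, the two outcomes above can be told apart by listing the output directory. The following is a minimal sketch, not part of the collector: the function names are hypothetical, and it assumes that any *.log file is written to the same directory as the *.dwec.ttl file, which may not match your setup.

```python
from pathlib import Path


def summarize_collector_output(output_dir: str) -> dict:
    """List catalog files (*.dwec.ttl) and log files (*.log) in output_dir."""
    root = Path(output_dir)
    return {
        "catalog_files": sorted(p.name for p in root.glob("*.dwec.ttl")),
        # Assumption: logs land in the same directory as the catalog file.
        "log_files": sorted(p.name for p in root.glob("*.log")),
    }


def looks_successful(summary: dict) -> bool:
    """A run is treated as successful when at least one catalog file exists."""
    return bool(summary["catalog_files"])
```

A log file can still be present after a successful run, so it may be worth reviewing the warnings in it even when a catalog file was produced.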

A list of common issues and problems encountered when running the collectors is available here.

Issue 1: Authentication errors observed

  • Error observed: The following error is observed:

    • Error invoking Power BI API to authenticate. HTTP status: 401.

  • Cause: There was an authentication issue while connecting to Power BI.

  • Solution: Verify that the Power BI authentication details provided during the collector configuration are correct.

Issue 2: Authorization errors observed while using username and password authentication

  • Errors observed: The following error is observed.

    WARN: Category: Permissions; API error encountered getting <resource> at <link>; 
    this object will not be included in catalog. HTTP error code 401 Unauthorized.
  • Cause: This error occurs when using username/password authentication and when the user does not have proper access to the resources.

  • Solution: Follow these instructions to configure proper access for the user.
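
Because many of the issues on this page are identified by a message category and an HTTP code, it can help to tally both before reading the log line by line. This is a rough sketch with hypothetical names. It assumes that real log messages use the same "Category: <name>;" and HTTP code wording as the samples shown in this article.

```python
import re
from collections import Counter

# Patterns derived from the sample messages in this article (assumption:
# real log lines follow the same wording).
CATEGORY = re.compile(r"Category: (?P<category>[^;]+);")
HTTP_CODE = re.compile(r"HTTP (?:error code|status):? (?P<code>\d{3})")


def tally_messages(log_text: str) -> Counter:
    """Count log messages by (category, HTTP code); code is None if absent."""
    counts = Counter()
    # Split at every "Category: " marker so wrapped messages stay together.
    for chunk in re.split(r"(?=Category: )", log_text):
        category = CATEGORY.match(chunk)
        if category is None:
            continue  # text before the first marker, such as a "WARN: " prefix
        http = HTTP_CODE.search(chunk)
        key = (category.group("category"), http.group("code") if http else None)
        counts[key] += 1
    return counts
```

A large count for ("Permissions", "401") or ("Permissions", "403") points to the access-related issues on this page, while ("Connection", "429") points to request limits.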

Issue 3: API health check failed warning when using service principal authentication

  • Warning observed: The following warning messages appear in the log files.

    WARN: Category: Other; An error was encountered during the collector run, the results are most likely incomplete.
    See the log for more information. Error message: API health check failed the APIs may not be available at the moment, unable to continue. 
    Please try again later. Unauthorized. HTTP error code 401.
  • Cause: This issue occurs if the collector fails immediately in the health check with a 401 error code when using Service Principals, and there is no additional error JSON section from Microsoft. This indicates that the Fabric APIs have not been enabled for the Service Principal.

  • Solution: Enable Fabric APIs for the Service Principal.

Issue 4: API health check failed error when using service principal authentication

  • Error observed: The following error messages appear in the log files.

    "error": "unauthorized_client",
    "error_description": "A0016: Application with identifier '083eb699' was not found in the directory 'data.world'."
  • Cause: The collector encounters a 400 or 401 error code due to incorrect Service Principal credentials, such as the client ID or secret.

  • Solution: Correct the client ID or secret based on the details provided by Microsoft in the error message. Microsoft will specify what went wrong, such as an invalid client ID, etc.

Issue 5: Stack overflow error observed while running the collector

  • Error observed: The following stack overflow error is observed while running the collector.

    INFO: Category: Status Update; Cataloging PowerBI Data Source/Dataflow table Account with parent resource d55ab87e-d93: https://org.linked.data.world/d/ddw-catalogs/powerbiTable.1d6a854.
    Exception in thread "main" java.lang.StackOverflowError
    	at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:146)
    	at org.antlr.v4.runtime.atn.ATNConfigSet.add(ATNConfigSet.java:122)
    	at org.antlr.v4.runtime.atn.LexerATNSimulator.closure(LexerATNSimulator.java:446)
  • Cause: The collector parser hit a stack size limit due to a complex SQL statement or DAX expression.

  • Solution: Add the -e DWCC_JVM_OPTIONS="-Xss2m" parameter to the command to increase the stack size. For example, the command will look like: docker run -it --rm -e DWCC_JVM_OPTIONS="-Xss2m".... This sets the stack size to 2 MB. If you continue to get an error, increase the stack size further.

    If you are using a jar file to run the collector, the equivalent java command is: java -Xss2m -jar ...
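
If the collector is launched from a script, the stack size setting can be added to an existing docker run argument list programmatically. The helper below is a hypothetical sketch. It only inserts the -e DWCC_JVM_OPTIONS=-Xss<size> option described above and leaves the rest of your command untouched; the image name and collector options are placeholders that you supply.

```python
def with_jvm_stack_size(docker_args: list[str], stack_size: str = "2m") -> list[str]:
    """Return a copy of a 'docker run ...' argument list with DWCC_JVM_OPTIONS set.

    stack_size uses the JVM -Xss notation, for example "2m" for 2 MB.
    """
    if "run" not in docker_args:
        raise ValueError("expected a 'docker run' argument list")
    insert_at = docker_args.index("run") + 1
    # No shell quoting is needed because each list element is one argument.
    env_option = ["-e", f"DWCC_JVM_OPTIONS=-Xss{stack_size}"]
    return docker_args[:insert_at] + env_option + docker_args[insert_at:]


# Placeholder command: replace the last element with your image and options.
command = with_jvm_stack_size(["docker", "run", "-it", "--rm", "<image and options>"])
print(" ".join(command))
```

If the error persists, calling the helper with a larger value such as "4m" follows the same advice of increasing the stack size further.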

Issue 6: Collector run takes a long time to complete

  • Problem: Collector runs take more than 6 hours.

  • Cause: The processing time for the Power BI parser is increased due to the presence of some large expressions.

  • Solution: As a workaround, you can set the Maximum Power BI Expression Length (--max-parseable-expression-length) parameter while running the collector. For example, you can set the value to 10,000. This means expressions exceeding this character count will not be parsed for lineage metadata. As a result, the lineage information harvested by the collector will be impacted.
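
To judge the lineage impact of a given limit before applying it, you could compare the limit against the expression lengths you already know about. The sketch below only illustrates the rule stated above (expressions longer than the limit are not parsed). It is not the collector's implementation, the function name is hypothetical, and the collector may count characters differently.

```python
def split_by_parseable_length(expressions: dict[str, str], max_length: int):
    """Split table names into (parsed, skipped) for a given length limit.

    expressions maps a table name to its source expression text.
    An expression is skipped when its length exceeds max_length.
    """
    parsed = sorted(name for name, text in expressions.items() if len(text) <= max_length)
    skipped = sorted(name for name, text in expressions.items() if len(text) > max_length)
    return parsed, skipped


# Illustrative data only: one short expression and one very long one.
example = {"Account": "x" * 500, "Wide_Table": "x" * 25_000}
parsed, skipped = split_by_parseable_length(example, 10_000)
print("parsed:", parsed, "skipped:", skipped)
```

Tables in the skipped list are the ones that would lose lineage metadata at that limit.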

Issue 7: Errors observed while harvesting reports pages

  • Warnings observed: The following warning messages are observed in the log files:

    • WARN: Category: Other; API error encountered getting Pages at https://api.powerbi.com/v1.0/myorg/groups/<ID>/reports/<ID>/pages; this object will not be included in catalog. HTTP error code 404 Not Found.

    • WARN: Category: Permissions; API error encountered getting Pages at https://api.powerbi.com/v1.0/myorg/groups/<ID>/pages; this object will not be included in catalog. HTTP error code 401 Unauthorized.

  • Cause: To retrieve Report Pages, a service principal or user must be given access to both the workspace that contains the Report and the workspace that contains the Dataset used in the report. This is only required for Report Pages due to limitations with the Power BI REST APIs and Admin APIs.

  • Solution: For details about setting the required access for harvesting Reports pages, see Setting up access for cataloging Power BI resources.

    If acquiring page names for reports is not required and adding the service principal or user to each workspace is not possible, these messages can be ignored.

Issue 8: Warning message observed: Export report to image is disabled on tenant level

  • Warnings observed:

    WARN: Category: Permissions; API error encountered posting ExportToFileResponse at
    https://api.powerbi.com/v1.0/myorg/groups/133d6f64-6dde-4e20/reports/61a96178/ExportTo; 
    this object will not be included in catalog. HTTP error code 403 Forbidden.
     Detailed information from the API about this error: 
    {  "error" : {    
      "code" : "InvalidRequest", 
      "message" : "Export report to image is disabled on tenant level"
     }}.
  • Possible cause: The setting to allow export of report preview images is disabled.

  • Solution: Enable the Export reports as image files option in the Power BI Admin Portal for the Service Principal.

Issue 9: Unable to export report preview images

  • Error observed: The following error messages are observed in the log files:

    Category: Permissions; API error encountered posting ExportToFileResponse at 
    https://api.powerbi.com/v1.0/myorg/groups/0c41ce4f-cd51-4f01/reports/f46/
    ExportTo; this object will not be included in catalog. HTTP error code 403 Forbidden. 
    Detailed information from the API about this error: 
    {  "error" : {
        "code" : "InvalidRequest",
        "message" : "Report requested for export is not on dedicated capacity"
     }}.
  • Cause: In the Power BI APIs, exporting reports requires the workspace to be on Fabric, Premium, or Embedded capacity.

  • Solution: Ensure that you have completed all prerequisite tasks for using this feature.
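
Issues 8 and 9 both appear as a 403 Forbidden on the ExportTo endpoint, and only the "message" field in the API error details tells them apart. The following sketch, with hypothetical names, extracts that field from a log message and maps the two messages shown above to the matching issue. It assumes the message text appears exactly as in these samples.

```python
import re
from typing import Optional

# Matches the "message" field of the API error details quoted in the log.
MESSAGE = re.compile(r'"message"\s*:\s*"(?P<message>[^"]*)"')

# Only the two messages documented in this article are mapped.
HINTS = {
    "Export report to image is disabled on tenant level":
        "Issue 8: enable the Export reports as image files option in the Admin Portal.",
    "Report requested for export is not on dedicated capacity":
        "Issue 9: the workspace must be on Fabric, Premium, or Embedded capacity.",
}


def export_error_hint(log_message: str) -> Optional[str]:
    """Return a hint for a known ExportTo error message, or None if unknown."""
    match = MESSAGE.search(log_message)
    if match is None:
        return None
    return HINTS.get(match.group("message"))
```

A return value of None means the message is not one of the two covered here, in which case the full error details should be reviewed.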

Issue 10: Warning message observed: Unable to initiate metadata scan (User and password authentication)

  • Warnings observed: The following warning messages are observed in the log files.

    WARN: Category: Permissions; Unable to initiate metadata scan for workspace with id 0c41. 
    No detailed table information will be retrieved for this workspace. HTTP status: 401.
  • Cause: The user must have administrator rights (such as Microsoft 365 Global Administrator or Power BI Service Administrator) to use metadata scanning.

  • Solution: Ensure that you have completed all prerequisite tasks for using this feature. If you do not wish to give the user administrator rights, you can ignore the warning messages.

Issue 11: Warning message observed: Unable to initiate metadata scan (Service principal authentication)

  • Warnings observed: The following warning messages are observed in the log files.

    WARN: Category: Permissions; Unable to initiate metadata scan for workspace with id 133d. 
    No detailed table information will be retrieved for this workspace. HTTP status: 403. 
    Detailed information from the API about this error: {  "Message" : "API is not accessible for application"}.
  • Cause: The Admin API is not enabled for the Service Principal, or the App Registration contains API permissions that require admin consent.

  • Solution:

    1. Verify you have enabled the Admin API for the Service Principal. If you do not want to give the Service Principal read access to everything in the tenant/org, then you can ignore these messages, but you will not be able to get lineage from datasets to source databases/files etc. 

    2. If the Admin API looks correct, make sure the App registration does not include any Admin Consent Required permissions in API permissions in Azure.

Issue 12: Warning message observed: No workspaces were found

  • Warning observed: The following warning messages are recorded in the log files:

    WARN: Category: Configuration; No workspaces were found. 
    The credentials used may not have access to workspaces, or an error occurred retrieving the workspaces.
  • Cause: The issue may be due to either incorrect configuration of options and parameters while running the collector, or insufficient permissions for harvesting workspaces.

  • Solution: Ensure that you have completed all prerequisite tasks to correctly set permissions for harvesting workspaces. This section also provides guidance on selecting the appropriate options and parameters for successful workspace harvesting.

Issue 13: The Power BI Table Has No Expression

If you encounter the following log message and it occurs for all tables in all datasets, it indicates that the Enhance admin APIs responses with DAX and mashup expressions option needs to be enabled in the Admin Portal:

Unable to parse expression for table <Table> because it has no expression, so also unable to determine the source. 
The source table cannot be cataloged.
  • Cause 1: Missing Permissions

    If this log message appears for all tables in all datasets, it is likely due to the required option not being enabled.

  • Solution 1: Enable Required Option

    Enable the Enhance admin APIs responses with DAX and mashup expressions option in the Admin Portal.

  • Cause 2: Bad State Tables

    In some cases, if log messages indicate attempts to parse source expressions in datasets and some lineage information from Power BI Tables to sources is present, but a few of the messages above appear, it is likely that some tables are in a bad state and genuinely do not have a source expression. In this scenario, the permissions are already set correctly.

  • Solution 2: No Action Needed

    If it is confirmed that some tables genuinely lack a source expression, no further action is required regarding permissions.
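
The distinction between the two causes comes down to whether the message appears for every table or only for some. The sketch below, with hypothetical names, collects the table names from the log message shown above and compares them with a list of tables that you supply. The list of all tables is an input you must provide, for example from your own inventory of the datasets.

```python
import re

# Matches the log message quoted in this issue and captures the table name.
NO_EXPRESSION = re.compile(
    r"Unable to parse expression for table (?P<table>.+?) because it has no expression"
)


def tables_without_expression(log_text: str) -> set[str]:
    """Collect the names of tables reported as having no expression."""
    return {m.group("table") for m in NO_EXPRESSION.finditer(log_text)}


def diagnose(all_tables: set[str], missing: set[str]) -> str:
    """Map the share of affected tables to Cause 1 or Cause 2."""
    if not missing:
        return "no missing expressions"
    if all_tables and all_tables <= missing:
        # Every known table is affected: the Admin Portal option is likely off.
        return "cause 1: enable the admin API option"
    return "cause 2: some tables have no source expression"
```

This is only a first-pass check. Confirming Cause 2 still requires seeing some lineage from Power BI tables to sources in the harvested results.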

Issue 14: Unable to get lineage for Power BI tables to source tables

  • Warnings observed: The following warning messages are observed in the log files.

    WARN: Category: Lineage; Unable to parse expression for table <table name> because it has no expression, so also unable to determine the source. The source table cannot be cataloged.

  • Issue: If there are no tables and columns displayed in any datasets, and the previously mentioned error about initiating metadata scan is not occurring, with no other warning messages, it is likely that the Enhance Admin APIs responses with detailed metadata option is not enabled.

    Important things to note:

    • Dataflows do not require the Admin API unless --all-workspaces-and-apps=true is used, so you may see Power BI Tables and Columns in dataflows, but not for datasets.

    • If permissions are set correctly and the workspace has Power BI tables in datasets, you would see messages about Power BI Tables and Columns being cataloged after the highlighted messages, but in this case, it just moves on to the next step in the collector run.

    • If there are tables and columns in even one dataset, then permissions are set up correctly, and the issue is limited to some datasets that do not have any tables and columns, as Power BI API setup is all or nothing.

      INFO: Category: Status Update; Cataloging workspace Other User Test (f3ae87f2)
      INFO: Category: Status Update; Initiating scan for detailed data source information for workspace f3ae87f2-c.
  • Cause: The required option Enhance Admin APIs responses with detailed metadata is not enabled.

  • Solution: Ensure that the Enhance Admin APIs responses with detailed metadata option is enabled in the Admin Portal. For details about configuring settings for metadata scanning, see Setting up access for cataloging Power BI resources.

Issue 15: Report pages issues when using service principal authentication

When collecting metadata from Power BI, there are a few known issues related to report pages, especially when using the All workspaces and apps (--all-workspaces-and-apps) option.

Important

Always check the URL where the issue occurred to diagnose API-related problems.

API error encountered getting Pages at https://api.powerbi.com/v1.0/myorg/groups/d06dfe3b-112f-4b69-a0b4-2b6346512781/reports/128c4a63-4a96-4f33-9a10-23771f25e1a2/pages; this object will not be included in catalog. HTTP error code 404 Not Found. Detailed information from the API about this error: {
  "error" : {
    "code" : "PowerBIEntityNotFound",
    "pbi.error" : {
      "code" : "PowerBIEntityNotFound",
      "parameters" : { },
      "details" : [ ],
      "exceptionCulprit" : 1

Scenario 1: Missing report pages when service principal lacks access to workspaces

When using the All workspaces and apps (--all-workspaces-and-apps) option, if the Service Principal is not given access to a workspace, report pages in the workspace cannot be retrieved.

  • Cause: This is due to the lack of an Admin API endpoint in Power BI for Report Pages.

  • Solution: When using All workspaces and apps (--all-workspaces-and-apps), add the Service Principal to the workspaces. If you do not want to do this due to high volumes, the harvested metadata will continue to miss the name and ID of report pages.

Scenario 2: Unauthorized errors when fetching report pages for reports in Apps

Attempting to get report pages for reports in Apps using Service Principals results in 403 unauthorized errors.

  • Cause: It is not possible to fetch report pages for reports in Apps with Service Principals.
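
Since the advice for this issue is to check the URL where the error occurred, it can be convenient to pull the workspace and report IDs out of the logged URL so you know which workspace needs the Service Principal added. The following is a sketch with hypothetical names, assuming URLs have the groups/<ID>/reports/<ID> shape shown in the sample messages.

```python
import re
from typing import Optional

# "groups" in the Power BI REST API path refers to workspaces.
API_URL = re.compile(
    r"https://api\.powerbi\.com/v1\.0/myorg/groups/(?P<workspace>[^/\s;]+)"
    r"(?:/reports/(?P<report>[^/\s;]+))?"
)


def ids_from_message(message: str) -> Optional[dict]:
    """Extract workspace and report IDs from the first API URL in a message."""
    match = API_URL.search(message)
    if match is None:
        return None
    return {
        "workspace_id": match.group("workspace"),
        "report_id": match.group("report"),  # None when the URL has no report
    }
```

For a Pages URL, both IDs are returned. For other endpoints under a workspace, such as dashboards, only the workspace ID is set.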

Issue 16: 401 Unauthorized errors when accessing dashboards or other high-level resources

When using username and password authentication, you may encounter 401 Unauthorized errors for various API calls such as Apps, Workspaces, Dashboards, Reports, Datasets, and Dataflows. This issue does not apply to Service Principal authentication.

WARN: Category: Permissions; API error encountered getting Dashboards at https://api.powerbi.com/v1.0/myorg/groups/133d6f64/dashboards; 
this object will not be included in catalog. HTTP error code 401 Unauthorized.
  • Cause: Missing API permissions for the App registration.

  • Solution: Ensure that all necessary API permissions are correctly added for the App registration.

Issue 17: Unauthorized error for dataflows only

  • Issue: When running the collector, you encounter a 401 Unauthorized error specifically for Dataflows, even though other Power BI resources are being cataloged successfully.

    WARN: Category: Permissions; API error encountered getting DataflowDefinition at https://api.powerbi.com/v1.0/myorg/groups/f3ae87f2/dataflows/30506e06; this object will not be included in catalog.
     HTTP error code 401 Unauthorized. Detailed information from the API about this error: 
    {  "error" : {   
     "code" : "DataflowUnauthorizedError",   
     "message" : "You do not have permissions to manage this dataflow."  
    }}.
  • Cause: The Service Principal needs at least Contributor access to the workspace.

  • Solution: Grant the Service Principal at least Contributor access to the workspace. Note that this is not required if using --all-workspaces-and-apps=true, in which case the Service Principal does not need to be added to the workspace, and this error would not occur.

Issue 18: Too many requests errors

  • Cause: Some of the Power BI APIs, particularly the Admin APIs, impose limits on the number of API requests that can be made in an hour. Exceeding these limits results in 429 Too Many Requests errors.

  • Solution: Ensure the collector is configured according to your needs:

    If you prefer the collector to wait for the 429 errors to clear, no further action is needed.

    If you want to disable this behavior, enable the Disable max requests wait (--disable-max-requests-wait=true) option.

    Example log messages when waiting is disabled (--disable-max-requests-wait=true):

    Category: Connection; API error encountered getting Groups at https://api.powerbi.com/v1.0/myorg/admin/groups?$top=1; this object will not be included in catalog. HTTP error code 429 Too Many Requests. 
    Detailed information from the API about this error: {
      "message" : "You have exceeded the amount of requests allowed in the current time frame and further requests will fail. Retry in 1088 seconds."
    }. The API response indicates that an API endpoint is currently unreachable due to a max requests limit being reached.

    Example log messages when waiting is enabled (the default behavior):

    WARN  o.o.c.powerbi.api.AbstractPowerBIApi - Category: Connection; Power BI API max requests were reached, the collector will continue to retry until the max requests error has cleared, or a maximum of one hour duration, whichever occurs first. 
    Waiting 5 minutes before trying again. To disable this behavior, set --disable-max-requests-wait=true.
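
When waiting is disabled, the 429 message states how long the API asks you to wait, which can help in scheduling the next run. This small sketch, with hypothetical names, reads that value from a message. It assumes the "Retry in N seconds" wording shown in the sample above.

```python
import re
from typing import Optional

RETRY_IN = re.compile(r"Retry in (?P<seconds>\d+) seconds")


def retry_after_seconds(message: str) -> Optional[int]:
    """Return the wait time stated in a 429 message, or None if absent."""
    match = RETRY_IN.search(message)
    return int(match.group("seconds")) if match else None


def format_wait(seconds: int) -> str:
    """Render a number of seconds as minutes and seconds."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes} min {rest} s"


sample = "further requests will fail. Retry in 1088 seconds."
print(format_wait(retry_after_seconds(sample)))  # prints "18 min 8 s"
```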

Issue 19: Unable to Parse Expression or Find Details for a Source Table

  • Issue: Log messages indicating inability to resolve source information or find details for a source table. These messages suggest that certain source types or transformations are not supported by the collector.

    Category: Lineage; Unable to resolve source information for table testdf_dataflow in Dataflow f4fd61af-532 from the source table expression, the source tables and columns for this Power BI Table will not be collected. 
    This may occur if this source expression is a partial parameter and not a table, or if the source expression could not be parsed or used a currently unsupported source type or transformation. 
    
    Category: Lineage; Unable to resolve the data source for table "User" using the table's source expression. 
    The source table and columns for this Power BI Table cannot be cataloged. The source table expression may be using a currently unsupported source type or transformation. 
    
  • Cause: There are source types, transformation methods, and other syntaxes that the Power BI collector currently does not support for parsing lineage to sources.

  • Solution: Contact support with details of the unsupported source type or transformation method encountered.

Issue 20: Warning related to ODBC connections observed while harvesting lineage information

  • Warnings observed: The following two warning messages are observed in the log files:

    • WARN: Category: Lineage; No datasource value map provided, unable to collect the source table Test_Table (5) because Power BI doesn't provide source info for ODBC connections. Add a datasource value map for DSN: "SQL Server" and run again with the --datasource-mapping-file option.

    • WARN: Category: Lineage; Unable to determine data source information for table Test_Table (5) in Data Source dsn=SQL Server, tables and columns in this source will not be cataloged.

  • Cause: You have a data source in Power BI which uses an ODBC connection. In these instances, Power BI does not provide the host or database type of the source.

  • Solution: Create the data source mapping file and use it while running the collectors.
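
Before creating the mapping file, it helps to know which DSNs the collector asked for. The sketch below, with hypothetical names, collects the DSN names from the first warning shown above. It does not generate the mapping file itself, because the file's format is described in the collector documentation rather than here.

```python
import re

# Matches the DSN named in the "No datasource value map provided" warning.
DSN_HINT = re.compile(r'Add a datasource value map for DSN: "(?P<dsn>[^"]+)"')


def dsns_needing_mapping(log_text: str) -> list[str]:
    """Return the unique DSN names mentioned in ODBC mapping warnings, sorted."""
    return sorted({m.group("dsn") for m in DSN_HINT.finditer(log_text)})
```

Each returned name corresponds to one entry that the data source mapping file needs.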

Issue 21: Warning related to multiple server name aliases observed while harvesting lineage information

  • Warnings observed: The following warning message is observed in the log files:

    Category: Lineage; Unable to collect database information because the database name was not able to be determined, this may be due to a source type in the Power BI table expression that is not currently supported, or missing information in the data source mapping file. 
    Source tables within the data source <data source key/name> can not be cataloged. See the log for more information.
    If the source uses an ODBC connection, then add the default database name to the data source mapping file.
  • Cause: You have multiple server names (aliases) for the same database instance (host) and the database collector uses a different alias than the one defined in the Power BI connection.

  • Solution: Set up a YAML file to map the database host to user-specified aliases and use it while running the collectors.

Issue 22: Warning related to Custom SQL Statements observed while harvesting lineage information

  • Warnings observed: The following warning message is observed in the log files:

    Category: Lineage; Unable to harvest lineage from the source SQL statement for Power BI Table "<name>". 
    To resolve SQL statements in Power BI, an entry must be added to the datasource mapping file using 
    the --datasource-mapping-file option using the host or the DSN for the Power BI source 
    using: <instance details>.amazonaws.com and provide the appropriate credentials and connection information.
  • Cause: Custom SQL statements are used in Power BI table source definitions. The Power BI collector currently supports connecting to the following database types to resolve lineage from SQL statements: Snowflake, SQL Server, PostgreSQL, Redshift, Oracle.

  • Solution: Set up a YAML file to configure databases specified in custom SQL statements and use it while running the collectors.