
Troubleshooting Azure Data Lake Storage Gen2 collector issues

Collector runtime and troubleshooting

The catalog collector may take anywhere from several seconds to many minutes to run, depending on the size and complexity of the system being crawled.

  • If the catalog collector runs without issues, you should see no output on the terminal, and a new file matching *.dwec.ttl should appear in the directory you specified for the output.

  • If the catalog collector had a problem connecting or running, it produces either a stack trace or a *.log file. If the errors are not clear, send either one to support for investigation.
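The two outcomes above can be checked quickly with a script. The following is a minimal sketch in Python, not part of the collector itself: the function name summarize_collector_output is hypothetical, and it assumes that any *.log file is written to the same directory as the *.dwec.ttl output, which may not match your setup.

```python
from pathlib import Path


def summarize_collector_output(output_dir):
    """List catalog files (*.dwec.ttl) and log files (*.log) in output_dir.

    Assumption: log files land in the same directory as the catalog output.
    """
    directory = Path(output_dir)
    return {
        # A *.dwec.ttl file indicates the collector produced catalog output.
        "catalog_files": sorted(p.name for p in directory.glob("*.dwec.ttl")),
        # A *.log file indicates there may be an error worth reviewing.
        "log_files": sorted(p.name for p in directory.glob("*.log")),
    }
```

Calling summarize_collector_output with the output directory you passed to the collector returns both lists. An empty catalog_files list after a run suggests the collector did not finish successfully.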

A list of common issues and problems encountered when running the collectors is available here.

Issue 1: Resources from certain storage accounts are not getting cataloged

  • Cause: This generally happens when the bucket contains more than 10,000 resources, or more than the value set in the --max-resource-limit parameter.

  • Solution: Check whether the --max-resource-limit parameter is set and, if so, what value it is configured with. If the value is lower than the number of resources you expect to catalog, raise it.
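As an illustration of this check, the sketch below reads the --max-resource-limit value out of a saved collector command line and compares it with a resource count. The helper names are hypothetical. The fallback of 10,000 is an assumption taken from the cause described above, not a confirmed default. All other arguments in the command line are ignored.

```python
import argparse
import shlex

# Assumption: 10,000 applies when --max-resource-limit is not set.
ASSUMED_DEFAULT_LIMIT = 10_000


def effective_resource_limit(command_line):
    """Return the resource limit implied by a collector command line."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--max-resource-limit", type=int, default=None)
    # parse_known_args skips every argument other than --max-resource-limit.
    known, _unknown = parser.parse_known_args(shlex.split(command_line))
    if known.max_resource_limit is None:
        return ASSUMED_DEFAULT_LIMIT
    return known.max_resource_limit


def exceeds_limit(resource_count, command_line):
    """Return True if resource_count is above the effective limit."""
    return resource_count > effective_resource_limit(command_line)
```

If exceeds_limit returns True for a bucket, some of its resources were probably skipped, and the limit needs to be raised before the next run.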

Issue 2: Collector did not harvest metadata from a specific storage account

  • Cause: The collector did not have permissions to read from a storage account.

  • Solution: Ensure that the Service Principal has the Storage Blob Data Reader role on each of the storage accounts you want to harvest.
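One way to confirm the role assignment is with the Azure CLI. The sketch below only builds the az role assignment list command for one storage account and does not run it. It assumes the Azure CLI is installed and signed in wherever you run the printed command. The function names are hypothetical, and the subscription, resource group, account, and principal values are placeholders you supply.

```python
import shlex

READER_ROLE = "Storage Blob Data Reader"


def storage_account_scope(subscription_id, resource_group, account_name):
    """Build the Azure resource ID used as the scope of a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def role_check_command(principal_id, scope):
    """Build an Azure CLI command that lists matching role assignments."""
    return [
        "az", "role", "assignment", "list",
        "--assignee", principal_id,
        "--role", READER_ROLE,
        "--scope", scope,
        # Also report assignments inherited from the resource group or subscription.
        "--include-inherited",
        "--output", "table",
    ]


def role_check_command_text(principal_id, scope):
    """Return the command as a shell-quoted string that can be pasted into a terminal."""
    return shlex.join(role_check_command(principal_id, scope))
```

Run the printed command once for each storage account. An empty result means the Service Principal has no Storage Blob Data Reader assignment that applies to that account.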