
Troubleshooting the collectors

If you are having difficulty running a collector, the following list of common problems can help you troubleshoot what went wrong. If your issue is not covered here, contact the data.world support team for more assistance.

Errors logged on Command Line

This section lists some of the common errors you may see on the command line while running the collector.

Table 1. Each entry below lists the CLI error, followed by its cause and solution.

zsh: command not found: [command]

A parameter used in the command was not recognized by the terminal.

Check for a missing backslash (\) at a line break. The backslash is the line-continuation character and must be the last character on a line, immediately before the line break.

Missing required options: [options]

A required parameter to run the collector was not specified.

Add the required parameters to the command and set the parameter values correctly.

Unknown option: [option]

A parameter was specified that is not supported by the collector.

Remove the unsupported parameter from the command.

docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: [path]

The host directory path specified for the bind mount, which links a local host directory to a container directory, does not exist.

  • Ensure the directory exists. Source is the host directory path. Target is a path in the Docker container.

  • Check the casing of the path. Some systems have case-sensitive paths.

docker: invalid reference format.

The command is malformed. A possible issue here is that there are trailing spaces after a line continuation character (\).

Remove trailing spaces after the \ character.

Unable to connect to database [No suitable driver for [location]

A driver is required to connect to a system and it was not specified.

  • For Docker, add --mount type=bind,source=/path/where/jar/was/downloaded,target=/usr/src/dwcc-config/lib

  • For jar, add -Djdbc.driver-directory=<your driver directory>

  • When using PowerShell, you must enclose the driver option in double quotes: "-Djdbc.driver-directory=<your driver directory>".

    Example command: java -Dlog-level=DEBUG -Dlogdir=..\logs "-Djdbc.driver-directory=..\jdbcdrivers" -jar dwcc.jar --config-file .\config.yaml

Unable to connect to database [driver]) Communication link failure. Failed to connect to server. Reason: No more data available..]

The collector was unable to connect to the source system.

  • Check that the credentials provided for running the collector are correct.

  • Check that the location information is correct.

ERROR: The selected output directory: [path] does not exist.

The output path that stores the catalog output does not exist.

  • Check that you have mounted a source directory on your machine to a target directory on the container that will store the catalog output.

  • Check that the path specified by --output or -o is the path specified by the target directory on the container.

ERROR: Config file [path] does not exist

The config file contains the parameters that run the collector. The file path does not exist.

  • Check that you have mounted a source directory on your machine to a target directory on the container.

  • Check that the file exists in the source directory on your machine.

  • Check that the file path specified by --config-file is the file path on the target directory of the container.

StackOverflowError

The collector parser hit a stack size limit due to a complex SQL statement or DAX expression.

Add the -e DWCC_JVM_OPTIONS="-Xss2m" parameter to the command to increase the stack size. For example, the command will look like: docker run -it --rm -e DWCC_JVM_OPTIONS="-Xss2m". This sets the stack size to 2 MB.

If you are using a jar file to run the collector, the equivalent java command is: java -Xss2m -jar ...

java.lang.OutOfMemoryError: Java heap space

The process may have been terminated due to memory constraints.

See Troubleshooting steps for heap memory issues.

ERROR: Missing required argument(s): --all-schemas

This error occurs when you specify the --include-information-schema option but do not specify the --all-schemas option.

When running specific collectors, if you provide the --include-information-schema option, you must also specify the --all-schemas option.
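
Two entries in Table 1 (command not found and invalid reference format) come down to line-continuation mistakes. If you keep the collector command in a script file, a check along these lines can flag backslashes that are followed by trailing whitespace. This is a minimal sketch: the function name find_bad_continuations and the file name run-collector.sh are hypothetical and not part of the collector.

```shell
#!/bin/sh
# Print (with line numbers) every line in a file where a backslash is
# followed by trailing whitespace: spaces, tabs, or a carriage return.
# Such a backslash no longer continues the line, so the shell splits
# the command at that point.
# Returns non-zero when no such line is found (grep's usual behavior).
find_bad_continuations() {
    grep -nE '\\[[:space:]]+$' "$1"
}

# Hypothetical usage, assuming the command is saved in run-collector.sh:
#   find_bad_continuations run-collector.sh
```

Each reported line has the form number:content. Remove the whitespace after the backslash on those lines and run the command again.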



Errors logged in log files

This section lists some of the common errors you may see in the log files while running the collector.

Table 2. Each entry below lists the error in the log file, followed by its cause and solution.

java.lang.RuntimeException [details]

An error occurred. More information is specified in the details.

  • Read the error details for clues to resolve the issue. If there are no clear steps to troubleshoot, continue to the next steps.

  • Run the collector with debug mode on:

    For Docker, add -e log_level=DEBUG

    For jar, add -Dlog_level=DEBUG

    Example: java -Dlog_level=DEBUG -jar [path]

    See if debug logs contain useful information to troubleshoot. Open a support ticket if blocked.

  • Trace level logs may sometimes be required for further troubleshooting. Change the DEBUG value to TRACE to have the collector log TRACE level logs.

dwcc was unable to upload the catalog to data.world via the API at https://api.data.world/v0/data.world

API exception: API token unauthorized

Automatic upload of collector output did not work. The collector could not connect to data.world using the API token.

Check that the API token is correct and not expired.

dwcc was unable to upload the catalog to data.world via the API at https://api.data.world/v0/data.world

data.world API exception: http status 403

The service account used for the collector does not have permission to upload output to the dataset.

Make sure that the service account associated with the collector has edit access to the dataset.

data.world API exception:  http status 400

Automatic upload of collector output did not work. There was an issue uploading the catalog to data.world.

  1. Check that the dataset specified by --upload-location exists.

  2. Make sure to use the dataset name as it appears in the URL.

    For instance, if the dataset name is Dataset Space, the portion of the URL identifying the dataset is dataset-space. Use dataset-space rather than Dataset Space.

    Note that this value is lowercase.

org.open_kos.CollectorException: Database error during cataloging

There was an error connecting to the source system.

  1. Check that the source system location information, credentials, and any roles are correct.

  2. Use the --dry-run option to validate that the source system location information and credentials are correct.

  3. Confirm that the source system is network routable from the machine where the collector is running. This may include checking firewall rules.

401 or 403 errors

There was an authorization issue while connecting to a system.

  1. This typically means the location information specified is correct, but there was a credential issue.

  2. Check that the source system credentials and any roles are correct.

  3. Check that the credentials have the right permissions (typically read permissions) to the objects that the collector will harvest from the system.

Could not establish a secure connection to your data.world instance

There is an issue preventing the collector from authenticating to data.world.

Check with your IT team to ensure there is no network configuration that may be blocking data.world. For instance, you may need to add data.world to the allow list on a proxy server.

Collector could not establish a connection to a source technology using the provided base API URL.

There could be a network routing issue or connection issue from the collector to the source technology.

Run the following commands to check for network connection issues:

  • If using Docker, run:

    docker run -it --rm --entrypoint curl datadotworld/dwcc:<CollectorVersion> <URL>

  • If using JAR files, run:

    curl <URL>
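
For the http status 400 entry in Table 2, the following sketch derives the URL form of a dataset name. It assumes, based only on the Dataset Space example in that entry, that the URL form is the lowercase name with spaces replaced by hyphens. The function name to_dataset_slug is hypothetical. Always confirm the result against the actual dataset URL before using it.

```shell
#!/bin/sh
# Convert a display name such as "Dataset Space" to "dataset-space".
# Assumption: lowercase the name and replace each run of spaces with a
# single hyphen. Other characters are left untouched.
to_dataset_slug() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/  */-/g'
}

# Example:
#   to_dataset_slug "Dataset Space"
```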



Heap memory issues for collectors

If you encounter the following OutOfMemoryError message in the console output, the process was terminated due to memory constraints. Similarly, if the collector takes a long time to complete, there may not be enough memory allocated for the collector to run.

java.lang.OutOfMemoryError: Java heap space

To resolve this issue, we need to address the memory allocation for various components: the Heap, Docker runtime, and the underlying OS/VM.

STEP 1: Allocating Sufficient Resources on the Operating System/VM Instance

  • Make sure that the underlying operating system or virtual machine running Docker or the JAR file has enough resources allocated to support the containers. Increasing the memory available to the OS/VM instance provides resources for Docker and the hosted containers or the JVM if using JAR files. Follow the documentation for the OS/virtual machines to make adjustments to the allocated resources.

STEP 2: Increasing Docker Memory

  • Ensure that Docker has sufficient memory allocated to it. Depending on the platform and configuration, Docker may have a default memory limit, often around 2 GB, which might be insufficient. We recommend increasing it to a minimum of 8 GB using Docker Desktop's settings. You can verify the current Docker memory limit by running docker system info or by checking the Docker Desktop configurations. If your setting is already at 8 GB or higher and you are still running into issues, increase the memory to a higher number and try again.
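
To compare the Docker memory limit against the 8 GB recommendation in a script, you can read the total as a byte count and convert it. The sketch below assumes that docker info exposes the total through the MemTotal field of its --format option. The helper name bytes_to_gib is hypothetical.

```shell
#!/bin/sh
# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1073741824 }'
}

# Usage on a machine where Docker is running:
#   bytes_to_gib "$(docker info --format '{{.MemTotal}}')"
```

If the reported value is below the recommended minimum, raise the limit in the Docker Desktop settings before adjusting the Java heap.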

STEP 3: Adjusting Heap Memory

  • If increasing the Docker Memory does not resolve the issue, modify the Java heap memory settings using the docker run -e DWCC_JVM_OPTIONS='-Xmx<value to be set>' option. By increasing the heap memory, you provide more memory for the Java application running inside the Docker container.

    For example, to set the Java heap memory to 12GB use: docker run -e DWCC_JVM_OPTIONS='-Xmx12g'

  • If running the JAR file and the typical default heap size is smaller than the recommended memory, you should explicitly set it using -Xms and -Xmx.

    For example, to set the maximum available memory of the JVM to 6GB, use: java -Xms6g -Xmx6g -jar …

    Note: You can find the maximum available memory for the JVM in the collector debug logs. You will see a log message like: DEBUG: Maximum memory available to the JVM: 2.00GB
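
To confirm that a new -Xmx value took effect, you can pull the reported maximum out of a debug log. This is a minimal sketch that assumes the log line has exactly the format shown in the note above. The function name jvm_max_memory and the log file name are hypothetical.

```shell
#!/bin/sh
# Print the most recent "Maximum memory available to the JVM" value
# found in a collector debug log, for example "2.00GB".
# Prints nothing if the log contains no such line.
jvm_max_memory() {
    grep -o 'Maximum memory available to the JVM: [0-9.]*GB' "$1" |
        tail -n 1 |
        sed 's/.*: //'
}

# Hypothetical usage, assuming debug logs were written to dwcc.log:
#   jvm_max_memory dwcc.log
```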

Collectors take a long time to complete the run

Activating the data profiling feature may extend the running time of the collectors. This is because the collector needs to read the table data to be able to gather metadata for profiling. If you are running into slowness issues with such collectors, try the following.

Solutions:

  1. Reduce the target sample size (--target-sample-size parameter) for column statistics collection.

  2. If that does not help, try turning off the data profiling feature for the collector and see if that helps.

Permission Denied Error When Writing TTL File

  • Error encountered: A permission denied error occurs when the collector attempts to write the TTL file. You see an error like:

    Caused by: java.io.FileNotFoundException: /dwcc-output/standard_db.collectortest.dwec.ttl (Permission denied)
  • Cause: The user running the collector typically lacks permission to write to the output directory. This can result from improper file system permissions or system configurations, especially when using Docker on systems such as Red Hat Enterprise Linux (RHEL) with SELinux enabled.

To resolve this issue, verify directory permissions and check SELinux settings on RHEL.

Step 1: Verify Directory Permissions

  • Ensure the user running the collector has the necessary write permissions to the output directory. If you are running from Docker, verify the source directory mounted to the Docker container has appropriate permissions.

    For example, on Linux, you can check the permissions using:

    ls -ld ${HOME}/dwcc

    Ensure the output shows that the user has write access to the directory.
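
The same check can be scripted. The sketch below tests whether a directory exists and is writable by the current host user. The function name check_output_dir is hypothetical. The check does not account for a different user inside the Docker container or for SELinux, so a passing result does not rule out the causes covered in Step 2.

```shell
#!/bin/sh
# Report whether a directory exists and is writable by the current user.
# Prints one status line and returns non-zero on failure.
check_output_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage with the directory from the example above:
#   check_output_dir "${HOME}/dwcc"
```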

Step 2: SELinux Settings on RHEL

If you are operating on RHEL with SELinux enabled, it might block Docker from writing to directories despite file system permissions.

  1. Check SELinux Status: Run the following command:

    getenforce

    If the output is Enforcing, SELinux is active.

  2. Do one of the following:

    1. Update SELinux Context. If SELinux is blocking Docker, update the directory's SELinux context:

      sudo chcon -Rt svirt_sandbox_file_t <directory>

      Replace <directory> with the path to your directory, allowing Docker the necessary permissions.

    2. Temporarily Disable SELinux (if applicable). If SELinux policies are not required for your setup, temporarily disable enforcement. The following command switches SELinux to permissive mode until the next reboot:

      sudo setenforce 0
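
If you script these checks across several hosts, some of them may not have the SELinux tools installed. The following sketch wraps getenforce so that it reports unavailable instead of failing on such systems. The function name selinux_mode is hypothetical.

```shell
#!/bin/sh
# Print the SELinux mode (Enforcing, Permissive, or Disabled) as
# reported by getenforce, or "unavailable" when getenforce is not
# installed on this system.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unavailable"
    fi
}

# Usage:
#   selinux_mode
```

Only a result of Enforcing means that SELinux may be blocking writes to the mounted directory.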

Important

If the error persists and you have confirmed adequate permissions, further environmental restrictions may exist. It's advisable to consult your system administrator to ensure no additional settings hinder directory write access.