Troubleshooting the collectors
If you have difficulty running a collector, the following list of common problems can help you work out what went wrong. If your issue is not covered here, contact support@data.world for assistance.
Errors logged on Command Line
This section lists some of the common errors you may see in the Command Line while running the collector.
CLI error | Cause | Solution |
---|---|---|
zsh: command not found: [command] | A parameter used in the command was not recognized by the terminal. | Check for a missing backslash (\) line-continuation character. The backslash must be the last character on a line, immediately before the line break. |
Missing required options: [options] | A required parameter to run the collector was not specified. | Add the required parameters to the command and set the parameter values correctly. |
Unknown option: [option] | A parameter was specified that is not supported by the collector. | Remove the unsupported parameter from the command. |
docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: [path] | A specified directory path provided for linking a local host directory to the container directory does not exist. | |
docker: invalid reference format. | The command is malformed. A possible issue here is that there are trailing spaces after a line continuation character (\). | Remove trailing spaces after the \ character. |
Unable to connect to database [No suitable driver for [location] | A driver is required to connect to a system and it was not specified. | |
Unable to connect to database [driver]) Communication link failure. Failed to connect to server. Reason: No more data available..] | The collector was unable to connect to the source system. | Check that the credentials provided for running the collector are correct. Check that the location information is correct. |
ERROR: The selected output directory: [path] does not exist. | The output path that stores the catalog output does not exist. | |
ERROR: Config file [path] does not exist | The config file contains the parameters that run the collector. The file path does not exist. | Check that you have mounted a source directory on your machine to a target directory on the container. Check that the file exists in the source directory on your machine. Check that the file path specified by --config-file is the file path on the target directory of the container. |
StackOverflowError | The collector parser hit a stack size limit due to a complex SQL statement or DAX expression. | Add the -e DWCC_JVM_OPTIONS="-Xss2m" parameter to the command to increase the stack size. For example, the command will look like: docker run -it --rm -e DWCC_JVM_OPTIONS="-Xss2m". This sets the stack size to 2 MB. |
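Several of the errors above (a misplaced line continuation, a missing bind source path) can be caught before the collector is started. The following is a minimal sketch of such pre-flight checks, not part of the collector itself. The function names and the demo file contents are hypothetical and only illustrate the idea.

```shell
# Print every line (with its line number) where a backslash is followed by
# trailing whitespace. Such lines break the line continuation and can lead to
# "docker: invalid reference format." Succeeds only if a bad line is found.
find_broken_continuations() {
  grep -n '\\[[:space:]][[:space:]]*$' "$1"
}

# Succeed only if the host directory that will be mounted into the container
# exists. A missing directory leads to "bind source path does not exist".
check_mount_source() {
  if [ ! -d "$1" ]; then
    echo "Missing host directory: $1" >&2
    return 1
  fi
}

# Demo on a throwaway file (hypothetical contents): the first line has a
# space after the backslash, so it is reported.
demo=$(mktemp)
printf 'docker run -it --rm \\ \n  example-image\n' > "$demo"
find_broken_continuations "$demo"
rm -f "$demo"
```

To use the sketch, save the collector command in a script file, pass that file to find_broken_continuations, and pass each local directory you mount to check_mount_source before running the command.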
Errors logged in log files
This section lists some of the common errors you may see in the log files while running the collector.
Error in log file | Cause | Solution |
---|---|---|
java.lang.RuntimeException [details] | An error occurred. More information is specified in the details. | |
dwcc was unable to upload the catalog to data.world via the API at https://api.data.world/v0/data.world API exception: API token unauthorized | Automatic upload of collector output did not work. The collector could not connect to data.world using the API token. | Check that the API token is correct and not expired. |
data.world API exception: http status 400 | Automatic upload of collector output did not work. There was an issue uploading the catalog to data.world. | |
org.open_kos.CollectorException: Database error during cataloging | There was an error connecting to the source system. | |
401 or 403 errors | There was an authorization issue while connecting to a system. | |
Out Of Memory Errors for collectors
If you encounter OutOfMemoryError messages in the console output, the process was terminated because it ran out of memory. To resolve this issue, review the memory allocation for three components: the underlying OS/VM, the Docker runtime, and the Java heap.
STEP 1: Allocating Sufficient Resources on the Operating System/VM Instance
Make sure that the underlying operating system or virtual machine running Docker has enough resources allocated to support the containers. Increasing the memory available to the OS/VM instance provides resources for Docker and the hosted containers. Follow the documentation for the OS/virtual machines to make adjustments to the allocated resources.
STEP 2: Increasing Docker Memory
Ensure that Docker has sufficient memory allocated to it. Depending on the platform and configuration, Docker may have a default memory limit, often around 2 GB, which might be insufficient. We recommend increasing it to a minimum of 8 GB using Docker Desktop's settings. You can verify the current Docker memory limit by running docker system info or by checking the Docker Desktop settings. If the limit is already 8 GB or higher and you are still running into issues, increase it further and try again.
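As a quick check, the memory visible to Docker can be read from the command line. The sketch below assumes the docker CLI is on your PATH and uses the MemTotal field of docker info, which is reported in bytes. The helper name bytes_to_mib is hypothetical. The reported value may not match the Docker Desktop setting exactly.

```shell
# Convert a byte count to whole mebibytes (integer division).
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Query the Docker daemon only if the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
  mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null)
  case "$mem_bytes" in
    ''|*[!0-9]*)
      # Empty or non-numeric output, for example when the daemon is not running.
      echo "Could not read Docker memory; is the Docker daemon running?" >&2
      ;;
    *)
      # 8 GB corresponds to roughly 8192 MiB.
      echo "Docker memory: $(bytes_to_mib "$mem_bytes") MiB"
      ;;
  esac
fi
```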
STEP 3: Adjusting Heap Memory
If increasing the Docker memory doesn't resolve the issue, modify the Java heap memory settings using the docker run -e _JAVA_OPTIONS='-Xmx<value to be set>' option. Increasing the heap memory provides more memory for the Java application running inside the Docker container. For example, to set the Java heap memory to 12 GB, use: docker run -e _JAVA_OPTIONS='-Xmx12g'
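A heap ceiling larger than the memory available to Docker cannot be fully used, so it can help to compare the two values before rerunning the collector. The following is a minimal sketch under that assumption. The function name heap_fits and the DOCKER_MIB value are hypothetical, so substitute the memory limit of your own Docker installation.

```shell
# Succeed if a heap of heap_gb gigabytes fits within docker_mib mebibytes.
heap_fits() {
  heap_gb=$1
  docker_mib=$2
  [ $(( heap_gb * 1024 )) -le "$docker_mib" ]
}

HEAP_GB=12         # heap size from the example above (-Xmx12g)
DOCKER_MIB=16384   # hypothetical Docker memory limit (16 GB); replace with yours

if heap_fits "$HEAP_GB" "$DOCKER_MIB"; then
  # Print the option to add to the docker run command.
  echo "Use: -e _JAVA_OPTIONS='-Xmx${HEAP_GB}g'"
else
  echo "A ${HEAP_GB}g heap exceeds Docker memory; increase Docker memory first." >&2
fi
```

This check only compares the two numbers. It leaves no headroom for memory the JVM uses outside the heap, so in practice the Docker limit should be somewhat higher than the heap size.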
Collectors take a long time to complete the run
Activating the data profiling feature may extend the running time of the collectors. This is because the collector needs to read the table data to be able to gather metadata for profiling. If you are running into slowness issues with such collectors, try the following.
Solutions:
Reduce the target sample size (--target-sample-size parameter) for column statistics collection.
If that does not help, try turning off the data profiling feature for the collector.