Troubleshooting the collectors
If you are having difficulty running a collector, the following list of common problems can help you troubleshoot what went wrong. If your issue is not covered here, please contact support@data.world for more assistance.
Errors logged on Command Line
This section lists some of the common errors you may see on the command line while running the collector.
CLI error | Cause | Solution |
---|---|---|
zsh: command not found: [command] | A parameter used in the command was not recognized by the terminal. | Check for a missing backslash (\) where the command continues onto a new line. The backslash must be the last character on the line, immediately before the line break. |
Missing required options: [options] | A required parameter to run the collector was not specified. | Add the required parameters to the command and set the parameter values correctly. |
Unknown option: [option] | A parameter was specified that is not supported by the collector. | Remove the unsupported parameter from the command. |
docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: [path] | The directory path specified for mounting a local host directory into the container does not exist. | |
docker: invalid reference format. | The command is malformed. A possible issue here is that there are trailing spaces after a line continuation character (\). | Remove trailing spaces after the \ character. |
Unable to connect to database [No suitable driver for [location] | A driver is required to connect to a system and it was not specified. | |
Unable to connect to database [driver]) Communication link failure. Failed to connect to server. Reason: No more data available..] | Collector was unable to connect to the source system. | Check that the credentials provided for running the collector are correct. Check that the location information is correct. |
ERROR: The selected output directory: [path] does not exist. | The output directory that stores the catalog output does not exist. | |
ERROR: Config file [path] does not exist | The config file contains the parameters that run the collector. The file path does not exist. | Check that you have mounted a source directory on your machine to a target directory on the container. Check that the file exists in the source directory on your machine. Check that the file path specified by --config-file is the file path on the target directory of the container. |
StackOverflowError | The collector parser hit a stack size limit due to a complex SQL statement or DAX expression. | Add the -e DWCC_JVM_OPTIONS="-Xss2m" parameter to the command to increase the stack size. For example, the command will look like: docker run -it --rm -e DWCC_JVM_OPTIONS="-Xss2m". This sets the stack size to 2 MB. If you are using a jar file to run the collector, the equivalent java command is: java -Xss2m -jar ... |
ERROR: Missing required argument(s): --all-schemas | This error occurs when you have specified the --include-information-schema option but did not specify the --all-schemas option. | When running specific collectors, if you provide the --include-information-schema option, you must also specify the --all-schemas option. |
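Several of the errors above (trailing spaces after a \ character, and directories that do not exist) can be caught before the collector is started. The following is a minimal sketch of such pre-flight checks, not part of the collector itself. The function names find_bad_continuations and check_dirs are hypothetical helpers, and the sketch assumes the command has been saved to a file.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the collector.

# Print (with line numbers) every line of the given file that has spaces or
# tabs after a trailing backslash, which breaks line continuation.
find_bad_continuations() {
    # $1: path to a file that contains the command
    grep -nE '\\[[:space:]]+$' "$1"
}

# Report each given directory that does not exist on the host.
# Returns non-zero if at least one directory is missing.
check_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir"
            status=1
        fi
    done
    return "$status"
}
```

For example, running find_bad_continuations on the file that holds your docker run command lists the lines to clean up, and check_dirs can be given the local directories you plan to mount for the config file and the output.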
Errors logged in log files
This section lists some of the common errors you may see in the log files while running the collector.
Error in log file | Cause | Solution |
---|---|---|
java.lang.RuntimeException [details] | An error occurred. More information is specified in the details. | |
dwcc was unable to upload the catalog to data.world via the API at https://api.data.world/v0/data.world API exception: API token unauthorized | Automatic upload of collector output did not work. The collector could not connect to data.world using the API token. | Check that the API token is correct and not expired. |
dwcc was unable to upload the catalog to data.world via the API at https://api.data.world/v0/data.world data.world API exception: http status 403 | The service account used for the collector does not have permission to upload output to the dataset. | Make sure that the service account associated with the collector has edit access to the dataset. |
data.world API exception: http status 400 | Automatic upload of collector output did not work. There was an issue uploading the catalog to data.world. | |
org.open_kos.CollectorException: Database error during cataloging | There was an error connecting to the source system. | |
401 or 403 errors | There was an authorization issue while connecting to a system. | |
Could not establish a secure connection to your data.world instance | There is an issue preventing the collector from authenticating to data.world. | Check with your IT team to ensure there is no network configuration that may be blocking data.world. For instance, you may need to add data.world to the allow list on a proxy server. |
Collector could not establish a connection to a source technology using the provided base API URL. | There could be a network routing issue or connection issue from the collector to the source technology. | Run the following commands to check for network connection issues: |
Memory issues for collectors
If you encounter OutOfMemoryError messages in the console output, it indicates that the process was terminated due to memory constraints. Similarly, if the collector takes a long time to complete, there may not be enough memory allocated for the collector to run.
To resolve this issue, we need to address the memory allocation for various components: the Heap, Docker runtime, and the underlying OS/VM.
STEP 1: Allocating Sufficient Resources on the Operating System/VM Instance
Make sure that the underlying operating system or virtual machine running Docker or the JAR file has enough resources allocated to support the containers. Increasing the memory available to the OS/VM instance provides resources for Docker and the hosted containers or the JVM if using JAR files. Follow the documentation for the OS/virtual machines to make adjustments to the allocated resources.
STEP 2: Increasing Docker Memory
Ensure that Docker has sufficient memory allocated to it. Depending on the platform and configuration, Docker may have a default memory limit, often around 2 GB, which might be insufficient. We recommend increasing it to a minimum of 8 GB using Docker Desktop's settings. You can verify the current Docker memory limit by running docker system info or by checking the Docker Desktop configuration. If your setting is already at 8 GB or higher and you are still running into issues, increase the memory further and try again.
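As a rough way to read the Docker memory limit from a script, the sketch below converts a byte count into GiB. The helper name bytes_to_gib is hypothetical. The commented usage line assumes a running Docker daemon and relies on docker info reporting total memory in bytes through its MemTotal field.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to GiB with two decimals.
bytes_to_gib() {
    # $1: memory size in bytes
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example usage (requires a running Docker daemon):
#   bytes_to_gib "$(docker info --format '{{.MemTotal}}')"
```

Compare the printed value with the amount configured in Docker Desktop. The reported total can differ slightly from the configured value.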
STEP 3: Adjusting Heap Memory
If increasing the Docker memory doesn't resolve the issue, modify the Java heap memory settings using the docker run -e _JAVA_OPTIONS='-Xmx<value to be set>' option. Increasing the heap memory gives the Java application running inside the Docker container more memory to work with.
For example, to set the Java heap memory to 12GB use: docker run -e _JAVA_OPTIONS='-Xmx12g'
If you are running the JAR file and the JVM's default heap size is smaller than the recommended memory, set the heap size explicitly using -Xms and -Xmx.
For example, to set the maximum available memory of the JVM to 6GB, use: java -Xms6g -Xmx6g -jar …
Note: You can find the maximum available memory for the JVM in the collector debug logs. You will see a log message like: DEBUG: Maximum memory available to the JVM: 2.00GB
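To confirm that a heap setting took effect, you can pull this value out of a saved log. The sketch below is a minimal example, assuming the debug output was written to a file and that the message matches the format shown in the note above. The helper name jvm_max_memory is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each "maximum memory" value found in a log file.
jvm_max_memory() {
    # $1: path to a file that contains the collector debug output
    sed -n 's/.*Maximum memory available to the JVM: \([0-9.]*GB\).*/\1/p' "$1"
}
```

If the printed value is lower than the heap size you requested, the -Xmx setting is probably not reaching the JVM.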
Collectors take a long time to complete the run
Activating the data profiling feature may extend the running time of the collectors. This is because the collector needs to read the table data to be able to gather metadata for profiling. If you are running into slowness issues with such collectors, try the following.
Solutions:
Reduce the target sample size (--target-sample-size parameter) for column statistics collection.
If reducing the sample size does not help, turn off the data profiling feature for the collector and run it again.