Integrations

MySQL and DWCC

Prerequisites

  • The computer running the catalog collector should have connectivity to the internet or access to the source instance, a minimum of 2 GB of memory, and a 2 GHz processor.

  • If you are using Docker, it must be installed. For more information see https://docs.docker.com/get-docker/.

  • The user defined to run DWCC must have read access to all resources being cataloged.

  • The computer running DWCC needs a Java Runtime Environment (JRE), version 11 or higher. (OpenJDK is a freely available option.)
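As a rough way to confirm the Java prerequisite, the sketch below parses the first line of `java -version` output and compares the major version against 11. The function names are hypothetical helpers, not part of DWCC, and the parsing assumes the usual `version "..."` format printed by OpenJDK and Oracle JREs.

```shell
# Extract the major Java version from `java -version` output text.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major_version() {
  local version
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    "")  return 1 ;;                                   # no version string found
    1.*) version=${version#1.}; printf '%s\n' "${version%%.*}" ;;
    *)   printf '%s\n' "${version%%[.+-]*}" ;;
  esac
}

# Succeed only when the reported major version is 11 or higher.
java_meets_minimum() {
  local major
  major=$(java_major_version "$1") || return 1
  [ "$major" -ge 11 ]
}
```

A typical call would be `java_meets_minimum "$(java -version 2>&1)"`, since `java -version` writes to standard error. Running `command -v docker` is a similar quick check that Docker is on the PATH.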

Basic parameters

These are the basic parameters needed to run the collector. A list of all parameter options is at the end of this article. Where available, either short (e.g., -a) or long (--account) forms can be used.

Writing the DWCC command

The easiest way to create your DWCC command is to:

  1. Copy the example below

  2. Edit it for your organization and data source

  3. Open a terminal window in any Unix environment that uses a Bash shell (e.g., macOS and Linux) and paste your command into it.

The example command includes the minimal parameters required to run the collector (described below); your instance may require more. A description of all available parameters is at the end of this article. Edit the command by adding any other parameters you wish to use and by replacing the parameter values with your own information. Parameters required by the collector are in bold.

Detailed information about the Docker portion of the command can be found in Introduction to Docker. When you run the command, docker run will attempt to find the image locally, and if it doesn't find it, it will download it from Docker Hub automatically.

[Image: dwcc_and_cli.png]
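The overall shape of such a command can be sketched as a dry run that only prints the docker run invocation for review. Everything here is an assumption for illustration: the host output directory, the /dwcc-output mount point, and the your-org account value are placeholders, x.y stands for the DWCC version, and only --account is taken from this article. The remaining required collector parameters must come from the parameter list.

```shell
# Placeholder values; replace each one before running the printed command.
DWCC_IMAGE="datadotworld/dwcc:x.y"              # x.y = the DWCC version you use
OUTPUT_DIR="${OUTPUT_DIR:-$PWD/dwcc-output}"    # host directory for output (assumption)
CONTAINER_OUTPUT_DIR="/dwcc-output"             # mount point in the container (assumption)
COLLECTOR_ARGS="--account your-org"             # add the other required parameters here

# Print the assembled command instead of executing it.
build_dwcc_command() {
  printf '%s\n' "docker run -it --rm --mount type=bind,source=${OUTPUT_DIR},target=${CONTAINER_OUTPUT_DIR} ${DWCC_IMAGE} ${COLLECTOR_ARGS}"
}

build_dwcc_command
```

Printing first makes it easy to check the values before pasting the command into a terminal. The --rm flag removes the container after it exits, and the bind mount is what lets files written inside the container appear on the host.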

Collector runtime and troubleshooting

The catalog collector may take anywhere from several seconds to many minutes to run, depending on the size and complexity of the system being crawled. If the catalog collector runs without issues, you should see no output on the terminal, but a new file matching *.dwec.ttl should be in the directory you specified for the output. If there was an issue connecting or running the catalog collector, there will be either a stack trace or a *.log file. Both of those can be sent to support to investigate if the errors are not clear. A list of common issues and problems encountered when running the collectors is available in Troubleshooting the collectors.
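Because a successful run is silent, a small check of the output directory can confirm the result. The sketch below uses a hypothetical helper, not part of DWCC, and assumes any *.log files are written to the same directory that is passed in, which may not match your setup.

```shell
# Report whether a *.dwec.ttl file exists in the given directory and list
# any *.log files found there. Returns 0 only if a catalog file is present.
check_collector_output() {
  local dir=$1 found=1 f
  for f in "$dir"/*.dwec.ttl; do
    if [ -f "$f" ]; then
      echo "catalog file: $f"
      found=0
    fi
  done
  for f in "$dir"/*.log; do
    if [ -f "$f" ]; then
      echo "log file: $f"
    fi
  done
  return "$found"
}
```

For example, `check_collector_output /path/to/output || echo "no catalog file found"` flags a run that produced no *.dwec.ttl file.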

Handling custom certificates

If the target data instance has a custom SSL certificate, we recommend extending our Docker image and installing the custom cert like this (where ./ca.der is the name and location of the cert file).

Dockerfile:

FROM datadotworld/dwcc:x.y
ADD ./ca.der ca.der 
RUN keytool -importcert -alias startssl -cacerts -storepass changeit -noprompt -file ca.der

Important

Do not forget to replace x.y in datadotworld/dwcc:x.y with the version of DWCC you want to use. E.g., datadotworld/dwcc:2.345.

Then, in the directory with that Dockerfile:

docker build -t dwcc-cert .

Finally, change the docker run command to use dwcc-cert instead of datadotworld/dwcc:x.y.

Automatic updates to your metadata catalog

Keep your metadata catalog up to date using cron, your Docker container, or your automation tool of choice to run the catalog collector on a regular basis. Considerations for how often to schedule include:

  • Frequency of changes to the schema

  • Business criticality of up-to-date data

For organizations with schemas that change often and where surfacing the latest data is business critical, daily may be appropriate. For those with schemas that do not change often and which are less critical, weekly or even monthly may make sense. Consult your data.world representative for more tailored recommendations on how best to optimize your catalog collector processes.
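If cron is the scheduler, the cadences discussed above map to standard five-field schedules. The sketch below prints a crontab line for a chosen cadence. The 02:00 run time, the wrapper script path /opt/dwcc/run_collector.sh, and the log path are all hypothetical placeholders.

```shell
# Map a cadence name to a five-field cron schedule (02:00 is an arbitrary choice).
cron_schedule() {
  case "$1" in
    daily)   echo "0 2 * * *" ;;
    weekly)  echo "0 2 * * 0" ;;   # Sundays
    monthly) echo "0 2 1 * *" ;;   # first day of each month
    *)       echo "unknown cadence: $1" >&2; return 1 ;;
  esac
}

# Print a crontab line that runs a wrapper script holding your DWCC command.
# Both paths below are hypothetical; adjust them to your environment.
crontab_line() {
  local schedule
  schedule=$(cron_schedule "$1") || return 1
  printf '%s %s >> %s 2>&1\n' "$schedule" "/opt/dwcc/run_collector.sh" "/var/log/dwcc-cron.log"
}
```

The printed line can be added with `crontab -e`. Cron jobs run without a terminal, so a docker run command that uses the -t flag will likely need that flag removed in the wrapper script.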