Run DWCC with Docker

The DWCC collectors are distributed as Docker images on Dockerhub. To run a collector, specify its fully qualified name on the command line: datadotworld/dwcc:x.y, where x.y is the version of the collector you want to run. The Docker client on your machine pulls the image if you don't already have it, so there is no need to install it explicitly. The image is run with a series of parameters and outputs a file with the extension *.dwec.ttl. You can upload this file to data.world manually, or have the catalog collector upload it automatically using an API token.

JDBC drivers

Cataloging a JDBC database source with DWCC requires a JDBC driver. We bundle a driver with some of the database collectors; for licensing reasons, we cannot bundle the others. The drivers we include are:

  • Hive (for Hive and Hive Metastore)

  • Postgres

  • Presto

  • Snowflake

  • SQL Server

If you are cataloging one of the following database sources, check with the database vendor for the proper driver to use with your version. You will need to obtain and license the driver yourself, and pass the full path to the directory containing it as the value of the corresponding system property (examples shown in the scripts below):

  • DB2

  • Databricks

  • Denodo

  • Dremio

  • Infor ION

  • MySQL

  • Oracle

  • Redshift

  • SQL Anywhere

  • Vertica

Where to get a DWCC collector

The DWCC collectors are distributed as images on Dockerhub. If you run DWCC from Docker, the run command looks for the image locally and, if it is not found, downloads it from Dockerhub automatically:

dwcc_and_cli.png

If you are running DWCC from a .jar file, you will get the correct file from customer support.

If you are unsure which version of a DWCC collector to use, the most current releases of the collectors are always listed in the Catalog collector change log. If you don't know the complete version name, or if you would like to see a list of the DWCC collector versions, you can go to our Dockerhub repositories. There are two repositories, one for released versions and one for release candidate versions:

  • datadotworld/dwcc - Contains all of the officially released versions of the DWCC

  • datadotworld/dwcc-rc - Contains the "release candidate" versions. Release candidates are test versions; they are not officially released or supported. They are primarily used for quick customer fixes until the official release comes out.

Caution

Do not use the versions named Latest from either repository. Specify only numeric releases (e.g., dwcc:2.36).

Warning

Do not use a release candidate (rc) version of the DWCC unless you have been explicitly directed to do so by your customer success or support representative.

The name you specify on the CLI should match exactly the version name on Dockerhub. For example:

  • The name of the DWCC collector version 2.36 is datadotworld/dwcc:2.36

  • The name of the third DWCC RC collector version of 2.37 is datadotworld/dwcc-rc:2.37-rc-0003 (RC versions are padded to four digits).
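
The naming rules above can be sketched as a small shell function. This is only an illustration: dwcc_image is a hypothetical helper, not part of DWCC or Docker, and it assumes the RC number is passed as a plain integer (3, not 0003).

```shell
#!/bin/sh
# dwcc_image is a hypothetical helper (not part of DWCC or Docker).
# It builds an image name from a version and an optional RC number.
dwcc_image() {
    version="$1"
    rc="${2:-}"

    # Accept only numeric releases such as 2.36; names like "latest" are rejected.
    case "$version" in
        ''|*[!0-9.]*)
            echo "error: version must be numeric (e.g., 2.36), got '$version'" >&2
            return 1
            ;;
    esac

    if [ -z "$rc" ]; then
        # Officially released version.
        printf 'datadotworld/dwcc:%s\n' "$version"
    else
        # Release candidate: the RC number is zero-padded to four digits.
        printf 'datadotworld/dwcc-rc:%s-rc-%04d\n' "$version" "$rc"
    fi
}

dwcc_image 2.36      # prints datadotworld/dwcc:2.36
dwcc_image 2.37 3    # prints datadotworld/dwcc-rc:2.37-rc-0003
```

Rejecting non-numeric names in the helper is one way to avoid passing a name such as Latest by accident.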

Validating a manually installed Docker image for a DWCC collector

If you manually installed a Docker image for your DWCC collector instead of pulling it as part of the run command in the CLI, you can validate that it is an authorized version by checking its hash. The hash for every released version after 2.36 is provided right below the version number in the Catalog collector change log:

DWCC_hash.png

To compare the hash from your version to the authorized version, run the following command from your CLI:

docker inspect datadotworld/dwcc:x.y

where x.y is the version of the release (e.g., 2.36).

You will get back something that looks like this:

check_hash.png

Compare the value in Digest with the value in RepoDigests. If they are the same, you have an authorized version. If they are not the same, contact support.
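
The comparison can also be scripted. The sketch below is a minimal illustration under two assumptions: the published hash has the form sha256:..., and a RepoDigests entry has Docker's usual repository@sha256:... form. digest_matches is a hypothetical helper, and the hash values are placeholders, not real DWCC hashes.

```shell
#!/bin/sh
# digest_matches is a hypothetical helper. It succeeds when the expected hash
# equals the digest part of a RepoDigests entry ("repository@sha256:...").
digest_matches() {
    expected="$1"
    repo_digest="$2"
    # Strip everything up to and including "@", leaving "sha256:...".
    actual="${repo_digest#*@}"
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Placeholder values for illustration only (not a real DWCC hash).
if digest_matches "sha256:0123abcd" "datadotworld/dwcc@sha256:0123abcd"; then
    echo "authorized version"
else
    echo "hashes differ: contact support"
fi

# With Docker available, the first RepoDigests entry could be read with:
#   docker inspect --format '{{index .RepoDigests 0}}' datadotworld/dwcc:x.y
```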

Editing severity level of reported error messages for DWCC collectors

You can set the level (severity) of the log messages that DWCC collectors write to the console and log file. By default, collectors write "info" level messages. You can choose to write only errors (level="ERROR"), errors and warnings (level="WARN"), or all messages including debug trace (level="DEBUG"). Debug logging is useful when troubleshooting a problem with a DWCC collector, for example at the request of support.

If you are using Docker, to set the level to something other than "info", add an option such as -e log_level=DEBUG to your docker run ... command.
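
As a rough sketch, a wrapper script could validate the level before adding it to the command. log_level_option is a hypothetical helper, and treating uppercase INFO as an accepted value for the default level is an assumption. The sketch also assumes that -e here is Docker's environment-variable option, which belongs among the docker run options before the image name.

```shell
#!/bin/sh
# log_level_option is a hypothetical helper. It validates a log level and
# prints the matching option for docker run.
log_level_option() {
    case "$1" in
        ERROR|WARN|INFO|DEBUG)   # INFO in uppercase is an assumption
            printf '%s\n' "-e log_level=$1"
            ;;
        *)
            echo "error: level must be ERROR, WARN, INFO, or DEBUG" >&2
            return 1
            ;;
    esac
}

log_level_option DEBUG   # prints -e log_level=DEBUG
```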

How to display DWCC collector license information

To display the licensing information for any version of a DWCC collector after 2.24, run the following command in your terminal window:

docker run -it --rm datadotworld/dwcc:X.XX display-license

where X.XX is the version number for the DWCC collector.