
Troubleshooting the collectors

If you are having difficulty running one of our metadata catalog collectors, the tips in this regularly updated article can help you figure out what went wrong. If you still need help, contact support@data.world.

User permission issues for DWCC collectors

If a DWCC collector run does not capture everything you expect to see in the catalog, first check the user account the collector uses to connect to your resource. Confirm that you can authenticate to the resource with that account outside the collector, and that you can find the missing objects. For example, with a database, log in with a client (preferably a JDBC client such as DBeaver) using the same credentials and browse to the objects. If they don't appear there either, it's a permissions issue: the account needs to be granted access to those objects.
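If you prefer to script this check, the following is a minimal Java sketch, not part of DWCC itself. It assumes your database's JDBC driver is on the classpath and that you supply your own connection URL, credentials, and list of expected tables; the class and method names are hypothetical. It connects with the same account the collector uses, lists the tables and views visible through the standard `java.sql.DatabaseMetaData.getTables` call, and reports any expected tables that the account cannot see. It only checks tables and views, not columns or other object types.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: checks which tables a given account can see over JDBC.
class VisibilityCheck {

    // Lists the tables and views visible to the connected account in one schema.
    // Passing null for the schema lists objects in every schema the account can see.
    static Set<String> visibleTables(Connection conn, String schema) throws SQLException {
        Set<String> names = new TreeSet<>();
        DatabaseMetaData meta = conn.getMetaData();
        try (ResultSet rs = meta.getTables(null, schema, "%", new String[] {"TABLE", "VIEW"})) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_NAME"));
            }
        }
        return names;
    }

    // Returns the expected tables that are not in the visible set, sorted.
    // Names are compared exactly as the driver reports them; some databases
    // report unquoted identifiers in upper case.
    static Set<String> missing(Collection<String> expected, Set<String> visible) {
        Set<String> result = new TreeSet<>(expected);
        result.removeAll(visible);
        return result;
    }

    // Connects with the collector's credentials and prints any tables it cannot see.
    static void report(String url, String user, String password, String schema,
                       Collection<String> expected) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            Set<String> notVisible = missing(expected, visibleTables(conn, schema));
            if (notVisible.isEmpty()) {
                System.out.println("All expected tables are visible to " + user);
            } else {
                System.out.println("Not visible to " + user + ": " + notVisible);
            }
        }
    }
}
```

If `report` lists tables that you know exist, the account is missing permissions on them, which matches what you would see in a client such as DBeaver.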

Overwriting files on upload to the catalog

When you run a collector, the output file is named [database name].[collection name].dwec.ttl. As a result, whenever the collector runs more than once against the same database and uploads to the same collection, the output file is overwritten. When each run catalogs all schemas in the database, this is fine: the new file simply updates the previous one.

However, in some cases (for example, when you need to catalog one schema in a database at a time), reusing the same output file name overwrites unique information instead of updating it. In these cases, each output file needs a unique name before it is uploaded to a collection in the catalog.

Currently, to upload unique files for different schemas in the same database:

  1. Disable automatic upload of the TTL files when running the collector.

  2. After each run, rename the output file with a unique name.

  3. Manually upload each of the renamed TTL files.
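Step 2 can be scripted. The following is a minimal Java sketch, assuming the output file follows the [database name].[collection name].dwec.ttl pattern and that you label each file with the schema it covers by inserting the schema name just before the `.dwec.ttl` extension. This naming scheme and the class and method names are hypothetical; any name that is unique per schema works.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for step 2: gives a collector output file a unique,
// schema-specific name before it is uploaded manually.
class TtlRenamer {

    static final String SUFFIX = ".dwec.ttl";

    // Builds the new file name by inserting a label (for example, the schema name)
    // in front of the ".dwec.ttl" extension:
    // "mydb.mycollection.dwec.ttl" + "sales" -> "mydb.mycollection.sales.dwec.ttl"
    static String uniqueName(String fileName, String label) {
        if (!fileName.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a collector output file: " + fileName);
        }
        String stem = fileName.substring(0, fileName.length() - SUFFIX.length());
        return stem + "." + label + SUFFIX;
    }

    // Renames the file within its directory and returns the new path.
    // Without REPLACE_EXISTING, Files.move throws FileAlreadyExistsException if the
    // target already exists, so a file from an earlier run is never overwritten.
    static Path renameForSchema(Path output, String schema) throws IOException {
        Path target = output.resolveSibling(uniqueName(output.getFileName().toString(), schema));
        return Files.move(output, target);
    }
}
```

For example, after cataloging the `sales` schema, `TtlRenamer.renameForSchema(Path.of("mydb.mycollection.dwec.ttl"), "sales")` leaves `mydb.mycollection.sales.dwec.ttl` ready for the manual upload in step 3. The sketch keeps the `.dwec.ttl` extension intact and only changes the part of the name before it.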

Allocating additional memory to Docker

When you run a collector via Docker to catalog large bodies of metadata (for example, a data source with hundreds or thousands of tables and many thousands of columns), the collector process might exhaust the memory available in its Docker container. To address this, increase the memory available to Docker. On Windows and macOS, you do this by changing a Docker Desktop preference. On a native Linux host, the Docker host and the native host are the same, so Docker already has access to all of the machine's memory. For example, on a Mac, open Docker preferences:

[Screenshot: Docker preferences menu]

Then select Resources > Advanced. In this example, the memory allowance is set to 2 GB. Increase it to 4 GB by moving the Memory slider:

[Screenshot: Resources > Advanced, showing the Memory slider]

You can also free up memory for the DWCC container by stopping other containers running on the Docker host.