Enterprise docs

What is DWCC?

DWCC is a program that we use to collect metadata from your data source. DWCC pulls only metadata from your source--it doesn't collect any data.

Commands to run DWCC collectors

Command

Action

catalog-marquez

Catalog a Marquez instance

catalog-datakin

Catalog a Datakin instance

catalog-athena

Catalog an Athena database via Glue Data Catalog.

catalog-awsglue

Catalog a group of jobs on AWS Glue.

catalog-bigquery

Catalog a BigQuery Dataset.

catalog-databricks

Catalog a databricks database and workspace.

catalog-db2

Catalog a DB2 database.

catalog-denodo

Catalog a Denodo database.

catalog-domo

Catalog a Domo instance.

catalog-dremio

Catalog a Dremio database.

catalog-generic

Catalog a generic JDBC database.

catalog-hive

Catalog a Hive database.

catalog-hive-metastore

Catalog a Hive database via a Hive metastore database.

catalog-infor-ion

Catalog an Infor ION Data Lake database.

catalog-information-schema

Catalog csv files representing metadata for a database following ISO information schema standard

catalog-looker

Catalog a Looker instance.

catalog-manta

Catalog a MANTA repository graph.

catalog-montecarlo

Catalog a Monte Carlo instance via the Monte Carlo API.

catalog-mysql

Catalog a MySQL database.

catalog-openapi

Catalog an OpenAPI/Swagger API specification.

catalog-oracle

Catalog an Oracle database.

catalog-postgresql

Catalog a PostgreSQL database.

catalog-powerbi

Catalog a PowerBI Cloud Service instance.

catalog-presto

Catalog a Presto database.

catalog-redshift

Catalog a Redshift database.

catalog-reltio

Catalog a Reltio MDM Configuration.

catalog-salesforce

Catalog a Salesforce database.

catalog-snowflake

Catalog a Snowflake database.

catalog-sql-anywhere

Catalog a SAP/Sybase SQL Anywhere database.

catalog-sqlserver

Catalog a SQL Server database.

catalog-tableau

Catalog a Tableau instance via the Tableau API.

catalog-vertica

Catalog a Vertica database.

Information Schema Catalog Collector

There are occasionally times when a database configuration makes it difficult to connect the DWCC directly to the data source. The Information Schema Catalog Collector (ISCC) makes it possible to use the information schema of a database as a source for cataloging the database's metadata.

The information schema is an ANSI-standard set of read-only views of all the tables, views, columns, and procedures in an RDBMS. The ISCC works by parsing CSV files created from the information schema and using them as an input source for DWCC. You can find more information on the information schema here. Using DWCC directly is the preferred method for cataloging an RDMS, but the following instructions provide a secondary access method when needed.

Note

We have tested this collector against a SQLServer database, but it can be used on any database for which you can generate the four CSV files described in this guide.

Options and requirements

There are a couple of different ways to create the CSV files necessary for the Information Schema Catalog Collector. The first, and easiest, is to run SQL queries directly against the information schema of the database. Below are the requirements for using this method:

  • The database supports Information Schema and SQL querying.

  • You have permissions to query the database.

The second way to create the CSV files is to build them manually. If you use this option, your CSV files must contain the following columns.

For tables.csv:

  • TABLE_SCHEMA,

  • TABLE_NAME

  • TABLE_TYPE

For columns.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • ORDINAL_POSITION

  • IS_NULLABLE

  • DATA_TYPE

  • COLUMN_DEFAULT

  • CHARACTER_MAXIMUM_LENGTH

  • NUMERIC_PRECISION

For table_constraints.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • CONSTRAINT_NAME

  • CONSTRAINT_TYPE

For constraint_column_usage:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • CONSTRAINT_NAME

Note

If you are creating the CSV files manually you can find information about the data formats and how they are nulled in this document.

Generate CSV files with SQL query

To generate the CSV files from SQL queries against a database's information schema, run the following four SQL queries against your database:

select * from information_schema.tables

select * from information_schema.columns

select * from information_schema.table_constraints

select * from information_schema.constraint_column_usage

Export the results of each query as CSV files with the names:

  • tables.csv

  • columns.csv

  • table_constraints.csv

  • contraint_column_usage.csv

These files will be loaded into the csv-file-directory used with DWCC.

Create CSV files manually

If you use this option to create the CSV files to use with DWCCV, they must contain the following columns.

For tables.csv:

  • TABLE_SCHEMA,

  • TABLE_NAME

  • TABLE_TYPE

For columns.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • ORDINAL_POSITION

  • IS_NULLABLE

  • DATA_TYPE

  • COLUMN_DEFAULT

  • CHARACTER_MAXIMUM_LENGTH

  • NUMERIC_PRECISION

For table_constraints.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • CONSTRAINT_NAME

  • CONSTRAINT_TYPE

For constraint_column_usage:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • CONSTRAINT_NAME

Note

If you are creating the CSV files manually you can find information about the data formats and how they are nulled in this document.