Skip to main content

Preparing to run the Information Schema Catalog Collector (ISCC) collector

Setting up pre-requisites for running the collector

Make sure that the machine from where you are running the collector meets the following hardware and software requirements.

Table 1.

Item

Requirement

Hardware

Note: The following specs are based upon running one collector process at a time. Please adjust the hardware if you are running multiple collectors at the same time.

RAM

8 GB

CPU

2 Ghz processor

Software

Docker

Click here to get Docker.

data.world specific objects

Dataset

You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector.

If you are using Catalog Toolkit , follow these instructions to prepare the datasets for collectors.

Network connection

Allowlist IPs and domains

Follow these instructions to configure your network.



Preparing Information Schema Catalog Collector (ISCC)

There are a couple of different ways to create the CSV files necessary for the Information Schema Catalog Collector. The first, and easiest, is to run SQL queries directly against the information schema of the database. Below are the requirements for using this method:

  • The database supports Information Schema and SQL querying.

  • You have permissions to query the database.

The second way to create the CSV files is to build them manually. If you use this option, your CSV files must contain the following columns.

For tables.csv:

  • TABLE_SCHEMA,

  • TABLE_NAME

  • TABLE_TYPE

  • TABLE_DESCRIPTION

For columns.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • ORDINAL_POSITION

  • IS_NULLABLE

  • DATA_TYPE

  • COLUMN_DEFAULT

  • COLUMN_DESCRIPTION

  • CHARACTER_MAXIMUM_LENGTH

  • NUMERIC_PRECISION

For table_constraints.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • CONSTRAINT_NAME

  • CONSTRAINT_TYPE

For constraint_column_usage:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • CONSTRAINT_NAME

Note

If you are creating the CSV files manually you can find information about the data formats and how they are nulled in this document.

Generating CVS file with SQL Query

To generate the CSV files from SQL queries against a database's information schema, run the following four SQL queries against your database:

select * from information_schema.tables

select * from information_schema.columns

select * from information_schema.table_constraints

select * from information_schema.constraint_column_usage

Export the results of each query as CSV files with the names:

  • tables.csv

  • columns.csv

  • table_constraints.csv

  • constraint_column_usage.csv

These files will be loaded into the csv-file-directory used with the data.world Collector.

Creating a CVS files manually

If you use this option to create the CSV files to use with the data.world Collector, they must contain the following columns.

For tables.csv:

  • TABLE_SCHEMA,

  • TABLE_NAME

  • TABLE_TYPE

  • TABLE_DESCRIPTION

For columns.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • ORDINAL_POSITION

  • IS_NULLABLE

  • DATA_TYPE

  • COLUMN_DEFAULT

  • COLUMN_DESCRIPTION

  • CHARACTER_MAXIMUM_LENGTH

  • NUMERIC_PRECISION

For table_constraints.csv:

  • TABLE_SCHEMA

  • TABLE_NAME

  • CONSTRAINT_NAME

  • CONSTRAINT_TYPE

For constraint_column_usage:

  • TABLE_SCHEMA

  • TABLE_NAME

  • COLUMN_NAME

  • CONSTRAINT_NAME

Note

If you are creating the CSV files manually you can find information about the data formats and how they are nulled in this document.

Once you have the CSV files you are ready to run Docker and the the data.world Collector against them just as for any other data source.

  • The CSV files created in a previous step. Two of them--tables.csv and columns.csv--are required, the other two are optional.About Information Schema Catalog Collector (ISCC)