Preparing to run the Information Schema Catalog Collector (ISCC)
Setting up prerequisites for running the collector
Make sure that the machine from which you are running the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (Note: the following specs assume one collector process running at a time; adjust the hardware if you run multiple collectors simultaneously.) | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software | |
Docker | Click here to get Docker. |
data.world-specific objects | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains | |
Preparing Information Schema Catalog Collector (ISCC)
There are two ways to create the CSV files required by the Information Schema Catalog Collector. The first, and easiest, is to run SQL queries directly against the information schema of the database. This method has the following requirements (a quick probe query for verifying both follows the list):
The database supports Information Schema and SQL querying.
You have permissions to query the database.
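To confirm that your database exposes information_schema and that your account can query it, you can run a small probe query first. This is a sketch; the limit clause is an assumption and is not supported by every database (SQL Server, for example, uses top instead):

```sql
-- Probe: if this returns a row, information_schema is available
-- and the current account has permission to query it.
-- Note: "limit 1" is not universal SQL; some databases (e.g.,
-- SQL Server) use "select top 1 ..." instead.
select table_name from information_schema.tables limit 1;
```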
The second way to create the CSV files is to build them manually. If you use this option, your CSV files must contain the columns listed in the section Creating the CSV files manually, below.
Generating CSV files with SQL queries
To generate the CSV files from SQL queries against a database's information schema, run the following four SQL queries against your database:
```sql
select * from information_schema.tables;
select * from information_schema.columns;
select * from information_schema.table_constraints;
select * from information_schema.constraint_column_usage;
```
Export the results of each query to a CSV file with the corresponding name:
tables.csv
columns.csv
table_constraints.csv
constraint_column_usage.csv
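How you export query results to CSV depends on your database and client tooling. As one illustration, on PostgreSQL (an assumption; other databases have their own export syntax) a COPY statement can write each result set straight to a CSV file with a header row. The /tmp/iscc/ output paths are illustrative only:

```sql
-- PostgreSQL-specific sketch: COPY writes a query's result set to a
-- server-side file as CSV with a header row. Paths are illustrative.
COPY (SELECT * FROM information_schema.tables)
  TO '/tmp/iscc/tables.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.columns)
  TO '/tmp/iscc/columns.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.table_constraints)
  TO '/tmp/iscc/table_constraints.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.constraint_column_usage)
  TO '/tmp/iscc/constraint_column_usage.csv' WITH (FORMAT csv, HEADER);
```

Server-side COPY requires file-write privileges on the database host; if you lack them, the psql client's \copy meta-command performs the equivalent export to the client machine.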
These files will be loaded into the csv-file-directory used with the data.world Collector.
Creating the CSV files manually
If you build the CSV files manually for use with the data.world Collector, they must contain the following columns.
For tables.csv:
TABLE_SCHEMA
TABLE_NAME
TABLE_TYPE
TABLE_DESCRIPTION
For columns.csv:
TABLE_SCHEMA
TABLE_NAME
COLUMN_NAME
ORDINAL_POSITION
IS_NULLABLE
DATA_TYPE
COLUMN_DEFAULT
COLUMN_DESCRIPTION
CHARACTER_MAXIMUM_LENGTH
NUMERIC_PRECISION
For table_constraints.csv:
TABLE_SCHEMA
TABLE_NAME
CONSTRAINT_NAME
CONSTRAINT_TYPE
For constraint_column_usage.csv:
TABLE_SCHEMA
TABLE_NAME
COLUMN_NAME
CONSTRAINT_NAME
Note

If you are creating the CSV files manually, you can find information about the data formats and how they are nulled in this document.
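For reference, here is a minimal sketch of what a hand-built tables.csv might look like. The schema, table names, and descriptions are invented sample values, not requirements:

```csv
TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE,TABLE_DESCRIPTION
public,customers,BASE TABLE,Customer master records
public,orders,BASE TABLE,Sales order headers
public,active_orders,VIEW,Orders not yet fulfilled
```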
Once you have the CSV files, you are ready to run Docker and the data.world Collector against them just as for any other data source.
You will need the CSV files created in the previous steps. Two of them, tables.csv and columns.csv, are required; the other two are optional.