Preparing to run the Information Schema Catalog Collector (ISCC)
Setting up prerequisites for running the collector
Make sure that the machine from which you are running the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (Note: the following specs assume one collector process running at a time; adjust the hardware if you run multiple collectors simultaneously.) | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software | |
Docker | Click here to get Docker. |
data.world-specific objects | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains | |
Preparing Information Schema Catalog Collector (ISCC)
There are two ways to create the CSV files required by the Information Schema Catalog Collector. The first, and easiest, is to run SQL queries directly against the information schema of the database. This method has the following requirements (a quick probe query for verifying both follows the list):
The database supports Information Schema and SQL querying.
You have permissions to query the database.
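To confirm that your database exposes information_schema and that your account can query it, you can run a small probe query first. This is a sketch; the limit clause is an assumption and is not supported by every database (SQL Server, for example, uses top instead):

```sql
-- Probe: if this returns a row, information_schema is available
-- and the current account has permission to query it.
-- Note: "limit 1" is not universal SQL; some databases (e.g.,
-- SQL Server) use "select top 1 ..." instead.
select table_name from information_schema.tables limit 1;
```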
The second way to create the CSV files is to build them manually. If you use this option, your CSV files must contain the columns listed in the section Creating the CSV files manually, below.
Generating CSV files with SQL queries
To generate the CSV files from SQL queries against a database's information schema, run the following four SQL queries against your database:
```sql
select * from information_schema.tables;
select * from information_schema.columns;
select * from information_schema.table_constraints;
select * from information_schema.constraint_column_usage;
```
Export the results of each query to a CSV file with the corresponding name:
tables.csv
columns.csv
table_constraints.csv
constraint_column_usage.csv
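How you export query results to CSV depends on your database and client tooling. As one illustration, on PostgreSQL (an assumption; other databases have their own export syntax) a COPY statement can write each result set straight to a CSV file with a header row. The /tmp/iscc/ output paths are illustrative only:

```sql
-- PostgreSQL-specific sketch: COPY writes a query's result set to a
-- server-side file as CSV with a header row. Paths are illustrative.
COPY (SELECT * FROM information_schema.tables)
  TO '/tmp/iscc/tables.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.columns)
  TO '/tmp/iscc/columns.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.table_constraints)
  TO '/tmp/iscc/table_constraints.csv' WITH (FORMAT csv, HEADER);
COPY (SELECT * FROM information_schema.constraint_column_usage)
  TO '/tmp/iscc/constraint_column_usage.csv' WITH (FORMAT csv, HEADER);
```

Server-side COPY requires file-write privileges on the database host; if you lack them, the psql client's \copy meta-command performs the equivalent export to the client machine.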
These files will be loaded into the csv-file-directory used with the data.world Collector.
Creating the CSV files manually
If you build the CSV files manually for use with the data.world Collector, they must contain the following columns.
For tables.csv:
TABLE_SCHEMA
TABLE_NAME
TABLE_TYPE
TABLE_DESCRIPTION
For columns.csv:
TABLE_SCHEMA
TABLE_NAME
COLUMN_NAME
ORDINAL_POSITION
IS_NULLABLE
DATA_TYPE
COLUMN_DEFAULT
COLUMN_DESCRIPTION
CHARACTER_MAXIMUM_LENGTH
NUMERIC_PRECISION
For table_constraints.csv:
TABLE_SCHEMA
TABLE_NAME
CONSTRAINT_NAME
CONSTRAINT_TYPE
For constraint_column_usage.csv:
TABLE_SCHEMA
TABLE_NAME
COLUMN_NAME
CONSTRAINT_NAME
Note

If you are creating the CSV files manually, you can find information about the data formats and how they are nulled in this document.
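For reference, here is a minimal sketch of what a hand-built tables.csv might look like. The schema, table names, and descriptions are invented sample values, not requirements:

```csv
TABLE_SCHEMA,TABLE_NAME,TABLE_TYPE,TABLE_DESCRIPTION
public,customers,BASE TABLE,Customer master records
public,orders,BASE TABLE,Sales order headers
public,active_orders,VIEW,Orders not yet fulfilled
```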
Once you have the CSV files, you are ready to run Docker and the data.world Collector against them just as for any other data source.
You will need the CSV files created in the previous steps. Two of them, tables.csv and columns.csv, are required; the other two are optional.