Preparing to run the Redshift collector

Setting up prerequisites for running the collector

Make sure that the machine from which you run the collector meets the following hardware and software requirements.

Table 1.

Item	Requirement
Hardware (for on-premise runs only). Note: These specifications assume that one collector process runs at a time. Increase the hardware accordingly if you run multiple collectors at the same time.
RAM	8 GB
CPU	2 GHz processor
Software (for on-premise runs only): Docker or Java Runtime Environment
Docker	Click here to get Docker.
Java Runtime Environment	OpenJDK 17 is supported and available here.
data.world-specific objects (for both cloud and on-premise runs)
Dataset	You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors.
Network connection
Allowlist IPs and domains	Follow these instructions to configure your network. Use these tools to check network connections before running the collector.

Preparing Redshift for collectors

Setting up permissions

Run the following SQL statements to set up a new user with appropriate permissions to harvest from Redshift. For more information, see the Redshift documentation.

Create a new user.

CREATE USER ddw_user PASSWORD '<password>';

Grant the following permissions to the new user. Replace <schemaName> with the name of a schema you want to harvest, and repeat the two schema-specific statements for every such schema.

-- Grant USAGE access to the target schemas
GRANT USAGE ON SCHEMA <schemaName> TO ddw_user;

-- Grant USAGE access on the pg_catalog schema to query stored procedures and extended metadata
GRANT USAGE ON SCHEMA pg_catalog TO ddw_user;

-- Grant SELECT access on all tables within the target schemas
GRANT SELECT ON ALL TABLES IN SCHEMA <schemaName> TO ddw_user;

-- Grant SELECT access on all tables within the pg_catalog schema to query stored procedures and extended metadata
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO ddw_user;
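
Because the schema-specific grants have to be repeated for each schema, it can be convenient to generate the statements instead of editing them by hand. The following is a minimal Python sketch, not part of the data.world tooling: the function name build_grant_statements and the schema names sales and marketing are hypothetical. It only prints the SQL for review and does not connect to Redshift.

```python
import re

# Accept only simple, unquoted SQL identifiers. This sketch rejects anything
# else rather than attempting to quote it.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def build_grant_statements(schemas, user="ddw_user"):
    """Return the GRANT statements from this section for each target schema.

    build_grant_statements is a hypothetical helper, not a data.world API.
    """
    for name in [user, *schemas]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a simple SQL identifier: {name!r}")

    statements = []
    for schema in schemas:
        # Schema-specific grants, repeated once per harvested schema.
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")

    # The pg_catalog grants are needed only once, however many schemas there are.
    statements.append(f"GRANT USAGE ON SCHEMA pg_catalog TO {user};")
    statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO {user};")
    return statements


if __name__ == "__main__":
    # "sales" and "marketing" are placeholder schema names.
    print("\n".join(build_grant_statements(["sales", "marketing"])))
```

Review the generated statements before running them in your SQL client. Schema names that need quoting (for example, names with spaces or mixed case) are deliberately rejected by this sketch and would have to be handled separately.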

Obtaining the driver

Important

This task is required only for on-premise runs of the collector.

Download the appropriate JDBC driver for Redshift to the machine from which you will run the collector.
