Preparing to run the Redshift collector
Setting up prerequisites for running the collector
Make sure that the machine where you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Click here to get Docker. |
JDBC Driver | The computer should have the appropriate JDBC driver on its file system. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs (or other) dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Preparing Redshift for collectors
Setting up permissions
Run the following SQL statements to set up a new user with appropriate permissions to harvest from Redshift. For more information, see the Redshift documentation.
Create a new user.
CREATE USER ddw_user PASSWORD '<password>';
Grant the following permissions to the new user. Update <schemaName> for each schema you want to harvest.
-- Grant USAGE access to the target schemas
GRANT USAGE ON SCHEMA <schemaName> TO ddw_user;
-- Grant USAGE access on the pg_catalog schema to query stored procedures and extended metadata
GRANT USAGE ON SCHEMA pg_catalog TO ddw_user;
-- Grant SELECT access on all tables within the target schemas
GRANT SELECT ON ALL TABLES IN SCHEMA <schemaName> TO ddw_user;
-- Grant SELECT access on all tables within the pg_catalog schema to query stored procedures and extended metadata
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO ddw_user;
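Because the schema-level grants must be repeated for every schema you want to harvest, it can help to generate them rather than edit <schemaName> by hand. The following is a minimal sketch, not part of the data.world tooling: the function name build_grants and the schema names sales and marketing are hypothetical, and it assumes your schema names are simple unquoted identifiers.

```python
import re

# Assumption: schema and user names are simple unquoted identifiers.
# Anything else is rejected rather than quoted, to keep the sketch safe.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def build_grants(schemas, user="ddw_user"):
    """Return the GRANT statements from this section for each target schema.

    Hypothetical helper for illustration only.
    """
    for name in [user, *schemas]:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unexpected identifier: {name!r}")

    statements = []
    for schema in schemas:
        # Per-schema grants: USAGE on the schema, SELECT on its tables.
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};"
        )

    # The pg_catalog grants are needed once, however many schemas there are.
    statements.append(f"GRANT USAGE ON SCHEMA pg_catalog TO {user};")
    statements.append(
        f"GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO {user};"
    )
    return statements


if __name__ == "__main__":
    # Example schema names; replace them with the schemas you harvest.
    for statement in build_grants(["sales", "marketing"]):
        print(statement)
```

Review the printed statements before running them in your SQL client as a user who is allowed to grant these privileges.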
Obtaining the driver
Important
This task is required only for on-premise runs of the collector.
Make sure you download the appropriate JDBC driver for Redshift to the machine where you will run the collector.