Preparing to run the AWS Glue collector
Note
The latest version of the Collector is 2.200. To view the release notes for this version and all previous versions, please go here.
Setting up pre-requisites for running the collector
Make sure that the machine from where you are running the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) Note: The following specs are based upon running one collector process at a time. Please adjust the hardware if you are running multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 Ghz processor |
Software (for on-premise runs only) | |
Docker | Click here to get Docker. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit , follow these instructions to prepare the datasets for collectors. |
Preparing AWS Glue for the collector
An AWS credentials file for authentication which contains the user profile to determine which AWS account's instance to catalog. Typically the AWS_CREDENTIALS_FILE is at [user’s home directory]/.aws/credentials. See the AWS documentation on configuration and credential file settings for information on setting up this file.