
Preparing to run the Amazon S3 collector

Setting up prerequisites for running the collector

Make sure that the machine on which you run the collector meets the following hardware and software requirements.

Table 1.

Item                          Requirement

Hardware
RAM                           8 GB
CPU                           2 GHz processor

Software
Docker                        Click here to get Docker.
Java Runtime Environment      OpenJDK 17 is supported and available here.

data.world specific objects
Dataset                       You must have a ddw-catalogs (or other) dataset set up to hold your catalog files when you are done running the collector.
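The hardware and software requirements above can be spot-checked before running the collector. The following is a minimal sketch, not part of the collector itself: the function name `check_prerequisites` is hypothetical, it only tests whether the `docker` and `java` executables are on the PATH (it does not verify the Java version), and the memory check relies on POSIX `sysconf` values that are not available on every platform.

```python
import os
import shutil


def check_prerequisites(min_ram_gb: float = 8.0) -> dict:
    """Report which collector prerequisites appear to be met on this machine."""
    results = {
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "docker_on_path": shutil.which("docker") is not None,
        "java_on_path": shutil.which("java") is not None,
    }
    try:
        # Total physical memory in bytes (POSIX only; absent on Windows).
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        ram_gb = total_bytes / 1024**3
        results["ram_gb"] = round(ram_gb, 1)
        results["ram_ok"] = ram_gb >= min_ram_gb
    except (AttributeError, ValueError, OSError):
        # Memory size could not be determined; leave the result unknown.
        results["ram_gb"] = None
        results["ram_ok"] = None
    return results


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

The operating system usually reports slightly less memory than the nominal installed amount, so a machine sold with 8 GB may fail a strict 8.0 threshold; pass a lower `min_ram_gb` if that is a concern.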



Setting up authentication for cataloging Amazon S3

This section walks you through setting up a user with an S3 read-only access policy and creating a credentials profile file.

Creating a user

Skip this step if you already have a user that you want to run the collector with and that user has read-only access to Amazon S3. Detailed AWS documentation on this topic is available here.

  1. Log in to the AWS portal and navigate to the IAM service. Under Users, click Add users to add a user. You can also select an existing user.

  2. On the next screen:

    1. In the Permissions option, select Add permissions (attach policies directly).

    2. In the Permissions policies section, select AmazonS3ReadOnlyAccess.

      Click Next.

  3. On the last screen, click Add permissions or Create user, whichever is shown.
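To make concrete what a read-only S3 policy permits, the sketch below builds an illustrative policy document and checks that it allows only read-style actions. It is an approximation for explanation only: the exact AmazonS3ReadOnlyAccess managed policy is maintained by AWS and may include additional actions. The names `READ_ONLY_S3_POLICY` and `grants_only_reads` are hypothetical, and the check assumes `Statement` is a list.

```python
import json

# Illustrative read-only policy; NOT the exact AmazonS3ReadOnlyAccess document.
READ_ONLY_S3_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:Get*", "s3:List*"],
            "Resource": "*",
        }
    ],
}


def grants_only_reads(policy: dict) -> bool:
    """Return True if every allowed action is an S3 Get* or List* action."""
    allowed_prefixes = ("s3:Get", "s3:List")
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue  # Deny statements cannot grant write access.
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]  # IAM allows a single action as a plain string.
        if not all(action.startswith(allowed_prefixes) for action in actions):
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(READ_ONLY_S3_POLICY, indent=2))
    print(grants_only_reads(READ_ONLY_S3_POLICY))
```

A policy that also allowed an action such as `s3:PutObject` would make `grants_only_reads` return `False`, which is the kind of access the collector user does not need.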

Obtaining access key for the user

Skip this step if you already have the access key for the user that you plan to use for running the collector. Detailed AWS documentation on this topic is available here.

  1. Log in to the AWS portal and navigate to the IAM service.

  2. Under Users, select the user that you plan to use for the collector.

  3. On the Security credentials tab, click Create access key.

  4. Select Application running outside AWS. Click Next.

  5. Optionally, add a Description tag. Click Create access key.

  6. Note down the Access key ID and Secret access key. You will need this information for setting up the credentials file.

Setting up credentials file

Skip this step if you already have the AWS CLI installed and a credentials profile file set up.

  1. Install the AWS CLI.

  2. From the command line, run aws configure and enter the Access key ID and Secret access key when prompted. This stores the credentials in ~/.aws/credentials.
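After the steps above, you can confirm that the credentials file contains what the collector will need. The following is a minimal sketch using only the Python standard library; it assumes the file uses the usual INI layout with a `[default]` profile and the keys `aws_access_key_id` and `aws_secret_access_key`. The function name `missing_credential_keys` is hypothetical, and the check only confirms that the keys are present, not that AWS accepts them.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path: Path, profile: str = "default") -> list:
    """Return the required keys that are absent or empty for the given profile."""
    # Disable interpolation so special characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # A missing file is ignored and yields no sections.
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    credentials_path = Path.home() / ".aws" / "credentials"
    missing = missing_credential_keys(credentials_path)
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("Credentials profile looks complete.")
```

An empty list means both keys are present for the profile; pass a different `profile` value if you plan to run the collector with a named profile.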