Preparing to run the Amazon S3 collector
Setting up pre-requisites for running the collector
Make sure that the machine on which you run the collector meets the following hardware and software requirements.
Item | Requirement |
---|---|
Hardware (for on-premise runs only) Note: The following specs assume that one collector process runs at a time. Increase the hardware resources if you run multiple collectors at the same time. | |
RAM | 8 GB |
CPU | 2 GHz processor |
Software (for on-premise runs only) | |
Docker | Docker must be installed on the machine. |
data.world specific objects (for both cloud and on-premise runs) | |
Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
Network connection | |
Allowlist IPs and domains |
Setting up authentication for cataloging Amazon S3
This section walks you through setting up an account with an S3 read-access policy and setting up a credentials profile file.
Creating a user
Create a user for running the collector.
Log in to the AWS portal and navigate to the IAM service. Under Users, click Add users to add a user. You can also select an existing user.
Ensure the user has the following permissions.
Table 2. Permissions
AWS API | Object cataloged using the permission |
---|---|
s3:ListAllMyBuckets | List of all buckets owned by the user. |
s3:GetBucketLocation | The region the bucket resides in. |
s3:ListBucket | The list of objects within included buckets. |
s3:GetBucketVersioning | The versioning state of the bucket. |
s3:GetBucketAcl | The access control list (ACL) of the bucket. |
s3:GetObjectAcl | The access control list (ACL) of the object. |
s3:GetObject | The metadata of an object. |
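The permissions listed above can be granted through a single IAM policy attached to the user. The following is a minimal sketch that builds such a policy document and prints it as JSON. The statement ID and the `build_policy` helper are hypothetical names chosen for illustration, and the default resource scope of `*` is an assumption; you may prefer to restrict bucket-level and object-level actions to specific bucket ARNs.

```python
import json

# Actions taken from the permissions table above.
REQUIRED_ACTIONS = [
    "s3:ListAllMyBuckets",
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:GetBucketVersioning",
    "s3:GetBucketAcl",
    "s3:GetObjectAcl",
    "s3:GetObject",
]


def build_policy(resources=("*",)):
    """Return an IAM policy document (as a dict) allowing the required actions.

    `resources` defaults to all resources, which is an assumption made for
    this sketch; narrow it to specific ARNs if your security rules require it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3CollectorReadAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(REQUIRED_ACTIONS),
                "Resource": list(resources),
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into the IAM policy JSON editor.
    print(json.dumps(build_policy(), indent=2))
```

The printed JSON can be pasted into the JSON editor when creating a policy in IAM, and the policy can then be attached to the user created for the collector.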
Obtaining access key for the user
Skip this step if you already have the access key for the user that you plan to use for running the collector. Detailed AWS documentation on this topic is available here.
Log in to the AWS portal and navigate to the IAM service.
Under Users, select the user that you plan to use for the collector.
On the Security credentials tab, click Create access key.
Select Application running outside AWS. Click Next.
Add the optional Description tag. Click Create access key.
Note the Access key ID and Secret access key. You will need both values when setting up the credentials file.
Setting up credentials file
Skip this step if you already have the AWS CLI installed and a credentials profile file set up.
Install the AWS CLI.
From the command line, run aws configure. When prompted, enter the Access key ID and Secret access key for the user. This stores the credentials in ~/.aws/credentials.
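Before running the collector, it can be useful to confirm that the credentials file contains the expected profile. The following is a minimal sketch that reads the file with Python's standard `configparser` module and checks for the two key entries that `aws configure` writes. The `profile_has_keys` helper is a hypothetical name used for illustration, and the profile name `default` is an assumption; substitute your own profile name if you configured a named profile.

```python
import configparser
from pathlib import Path


def profile_has_keys(credentials_path, profile="default"):
    """Return True if `profile` in the credentials file has both key entries."""
    # Disable interpolation so special characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips files that do not exist, leaving the parser empty.
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    has_id = bool(section.get("aws_access_key_id"))
    has_secret = bool(section.get("aws_secret_access_key"))
    return has_id and has_secret


if __name__ == "__main__":
    # Default location used by the AWS CLI on Linux and macOS.
    path = Path.home() / ".aws" / "credentials"
    print(f"Profile 'default' is configured: {profile_has_keys(path)}")
```

This check only confirms that the entries exist in the file. It does not verify that the keys are valid or that the user has the required S3 permissions.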