Preparing to run the Amazon Lake Formation collector

Setting up pre-requisites for running the collector

Make sure that the machine on which you run the collector meets the following hardware and software requirements.

Table 1.

| Item | Requirement |
|---|---|
| Hardware (for on-premise runs only) | Note: The following specs assume one collector process running at a time. Adjust the hardware if you run multiple collectors at the same time. |
| RAM | 8 GB |
| CPU | 2 GHz processor |
| Software (for on-premise runs only) | |
| Docker | Click here to get Docker. |
| data.world specific objects (for both cloud and on-premise runs) | |
| Dataset | You must have a ddw-catalogs dataset set up to hold your catalog files when you are done running the collector. If you are using Catalog Toolkit, follow these instructions to prepare the datasets for collectors. |
| Network connection | Allowlist IPs and domains |



Setting up authentication for cataloging Amazon Lake Formation

This section walks you through creating a user with the required IAM and Lake Formation permissions, obtaining an access key for that user, and setting up a credentials profile file.

Creating a user

To create a user for running the collector:

  1. Log in to the AWS portal and navigate to the IAM service. Under Users, click Add users to add a user. You can also select an existing user.

  2. Ensure the user has the following IAM policy level permissions.

    Table 2.

    | Permission | AWS API | Object cataloged using the permission |
    |---|---|---|
    | lakeformation:ListLFTags | ListLFTags | Lists LF-Tags that the requester has permission to view. |
    | lakeformation:ListPermissions | ListPermissions | List of the principal permissions on the resource, filtered by the permissions of the caller. |
    | lakeformation:ListDataCellsFilter | ListDataCellsFilter | Lists all the data cell filters on a table. |
    | lakeformation:SearchDatabasesByLFTags | SearchDatabasesByLFTags | Lists Glue databases that have been tagged with the LF-Tag. |
    | lakeformation:SearchTablesByLFTags | SearchTablesByLFTags | Lists Glue tables and columns that have been tagged with the LF-Tag. |
    | glue:GetDatabases | GetDatabases | All databases in the specified catalog. |
    | glue:GetTables | GetTables | All tables in the specified catalog. |
    | s3:ListAllMyBuckets | ListAllMyBuckets | List of all buckets owned by the user. |



  3. Ensure the user also has the following Lake Formation governance level permissions, in addition to the IAM policy level permissions listed above.

    Table 3.

    | Resource Type | Required Permission | Purpose |
    |---|---|---|
    | Database | DESCRIBE | To list databases in the Lake Formation context. |
    | Table | DESCRIBE | To list tables within a database. |
    | LF-Tag | DESCRIBE | To list available LF-Tags. |
    | DataCellsFilter | DESCRIBE | To list filters applied to tables. |
    | Catalog | DESCRIBE | To use ListPermissions, the user often needs access to the catalog or explicit grants. |
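
The permissions in the steps above can be written down as data before you apply them. The following is a minimal sketch, not an official data.world or AWS template: it builds an IAM policy document that allows the actions listed in Table 2, and it builds request parameters for DESCRIBE grants on a database and on all tables in that database. The statement ID, the "*" resource scope, the principal ARN, the database name sales_db, and the function names are all hypothetical placeholders chosen for illustration. The remaining Lake Formation resource types (LF-Tag, DataCellsFilter, Catalog) are omitted here.

```python
import json

# IAM actions listed in Table 2.
COLLECTOR_ACTIONS = [
    "lakeformation:ListLFTags",
    "lakeformation:ListPermissions",
    "lakeformation:ListDataCellsFilter",
    "lakeformation:SearchDatabasesByLFTags",
    "lakeformation:SearchTablesByLFTags",
    "glue:GetDatabases",
    "glue:GetTables",
    "s3:ListAllMyBuckets",
]


def build_collector_policy(resource="*"):
    """Return an IAM policy document (as a dict) that allows the Table 2 actions.

    The "*" resource scope is an assumption; narrow it where your
    security requirements and the individual actions allow.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LakeFormationCollectorRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": COLLECTOR_ACTIONS,
                "Resource": resource,
            }
        ],
    }


def build_describe_grants(principal_arn, database_name):
    """Return request parameters for DESCRIBE grants on a database and its tables.

    Each dict is shaped for the Lake Formation GrantPermissions API, for
    example boto3's lakeformation client: client.grant_permissions(**request).
    boto3 is not imported here, so nothing is sent to AWS.
    """
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    return [
        {
            "Principal": principal,
            "Resource": {"Database": {"Name": database_name}},
            "Permissions": ["DESCRIBE"],
        },
        {
            "Principal": principal,
            # TableWildcard targets every table in the database.
            "Resource": {"Table": {"DatabaseName": database_name, "TableWildcard": {}}},
            "Permissions": ["DESCRIBE"],
        },
    ]


policy_json = json.dumps(build_collector_policy(), indent=2)
print(policy_json)

# Hypothetical principal ARN and database name, for illustration only.
for request in build_describe_grants(
    "arn:aws:iam::123456789012:user/collector-user", "sales_db"
):
    print(json.dumps(request))
```

The printed policy JSON can be pasted into the IAM policy editor and attached to the collector user. The grant requests only describe what to grant; a Lake Formation administrator still has to apply them, either in the console or through the API.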



Obtaining access key for the user

Skip this step if you already have the access key for the user that you plan to use for running the collector. Detailed AWS documentation on this topic is available here.

To obtain an access key for the user:

  1. Log in to the AWS portal and navigate to the IAM service.

  2. Under Users, select the user that you plan to use for the collector.

  3. On the Security credentials tab, click Create access key.

  4. Select Application running outside AWS. Click Next.

  5. Add the optional Description tag. Click Create access key.

  6. Note down the Access key ID and Secret access key.

Setting up credentials file

Important

Skip this step if you already have the AWS CLI installed and a credentials profile file set up.

If you are authenticating to Amazon Lake Formation using the AWS credentials file, make sure you have the credentials file located at ~/.aws/credentials. For more details, refer to the AWS documentation.

To set up the credentials file:

  1. Install the AWS CLI.

  2. From the command line, run aws configure. This stores the credentials in ~/.aws/credentials.
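
Before running the collector, it can help to confirm that the credentials file exists and contains a usable profile. The following is a minimal sketch that uses only the Python standard library. The helper name read_profile is hypothetical, and the profile name default is an assumption; use the profile name you actually configured.

```python
import configparser
from pathlib import Path


def read_profile(path, profile="default"):
    """Return (access_key_id, secret_access_key) for a profile, or None.

    None is returned when the file is missing, the profile section is
    absent, or either key is missing or empty.
    """
    # Disable interpolation so key values are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return None
    section = parser[profile]
    key_id = section.get("aws_access_key_id")
    secret = section.get("aws_secret_access_key")
    if not key_id or not secret:
        return None
    return key_id, secret


credentials_path = Path.home() / ".aws" / "credentials"
if read_profile(credentials_path):
    print(f"Found a usable default profile in {credentials_path}")
else:
    print(f"No usable default profile in {credentials_path}")
```

The check only reads the file and never prints the key values, so it is safe to run in a shared terminal session.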