Setting up AWS Elastic Container Service (ECS) to automate collector runs

This page describes how to set up data.world collectors on AWS Elastic Container Service (ECS). It walks you through creating a custom collector Docker image, pushing it to AWS Elastic Container Registry (ECR), and defining and running an ECS task. Treat it as a starting point: it does not cover every use case, so customize and orchestrate the setup according to your own needs.

Important

Before you begin, make sure the following are installed and configured:

  • Docker Engine/CLI.

  • AWS CLI. The configured user must be able to push images to AWS Elastic Container Registry (ECR).
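The prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, assuming a POSIX shell. The helper name require_cmd is hypothetical, and the check only confirms that the tools are on the PATH, not that the AWS user has permission to push to ECR.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command is found on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether each prerequisite tool is installed.
for tool in docker aws; do
    if require_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool"
    fi
done

# Once both tools are present, these commands show the installed versions
# and the AWS identity that the CLI is configured to use:
#   docker --version
#   aws --version
#   aws sts get-caller-identity
```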

STEP 1: Set up the collectors

To prepare the collector configuration:

  1. Follow the instructions on managing configurations for on-premise collectors. For any parameter that you want to pass in as an environment variable, enter a variable in place of the actual value.

  2. Note down the environment variables and directories to be created for the collector.

  3. Download the YAML file for running the collector.

  4. At the end, note the command for running the YAML file.


STEP 2: Extend the collector Docker image to include the configuration file

To include the configuration file in your custom collector Docker image:

  1. Locate the downloaded configuration YAML file and create a Dockerfile in the same directory with the following contents:

    FROM datadotworld/dwcc:<collector_version>
    ADD <config-file-name> /dwcc-output/

    For example:

    FROM datadotworld/dwcc:2.186
    ADD config-snowflake.yml /dwcc-output/

    If you want to add multiple configuration files or specific JDBC drivers/other artifacts, use additional ADD lines. For example:

    FROM datadotworld/dwcc:2.186
    ADD config-snowflake.yml /dwcc-output/
    ADD config-oracle.yml /dwcc-output/
    ADD ojdbc11.jar /usr/src/dwcc-config/lib/
  2. In a terminal, navigate to the directory containing the Dockerfile and run the following command to build the custom Docker image for collectors:

    docker build -t <desired-image-name> .

    For example:

    docker build -t dwcc-ecs .

    Optionally, you may add a version tag with the base collector version number:

    docker build -t dwcc-ecs:2.186 .
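The Dockerfile and build command from this step can also be generated from a few variables, which helps when the collector version or configuration file name changes. This is a minimal sketch, assuming a POSIX shell. The helper name render_dockerfile is hypothetical, and the values are the example values used above. The sketch prints the Dockerfile and the build command instead of writing or running anything.

```shell
#!/bin/sh
# Example values from this step; adjust them for your collector.
COLLECTOR_VERSION="2.186"
CONFIG_FILE="config-snowflake.yml"
IMAGE_NAME="dwcc-ecs"

# Hypothetical helper: prints a Dockerfile that adds one configuration
# file to /dwcc-output/ on top of the base collector image.
render_dockerfile() {
    printf 'FROM datadotworld/dwcc:%s\n' "$1"
    printf 'ADD %s /dwcc-output/\n' "$2"
}

# Print the Dockerfile; redirect the output to a file named Dockerfile
# (next to the configuration file) to use it.
render_dockerfile "$COLLECTOR_VERSION" "$CONFIG_FILE"

# Print the matching build command, tagged with the collector version.
echo "docker build -t ${IMAGE_NAME}:${COLLECTOR_VERSION} ."
```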

STEP 3: Push the custom Docker image to AWS Elastic Container Registry (ECR)

  1. Follow the AWS instructions for creating a private repository in ECR. Give the repository the same name as the custom collector Docker image from the build step (for example, dwcc-ecs). You can keep the default options or change them as required by your AWS policies.

  2. In the AWS Console, on the newly created ECR Repository, click the View push commands button to open the push command wizard.

  3. The Push commands for dwcc-ecs window lists the commands to run in a terminal on your local machine:

    1. Authenticate your Docker client to your registry.

    2. Skip step 2, because the custom collector Docker image has already been built.

    3. Tag the custom collector Docker image so that it can be pushed to the repository. If you added a version tag in the build step, replace both occurrences of :latest in the command with that tag. For example:

      docker tag dwcc-ecs:2.186 12345.dkr.ecr.us-east-1.amazonaws.com/dwcc-ecs:2.186
    4. Push the custom collector Docker image to the AWS repository. If you used a version tag in the previous step, replace :latest with that tag. For example:

      docker push 12345.dkr.ecr.us-east-1.amazonaws.com/dwcc-ecs:2.186
  4. After the push completes, the custom collector Docker image (dwcc-ecs) appears in the Images list of the repository.

  5. In the Image URI column, click the Copy URI button to copy the URI of the image, and note it for use in the ECS task definition.
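The authenticate, tag, and push commands from this step follow a fixed pattern, so they can be assembled from the account ID, region, repository name, and tag. This is a minimal sketch, assuming a POSIX shell. The helper names registry_host and image_uri are hypothetical, and 12345 is the placeholder account ID from the examples above. The commands are printed rather than executed, so compare them with what the push command wizard shows before running them.

```shell
#!/bin/sh
# Placeholder values from the examples above; replace with your own.
ACCOUNT_ID="12345"
REGION="us-east-1"
REPO="dwcc-ecs"
TAG="2.186"

# Hypothetical helper: builds the ECR registry host name.
registry_host() {
    printf '%s.dkr.ecr.%s.amazonaws.com' "$1" "$2"
}

# Hypothetical helper: builds the full image URI, including the tag.
image_uri() {
    printf '%s/%s:%s' "$(registry_host "$1" "$2")" "$3" "$4"
}

REGISTRY="$(registry_host "$ACCOUNT_ID" "$REGION")"
URI="$(image_uri "$ACCOUNT_ID" "$REGION" "$REPO" "$TAG")"

# Print the three commands: authenticate, tag, push.
cat <<EOF
aws ecr get-login-password --region ${REGION} | docker login --username AWS --password-stdin ${REGISTRY}
docker tag ${REPO}:${TAG} ${URI}
docker push ${URI}
EOF
```

The URI printed in the last command is the same value that the Copy URI button returns for the pushed image.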

STEP 4: Create the ECS task definition

  1. Go to the AWS Console and select the Elastic Container Service (ECS) console.

  2. In the ECS console, click Task definitions in the sidebar.

  3. Click the Create new task definition button in the upper-right corner, and give the task definition a meaningful name.

  4. In the Infrastructure requirements section, do the following:

    1. Select AWS Fargate as the Launch type.

    2. Important

      In the Operating system/Architecture dropdown list, choose the architecture matching the local machine where the docker image was extended and built. For example, if the custom image was built on an arm64 machine, choose Linux/ARM64.

    3. In the CPU dropdown list, choose at least 2 vCPU.

    4. In the Memory dropdown list, choose at least 8 GB.

    5. Choose a suitable Task execution role. By default, it is ecsTaskExecutionRole, linked to AmazonECSTaskExecutionRolePolicy.

  5. In the Container section, do the following:

    1. In the Name field, enter a meaningful name for your container.

    2. In the Image URI field, enter the image URI copied from the ECR repository. For example:

      12345.dkr.ecr.us-east-1.amazonaws.com/dwcc-ecs:2.186
    3. In the Environment variables section, enter the Key and Value of each environment variable that you noted while creating the configuration file. Alternatively, you can pass environment variables through an environment file hosted on S3.

    4. In the Docker configuration section, in the Command field, enter the location of the configuration file in the container:

      --config-file=/dwcc-output/<config-file-name>

      For example:

      --config-file=/dwcc-output/config-snowflake.yml
  6. Set other configuration options as needed or accept defaults.

  7. Click Create to save the task definition.
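The console settings in this step correspond to fields in an ECS task definition document, which can also be registered from the AWS CLI. This is a minimal sketch, assuming a POSIX shell. The helper name ecs_arch, the family and container names, the EXAMPLE_VARIABLE entry, the role ARN built from the placeholder account ID 12345, and the log group /ecs/dwcc-ecs (assumed to exist already) are all hypothetical. The sketch writes taskdef.json and prints the registration command without running it.

```shell
#!/bin/sh
# Hypothetical helper: maps `uname -m` output to the ECS cpuArchitecture
# value, so the task matches the machine that built the image.
ecs_arch() {
    case "$1" in
        x86_64|amd64)  echo "X86_64" ;;
        arm64|aarch64) echo "ARM64" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

ARCH="$(ecs_arch "$(uname -m)")" || exit 1

# 2048 CPU units = 2 vCPU; 8192 MiB = 8 GB.
cat > taskdef.json <<EOF
{
  "family": "dwcc-ecs",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "2048",
  "memory": "8192",
  "runtimePlatform": {
    "operatingSystemFamily": "LINUX",
    "cpuArchitecture": "${ARCH}"
  },
  "executionRoleArn": "arn:aws:iam::12345:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "dwcc",
      "image": "12345.dkr.ecr.us-east-1.amazonaws.com/dwcc-ecs:2.186",
      "essential": true,
      "command": ["--config-file=/dwcc-output/config-snowflake.yml"],
      "environment": [
        {"name": "EXAMPLE_VARIABLE", "value": "example-value"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/dwcc-ecs",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
EOF

# Print the command that would register the task definition.
echo "aws ecs register-task-definition --cli-input-json file://taskdef.json"
```

Keeping the task definition in a file makes it easier to review and version the settings than re-entering them in the console for each revision.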

STEP 5: Run the task

  1. Once you create the task definition, you are automatically redirected to the Task definitions > Revision page.

  2. In the Deploy dropdown list, select Run task.

  3. In the Environment section, choose an Existing cluster to execute the task.

  4. Set other configuration options as needed or accept the defaults.

  5. Next, click Create to run the task. You are automatically redirected to the Clusters Task page where the task begins executing.

  6. To watch the logs and see if there are any errors, in the Tasks section, on the Task panel, click on the task ID to go to the task configuration, and do the following:

    1. Click the Log configuration tab, and then click View in CloudWatch.

    2. In CloudWatch, click Start tailing.

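Running the task and tailing its logs can also be done from the AWS CLI. This is a minimal sketch, assuming a POSIX shell and AWS CLI version 2 (which provides aws logs tail). The cluster name, subnet ID, security group ID, task definition name, and log group name are hypothetical placeholders, the helper name network_config is hypothetical, and assignPublicIp=ENABLED is an assumption that suits a public subnet. Check the log configuration of your task definition for the actual log group name. The commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with values from your AWS account.
CLUSTER="my-cluster"
TASK_DEF="dwcc-ecs"
SUBNET="subnet-0123456789abcdef0"
SECURITY_GROUP="sg-0123456789abcdef0"
LOG_GROUP="/ecs/dwcc-ecs"

# Hypothetical helper: builds the shorthand value for the
# --network-configuration option, which Fargate tasks require.
network_config() {
    printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=ENABLED}' "$1" "$2"
}

# Print the commands that run the task and then follow its logs.
cat <<EOF
aws ecs run-task --cluster ${CLUSTER} --launch-type FARGATE --task-definition ${TASK_DEF} --network-configuration "$(network_config "$SUBNET" "$SECURITY_GROUP")"
aws logs tail ${LOG_GROUP} --follow
EOF
```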

STEP 6: Confirm upload of the output to data.world

  1. Go to the data.world Datasets page and select your dataset.

  2. You should see that a new TTL file has been uploaded to the dataset, along with a text file latest_dwcc_log.txt.

STEP 7: Schedule the task